GOOGLE FILE SYSTEM (GFS)
Introduction
Google File System (GFS) is a proprietary distributed file system developed by Google, designed to provide reliable access to data using large clusters of commodity servers. Compared with a traditional file system, GFS is built to run in data centers, providing extremely high data throughput and the ability to survive individual system failures when they occur. In this report, we explain to the reader how Google implemented GFS and how it works. We also compare a traditional file system with GFS, discuss the advantages and disadvantages of GFS, and explain why it is so special.
Background
What is Google?
Imagine what Google's world of data looks like: nothing is small, because Google provides everything a user needs to find through its databases. GFS was implemented to meet the rapidly growing demands of Google's data processing requirements, since Google faced difficulties managing such large amounts of data. Built from large numbers of comparatively small servers, GFS is designed as a distributed file system that runs on clusters of more than a thousand machines. To ease application development, the file system includes a programming interface that abstracts away the details of data management and distribution. Because it runs on commodity hardware, GFS is challenged not only by managing the distribution but also by coping with the increased danger of hardware problems. The developers of GFS therefore made a key design assumption: disk faults, machine faults, and network faults are the norm rather than the exception. The key challenge faced by GFS is keeping data safe while scaling up to more than a thousand computers and managing multiple terabytes of data.
1. Description of the service - summary
The service that we are going to research and try to incorporate into the organization is cloud infrastructure as a service. We plan to provide the end user with well-maintained network storage that is easily accessible from any location while maintaining a secure connection and redundancy of the client data. With the changes in technology and advancements in cloud services, we should be able to save the organization some money by moving to cloud infrastructure services and limiting the maintenance and hardware costs of housing our own servers. We currently house 44+ servers in our service area; most of the servers are used at less than 30% of capacity, while others reach a peak of 80-90% capacity. The servers house the client's p: drive (personal data) as well as
Hadoop \cite{white2012hadoop} is an open-source framework for distributed storage and data-intensive processing, originally developed at Yahoo!. It has two core projects: the Hadoop Distributed File System (HDFS) and the MapReduce programming model \cite{dean2008mapreduce}. HDFS is a distributed file system that splits data and stores it on nodes throughout a cluster, with a number of replicas. It provides an extremely reliable, fault-tolerant, consistent, efficient, and cost-effective way to store a large amount of data. The MapReduce model consists of two key functions: Mapper and Reducer. The Mapper processes input data splits in parallel through different map tasks and sends sorted, shuffled outputs to the Reducers, which in turn group and process them using a reduce task for each group.
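To make the Mapper/Reducer split concrete, below is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API. The input and output paths are passed on the command line and are assumed to be HDFS directories; everything else uses standard Hadoop classes.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs in parallel on each input split and emits (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: receives each word together with all the 1s emitted for it and sums them.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // optional local pre-aggregation on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The framework handles the sorting and shuffling between the two phases, so the user code above only expresses the per-record map logic and the per-group reduce logic.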
The client requests a file from the cold cache of the server and stores the file on its local disk.
GFS: Google File System is a distributed file system developed by Google in order to provide efficient, reliable access to data. It is designed and implemented to meet the requirements of Google's data processing. The file system consists of hundreds of storage machines built from inexpensive parts and is accessed by many different client machines. The search engine produces huge amounts of data that must be stored; a GFS cluster has 1,000 nodes with 300 TB of disk storage.
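GFS itself is proprietary, so its client library cannot be shown here; the Java sketch below is purely illustrative, with hypothetical GfsMaster and GfsChunkserver interfaces and a made-up ChunkLocation type. It only mirrors the read path described in the original GFS paper: the client turns a byte offset into a chunk index, asks the single master for the chunk handle and replica locations (metadata only), and then fetches the bytes directly from a chunkserver.

import java.util.List;

// Hypothetical value type: which chunk to ask for, and where its replicas live.
final class ChunkLocation {
  final long chunkHandle;
  final List<String> chunkserverAddresses;
  ChunkLocation(long chunkHandle, List<String> chunkserverAddresses) {
    this.chunkHandle = chunkHandle;
    this.chunkserverAddresses = chunkserverAddresses;
  }
}

// Hypothetical master interface: metadata lookups only, no file data flows through it.
interface GfsMaster {
  ChunkLocation locateChunk(String path, long chunkIndex);
}

// Hypothetical chunkserver interface: serves raw chunk data to clients.
interface GfsChunkserver {
  byte[] readChunk(long chunkHandle, long offsetInChunk, int length);
}

public class GfsClientSketch {
  static final long CHUNK_SIZE = 64L * 1024 * 1024;   // GFS uses fixed-size 64 MB chunks

  private final GfsMaster master;
  private final GfsChunkserver chunkserver;   // in reality, one replica is chosen per read, e.g. the closest

  GfsClientSketch(GfsMaster master, GfsChunkserver chunkserver) {
    this.master = master;
    this.chunkserver = chunkserver;
  }

  // Read `length` bytes of `path` starting at an absolute byte offset
  // (simplified: assumes the range does not cross a chunk boundary).
  byte[] read(String path, long offset, int length) {
    long chunkIndex = offset / CHUNK_SIZE;      // which chunk holds the offset
    long offsetInChunk = offset % CHUNK_SIZE;   // where inside that chunk to start
    ChunkLocation location = master.locateChunk(path, chunkIndex);
    // A real client caches `location` so repeated reads skip the master entirely.
    return chunkserver.readChunk(location.chunkHandle, offsetInChunk, length);
  }
}

Keeping the master out of the data path is what lets a single master coordinate a cluster of hundreds or thousands of chunkservers without becoming a bottleneck.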
HDFS is Hadoop's distributed file system, providing high-throughput access to data, high availability, and fault tolerance. Data are saved as large blocks, making it suitable for applications that work with very large data sets.
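As a small illustration of that access path, the sketch below uses Hadoop's Java FileSystem API to write a file into HDFS and read it back. The NameNode address and the /user/demo path are hypothetical placeholders; in a real deployment they would come from the cluster's configuration files.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");   // hypothetical NameNode address

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/user/demo/hello.txt");   // hypothetical path

    // Write: the client streams bytes; HDFS cuts them into blocks and replicates each one.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("Hello, HDFS".getBytes(StandardCharsets.UTF_8));
    }

    // Read: the client is directed to the DataNodes that hold each block of the file.
    try (FSDataInputStream in = fs.open(file);
         BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      System.out.println(reader.readLine());
    }
    fs.close();
  }
}

The application code never addresses an individual DataNode; the block placement, replication and failover described above all happen behind this one interface.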
Hadoop provides a distributed filesystem and a framework for the analysis and transformation of very large data sets using the MapReduce [DG04] paradigm. While the interface to HDFS is patterned after the Unix filesystem, faithfulness to standards was sacrificed in favor of improved performance for the applications at hand.
A client/server architecture is an end-to-end system containing server hosts (which hold the resources and services needed by the clients) and clients (users or workstations in the network). “Most client/server networks have more than one client to a server so that the system shares computing power. There are a few different kinds of servers, including file sharing, printer services, email services, database services, and web services, and a server can also be used for its raw computing power. Clients can access all of these different servers at one time and the servers can serve many clients.” (Techopedia, 2014) For large businesses with offices throughout the country, web-based computing, or cloud computing, shows the greatest benefit. In web-based computing it is not the local computer doing all the work but rather computers off site that do the work. The user’s computer runs the cloud computing system’s interface software, which uses the offsite network computers to do the work.
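The request/response pattern behind a client/server system can be sketched in a few lines of Java using nothing beyond the standard library. The example below is only a toy: a single-use echo server standing in for the server host, and a client that connects, sends one request and prints the reply (the port number is arbitrary).

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoDemo {
  public static void main(String[] args) throws Exception {
    ServerSocket listener = new ServerSocket(5000);   // arbitrary port for the demo

    // Server side: waits for a client, reads its request and sends back a reply.
    Thread server = new Thread(() -> {
      try (Socket client = listener.accept();
           BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
           PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
        out.println("echo: " + in.readLine());
      } catch (Exception e) {
        e.printStackTrace();
      }
    });
    server.start();

    // Client side: connects to the server, sends a request and prints the response.
    try (Socket socket = new Socket("localhost", 5000);
         PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
         BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
      out.println("hello from the client");
      System.out.println(in.readLine());
    }

    server.join();
    listener.close();
  }
}

A real deployment differs mainly in scale: the server loops over accept(), handles many clients concurrently, and offers richer services (files, print queues, databases, web pages) instead of a one-line echo.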
International Journal of Cloud Computing: a peer-reviewed open-access journal, it publishes research crossing all aspects of cloud computing. Primarily centred on core components, including cloud applications, cloud platforms and the technologies that will lead to the clouds of the future, the journal also presents review and survey papers that offer new insights and lay the foundations for further exploratory and experimental work. The journal disseminates research that combines advanced theoretical grounding with the practical application of clouds and related systems, as enabled by combinations of internet-based software, development stacks and database availability, and virtualised hardware for storing, processing, analysing and visualising data. Its scope examines clouds alongside such other paradigms as Peer-to-Peer (P2P) computing, cluster computing and grid computing. Coverage extends to issues of management, governance, trust and
Google Drive is a groundbreaking advancement for technological collaboration due to its simplistic qualities and lack of financial cost to users. In April 2012, Google released Google Drive, consisting of Google Docs, Google Slides, and Google Sheets, also known as “Google applications”. All three applications are innovative in their storage, integration, and sharing capabilities, which address users’ needs by allowing multiple people to contribute to a document, presentation, or spreadsheet simultaneously, streamlining the editing and group work process significantly. In fact, according to Devaney (2018), “collaboration is the name of the game with Google Drive” (para. 1). This technology falls under the umbrella term “cloud computing,” a
Because of its wide range of applications, the cloud allows users to store their data remotely and enjoy on-demand, high-quality cloud applications, relieving them of the burden of local storage, cost and maintenance. From the user’s perspective, including both individuals (private users) and enterprises such as companies, the cloud is appealing because data can be stored remotely in a flexible, on-demand manner, relieving the burden of storage management; the user also enjoys universal data access independent of geographical location and avoids capital expenditure on software, hardware, personnel management, maintenance and so on.
Cloud computing is a topic of which much is assumed. The average person recognizes the term “cloud computing” as having to do with their storage from their iPad or iPhone on the online storage area which syncs their Apple devices to their computer. This common cloud is called the iCloud. That is where common knowledge ends about this topic. However, upon further exploration, a deeper understanding is gained with greater explanation, and it is realized that cloud computing is something that is used all of the time on many levels of everyday technology. While the terminology remains cryptic to the mind of most people, the concepts behind the practical uses of cloud computing become quite clear. It is relatable and understandable. Upon this revelation, the iCloud is recognized as the tip of the proverbial iceberg when speaking about cloud computing. It is important to discuss and further understand the many types of cloud computing as well as the various applications to life through technology. This affects how information is stored online, computers are protected, information is secured, emails are processed, and many other factors that are taken for granted in the world of technology. Cloud computing is a general term used to describe how information is stored, utilized, and accessed over the internet. There is no cloud, but the word cloud gives the connotation of an abstract place which is known to exist but is too vast to touch or contain (Griffith,
Modern-day computing systems rely on distributed systems for data, functions and services. Arguably all popular software, such as Uber, Spotify, Facebook and Fitbit among others, hosts its data and applications on dedicated servers to allow users to access services through their devices. The challenge with server-based systems is that the integrity and security of private data are left to third parties, nowadays established as corporations that offer hosting services for application databases and file storage. The main advantage of cloud-based models is that customers do not have to pay for the installation of data storage and processing capabilities for applications (Jadeja,
Abstract - The Hadoop Distributed File System (HDFS), a Java-based file system, provides reliable and scalable storage for data. It is the key component to understanding how a Hadoop cluster can be scaled over hundreds or thousands of nodes. The large amount of data in a Hadoop cluster is broken down into smaller blocks and distributed across small, inexpensive servers using HDFS. MapReduce functions are then executed on these smaller blocks of data, providing the scalability needed for big data processing. In this paper I will discuss in detail Hadoop, the architecture of HDFS, how it functions, and its advantages.
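The block splitting and replication described in this abstract can also be controlled and inspected from client code. The sketch below, again using Hadoop's Java FileSystem API, requests a 128 MB block size and a replication factor of three before creating a file, then asks the NameNode what was actually recorded; the cluster address and path are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSettingsDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");   // hypothetical NameNode address
    conf.set("dfs.blocksize", "134217728");   // 128 MB blocks for files created by this client
    conf.set("dfs.replication", "3");         // keep three copies of every block

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/user/demo/large-dataset.bin");   // hypothetical path

    try (FSDataOutputStream out = fs.create(file, true)) {
      // Whatever is written here is cut into 128 MB blocks, each stored on three DataNodes.
      out.write(new byte[4096]);
    }

    // Ask the NameNode what it recorded for the new file.
    FileStatus status = fs.getFileStatus(file);
    System.out.println("block size  = " + status.getBlockSize());
    System.out.println("replication = " + status.getReplication());
    fs.close();
  }
}

MapReduce then schedules its map tasks on the nodes that already hold these blocks, which is where much of the scalability mentioned above comes from.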
The National Institute of Standards and Technology describes cloud storage as a model for enabling ubiquitous, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal effort or service provider interaction. It comprises a collection of hardware and software that allows the cloud infrastructure to work in a seamless, unified way. Depending on the classification of the information and the service provider, the remote servers may be located within the same facility. The stored data is