
Hadoop: An Apache Open Source Software (Java Framework)


Hadoop is an Apache open source software framework written in Java. It runs on clusters of commodity machines and provides both distributed storage and distributed processing of very large data sets, ranging in size from gigabytes to petabytes.
Architecture:
Hadoop follows a master/slave architecture.

The master is the NameNode and the slaves are the DataNodes. The NameNode maintains and manages the blocks stored on the DataNodes and is responsible for handling clients' requests for data. Hadoop splits each file into one or more blocks, and these blocks are stored on DataNodes. Each block is replicated across DataNodes, which gives Hadoop its fault-tolerance characteristic. The default replication factor is 3, and it can be configured.
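The splitting and replication described above can be sketched in a few lines of Python. This is an illustration, not Hadoop's actual code; the block size and replication factor mirror common HDFS defaults (128 MB blocks, replication factor 3), and the round-robin placement is a simplification of HDFS's rack-aware placement policy.

```python
# Illustrative sketch (not Hadoop's implementation): split a file into
# fixed-size blocks and assign each block to `replication` datanodes.

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, a common HDFS default
REPLICATION = 3                  # default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the number of blocks needed to hold `file_size` bytes."""
    return (file_size + block_size - 1) // block_size  # ceiling division

def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct datanodes, round-robin.
    Real HDFS placement is rack-aware; this is a simplification."""
    placement = {}
    for b in range(num_blocks):
        placement[b] = [datanodes[(b + i) % len(datanodes)]
                        for i in range(replication)]
    return placement

# A 300 MB file needs 3 blocks: two full 128 MB blocks plus one partial.
blocks = split_into_blocks(300 * 1024 * 1024)
nodes = ["dn1", "dn2", "dn3", "dn4"]
plan = place_replicas(blocks, nodes)
```

Each block ends up on three distinct nodes, so the loss of any single DataNode leaves at least two live replicas of every block.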
Components:
1. Hadoop Distributed File System (HDFS): Designed to run on inexpensive commodity hardware, HDFS stores data in a distributed fashion. It is highly fault tolerant and provides high-throughput access for applications that work with big data.
2. NameNode: It manages the filesystem namespace and stores the metadata of the data blocks. The metadata is persisted on a local disk in the form of a namespace image and edit logs. The NameNode knows the location of each data block on the DataNodes.
3. Secondary NameNode: It is responsible for merging the namespace image and the edit log to create checkpoints. This helps the NameNode free up the memory taken by the edit logs accumulated since the last checkpoint.
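The checkpointing idea above can be sketched as follows. This is a toy model, not the real NameNode implementation: the namespace image is a plain dict, the edit log is a list of operations, and the checkpoint step replays the log into the image so the old edits can be discarded.

```python
# Illustrative sketch of a namespace image plus an append-only edit log.
# A checkpoint (the secondary namenode's job) merges the log into the
# image, after which the replayed edits can be dropped.

class Namespace:
    def __init__(self):
        self.image = {}       # checkpointed state: path -> metadata
        self.edit_log = []    # operations since the last checkpoint

    def mkdir(self, path):
        self.edit_log.append(("mkdir", path))   # edits are only appended

    def delete(self, path):
        self.edit_log.append(("delete", path))

    def checkpoint(self):
        """Replay the edit log into the image, then truncate the log."""
        for op, path in self.edit_log:
            if op == "mkdir":
                self.image[path] = {"type": "dir"}
            elif op == "delete":
                self.image.pop(path, None)
        self.edit_log.clear()

ns = Namespace()
ns.mkdir("/data")
ns.mkdir("/tmp")
ns.delete("/tmp")
ns.checkpoint()
```

After the checkpoint, the image alone reflects the current namespace and the edit log is empty, which is exactly why checkpointing keeps NameNode restarts and memory use manageable.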
4. DataNode: Stores blocks of data and serves read/write requests from clients. Each DataNode periodically reports the blocks it holds back to the NameNode.
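The block reports mentioned for the DataNode can be sketched from the NameNode's side. This is an assumption-level illustration, not Hadoop code: from per-node reports the NameNode builds a block-to-DataNodes map and can spot under-replicated blocks that need re-replication.

```python
# Illustrative sketch (not Hadoop code): the namenode aggregates block
# reports from datanodes and flags blocks with too few live replicas.

REPLICATION = 3  # target replication factor

def build_block_map(reports):
    """reports: {datanode: [block_ids]} -> {block_id: set of datanodes}."""
    block_map = {}
    for node, blocks in reports.items():
        for b in blocks:
            block_map.setdefault(b, set()).add(node)
    return block_map

def under_replicated(block_map, replication=REPLICATION):
    """Blocks with fewer than `replication` live replicas."""
    return {b for b, nodes in block_map.items() if len(nodes) < replication}

reports = {
    "dn1": ["b1", "b2"],
    "dn2": ["b1", "b2"],
    "dn3": ["b1"],          # dn3 is missing its copy of b2
}
bm = build_block_map(reports)
missing = under_replicated(bm)
```

In real HDFS, the NameNode would schedule re-replication of `b2` onto another DataNode to restore the target replication factor.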
