HDFS - Hadoop Distributed File System

Functions of a File System

The basic functions of a file system are:

  1. Control how data is stored and retrieved.
  2. Maintain metadata about the data (files and folders).
  3. Enforce permissions and security for access to files and folders.
  4. Manage storage capacity efficiently.

Different File Systems

Local File System vs. HDFS

Local File System

A local file system operates at the level of a single node. HDFS uses the local file system on each node to store its data, but each local file system only knows about the blocks stored on its own node.


HDFS, on the other hand, operates at the cluster level, so it knows which nodes hold which blocks of each file to be processed.


Suppose you want to store a 6 GB file on a Hadoop cluster. HDFS breaks the file into 128 MB blocks (48 of them for a 6 GB file), then distributes and replicates those blocks across multiple nodes. At the node level, each node holds multiple blocks, and the node's local file system keeps track only of the blocks stored locally; it has no information about what resides on other nodes. This is where HDFS comes in: it keeps track of all the blocks across the different nodes, which is what allows the file to be processed as a whole.
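
The example above can be sketched in a few lines of Python. This is a toy model, not how HDFS is implemented: the function names, the node names, and the round-robin placement are all assumptions made for illustration (real HDFS placement is rack-aware), and the replication factor of 3 is the usual HDFS default rather than something stated above.

```python
MB = 1024 * 1024
GB = 1024 * MB

def split_into_blocks(file_size, block_size=128 * MB):
    """Return the sizes (in bytes) of the blocks a file would be split into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_blocks(num_blocks, nodes, replication=3):
    """Hypothetical round-robin placement: block i is copied to
    `replication` consecutive nodes. Assumes replication <= len(nodes)."""
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }

def local_view(block_map, node):
    """What one node's local file system can see: only its own blocks."""
    return sorted(b for b, holders in block_map.items() if node in holders)

blocks = split_into_blocks(6 * GB)
nodes = ["node1", "node2", "node3", "node4"]   # hypothetical cluster
block_map = place_blocks(len(blocks), nodes)   # the cluster-level view

print(len(blocks))                              # 48
print(block_map[0])                             # ['node1', 'node2', 'node3']
print(len(local_view(block_map, "node1")))      # 36
```

Here `block_map` plays the role of the cluster-level knowledge that HDFS maintains, while `local_view` returns the much smaller picture available to a single node's local file system.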

Benefits of HDFS

  1. Supports distributed processing by storing data as blocks rather than as whole files.
  2. Handles failures by replicating blocks.
  3. Scales to support future expansion.
  4. Is cost-effective because it runs on commodity hardware.
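
To make the failure-handling point concrete, here is a minimal sketch comparing a replicated and an unreplicated layout when nodes go down. The block-to-node maps, node names, and the `lost_blocks` helper are hypothetical and exist only for this illustration; they are not part of any Hadoop API.

```python
def lost_blocks(block_map, failed_nodes):
    """Return the ids of blocks that have no surviving copy."""
    failed = set(failed_nodes)
    return sorted(
        block_id
        for block_id, holders in block_map.items()
        if all(node in failed for node in holders)
    )

# Hypothetical layout: 4 blocks, each stored on 3 of 4 nodes.
replicated = {
    0: ["node1", "node2", "node3"],
    1: ["node2", "node3", "node4"],
    2: ["node3", "node4", "node1"],
    3: ["node4", "node1", "node2"],
}

# Same 4 blocks with a single copy each.
unreplicated = {0: ["node1"], 1: ["node2"], 2: ["node3"], 3: ["node4"]}

print(lost_blocks(replicated, ["node2"]))             # []
print(lost_blocks(unreplicated, ["node2"]))           # [1]
print(lost_blocks(replicated, ["node1", "node2"]))    # []
```

With three copies of each block, this layout survives the loss of any two nodes without losing data, whereas the single-copy layout loses a block as soon as one node fails.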


