Components of HDFS

HDFS consists mainly of the following components:

  • NameNode:
      • It is the master server and runs on the master node of the cluster.
      • It maintains the metadata of the files in the cluster: the file system namespace, the locations of file blocks across the cluster, ownership and permissions, and so on.
      • It executes file system namespace operations such as opening, closing, and renaming files and directories, and it directs the DataNodes (the slave nodes) in creating, deleting, and replicating blocks.
  • Secondary NameNode:
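      • It keeps a checkpointed copy of the NameNode's metadata: it periodically fetches the namespace image (fsimage) and the edit log from the NameNode and merges them into a new image.
      • The merged image is sent back to the NameNode, which uses it to replace its older image, so that its edit log does not grow without bound.
      • If the current NameNode fails, the checkpoint held by the Secondary NameNode can be used to bring up a new NameNode. The Secondary NameNode does not itself take over as the NameNode, and edits made after the last checkpoint may be lost.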
  • Standby NameNode:
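      • It provides fault tolerance against the NameNode being a single point of failure (SPOF): in a high-availability setup it keeps its state in sync with the active NameNode so that it is ready to take over.
      • If the active NameNode fails, the Standby NameNode takes over through failover, which is automatic when the cluster is configured for automatic failover.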
  • DataNodes:
      • They are the slave (worker) nodes.
      • They store the actual data of the Hadoop cluster, as file blocks on their local disks.
      • They serve read and write requests for file data when clients ask for it.
      • They also send heartbeat messages to the NameNode regularly to indicate that they are alive.
  • HDFS Client:
      • It is the interface through which users and applications interact with HDFS, for example through the command-line tools.
      • It has access to the Java libraries that applications need in order to work with HDFS.
      • When HDFS is accessed, it is the HDFS Client that contacts the NameNode for metadata and the DataNodes for the data itself to get the job done.
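
The division of labor described above can be illustrated with a small toy model. The sketch below is not Hadoop code and does not use any Hadoop API: the classes ToyNameNode, ToyDataNode, and ToyClient, their methods, the helper build_demo_cluster, and the 10-second heartbeat timeout are all hypothetical names and values chosen for illustration. It only aims to show that the NameNode holds metadata, the DataNodes hold block contents and report heartbeats, and the client asks the NameNode where blocks live before reading them from the DataNodes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    """Toy DataNode: stores block contents and knows nothing about file names."""
    node_id: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class ToyNameNode:
    """Toy NameNode: holds metadata only, never the file contents."""

    def __init__(self, heartbeat_timeout=10.0):
        # Assumed, illustrative timeout in seconds; not a Hadoop default.
        self.heartbeat_timeout = heartbeat_timeout
        self.file_blocks = {}      # path -> ordered list of block ids
        self.block_locations = {}  # block id -> list of DataNode ids
        self.last_heartbeat = {}   # DataNode id -> time of last heartbeat

    def receive_heartbeat(self, node_id, now):
        self.last_heartbeat[node_id] = now

    def live_nodes(self, now):
        # A node counts as alive if its last heartbeat is recent enough.
        return sorted(
            node_id
            for node_id, seen in self.last_heartbeat.items()
            if now - seen <= self.heartbeat_timeout
        )

    def add_file(self, path, placements):
        # placements: list of (block_id, [node_id, ...]) in file order.
        self.file_blocks[path] = [block_id for block_id, _ in placements]
        for block_id, node_ids in placements:
            self.block_locations[block_id] = list(node_ids)

    def get_block_locations(self, path, now):
        # Only replicas on DataNodes that are currently alive are returned.
        live = set(self.live_nodes(now))
        return [
            (block_id, [n for n in self.block_locations[block_id] if n in live])
            for block_id in self.file_blocks[path]
        ]


class ToyClient:
    """Toy client: asks the NameNode for locations, then reads from DataNodes."""

    def __init__(self, namenode, datanodes):
        self.namenode = namenode
        self.datanodes = {dn.node_id: dn for dn in datanodes}

    def read(self, path, now):
        chunks = []
        for block_id, node_ids in self.namenode.get_block_locations(path, now):
            if not node_ids:
                raise IOError(f"no live replica for block {block_id}")
            # Read the block from the first live replica.
            chunks.append(self.datanodes[node_ids[0]].read_block(block_id))
        return b"".join(chunks)


def build_demo_cluster():
    """Two DataNodes, one two-block file replicated on both of them."""
    dn1, dn2 = ToyDataNode("dn1"), ToyDataNode("dn2")
    namenode = ToyNameNode(heartbeat_timeout=10.0)
    for block_id, data in [("blk_1", b"hello "), ("blk_2", b"world")]:
        for dn in (dn1, dn2):
            dn.write_block(block_id, data)
    namenode.add_file(
        "/demo.txt",
        [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn1", "dn2"])],
    )
    for dn in (dn1, dn2):
        namenode.receive_heartbeat(dn.node_id, now=0.0)
    return namenode, ToyClient(namenode, [dn1, dn2])
```

In this toy setup, a read of /demo.txt shortly after both nodes have sent a heartbeat is served from the first listed replica of each block. If only dn2 keeps sending heartbeats, the toy NameNode stops listing dn1 and the same read is served from dn2, which loosely mirrors how heartbeats let the NameNode route clients away from nodes it no longer considers alive.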




