Introduction to HBase

Apache HBase is a non-relational, distributed data store built on top of HDFS and part of the Hadoop ecosystem. As a result, HBase can take advantage of HDFS and of the other tools available in that ecosystem. HBase is an open-source implementation modeled on Google’s Bigtable.

Google’s Bigtable is a high-performance, distributed storage system built on top of the Google File System (GFS). It is designed to manage structured data and to scale to a very large size.

HBase has the following features:

  • Distributed storage: Apache HBase is a distributed, column-family-oriented database built on top of HDFS. Data is stored and processed in a distributed manner across the cluster.
  • Flexible schema: HBase tables do not have a fixed set of columns. Only the column families must be declared when a table is created; individual columns (column qualifiers) within a family can be added dynamically, row by row. HBase columns have no data type: all data in HBase, including row keys and values, is stored as bytes.
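  • Sorted: HBase stores rows sorted by RowKey in lexicographic (byte) order. The RowKey uniquely identifies a row, i.e., no two rows can have the same RowKey; writing to an existing RowKey adds to or updates that row instead of creating a new one.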
  • Data replication: Data is replicated within a cluster through HDFS block replication, and HBase can also replicate data from one cluster to another.
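  • Faster lookups: HBase stores data in indexed files (HFiles) on HDFS and organizes it as a sorted map keyed by RowKey, column and timestamp. It supports random, real-time read and write access by RowKey, so a lookup can go straight to the relevant data instead of scanning the whole table.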
  • Horizontal scalability: HBase scales horizontally; if the cluster needs more resources, you add more servers (RegionServers) instead of upgrading the existing ones. HBase can scale out to thousands of commodity servers.
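To make the data model described in the list concrete, here is a minimal Python sketch of a toy, in-memory table. It is not the HBase client API: `ToyTable`, `put`, `get` and `scan` are hypothetical names chosen for illustration. It assumes a single table, column families declared up front, caller-supplied timestamps, and no persistence or distribution. It shows columns added on the fly within a column family, values stored as bytes, several timestamped versions per cell, and rows kept sorted by RowKey so that range scans are cheap.

```python
from bisect import bisect_left, insort


class ToyTable:
    """Hypothetical in-memory model of an HBase-style table (not the real API)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        # row key -> {(family, qualifier): {timestamp: value}}; everything is bytes.
        self.rows = {}
        # Row keys kept in lexicographic byte order, like HBase's sort by RowKey.
        self.sorted_keys = []

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        if row not in self.rows:
            # A new RowKey: create the row and insert its key in sorted position.
            self.rows[row] = {}
            insort(self.sorted_keys, row)
        # Any qualifier is accepted; no schema change is needed for new columns.
        cells = self.rows[row].setdefault((family, qualifier), {})
        cells[ts] = value

    def get(self, row, family, qualifier):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get((family, qualifier))
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        """Yield row keys in [start, stop) in sorted order."""
        i = bisect_left(self.sorted_keys, start)
        while i < len(self.sorted_keys) and self.sorted_keys[i] < stop:
            yield self.sorted_keys[i]
            i += 1


t = ToyTable(families=[b"info"])
t.put(b"user#002", b"info", b"name", b"Bob", ts=1)
t.put(b"user#001", b"info", b"name", b"Alice", ts=1)
t.put(b"user#001", b"info", b"email", b"alice@example.com", ts=2)  # new column
t.put(b"user#001", b"info", b"name", b"Alicia", ts=3)  # newer version of a cell

print(t.get(b"user#001", b"info", b"name"))
print(list(t.scan(b"user#000", b"user#999")))
```

Running this prints `b'Alicia'` (the newest version of the cell) followed by `[b'user#001', b'user#002']`, even though the rows were inserted in the opposite order. Because row keys compare byte by byte, `b"row10"` sorts before `b"row2"`, which is why row keys are often zero-padded, as in `user#001` above.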

Note: HBase is not optimised for joins since there are no relations in HBase.
