  • Business Intelligence (BI) analyzes historical data to provide the hindsight and insight that describe business trends. BI lets you take data from external and internal sources, prepare it, run queries on it and create dashboards that answer questions such as quarterly revenue analysis or specific business problems. BI can also evaluate the impact of certain events in the near future.
  • Data Science is a more forward-looking, exploratory approach that analyzes past or current data to predict future outcomes, with the aim of making informed decisions. It answers open-ended questions about “what” events occur and “how” they occur.
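
To make the contrast concrete, here is a minimal sketch in Python. It assumes a small, made-up list of monthly revenue figures. The function names `quarterly_totals` and `linear_forecast` are hypothetical and chosen only for illustration. The first function is a BI-style descriptive summary of the past, and the second is a very simple predictive step of the kind Data Science builds on.

```python
from collections import defaultdict

# Hypothetical (month number, revenue) pairs; the figures are illustrative only.
monthly_revenue = [
    (1, 100.0), (2, 110.0), (3, 120.0),
    (4, 130.0), (5, 140.0), (6, 150.0),
]

def quarterly_totals(rows):
    """BI-style descriptive query: total past revenue per quarter."""
    totals = defaultdict(float)
    for month, revenue in rows:
        quarter = (month - 1) // 3 + 1
        totals[quarter] += revenue
    return dict(totals)

def linear_forecast(rows, next_month):
    """Predictive step: fit a least-squares line to past months and extrapolate."""
    n = len(rows)
    mean_x = sum(month for month, _ in rows) / n
    mean_y = sum(revenue for _, revenue in rows) / n
    covariance = sum((m - mean_x) * (r - mean_y) for m, r in rows)
    variance = sum((m - mean_x) ** 2 for m, _ in rows)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept + slope * next_month

print(quarterly_totals(monthly_revenue))    # what happened
print(linear_forecast(monthly_revenue, 7))  # what might happen next
```

A straight-line fit is only a stand-in here. Real predictive work would involve model selection and validation.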


The advantages of using HBase are as follows:

  • As you know already, HBase is built on top of the HDFS, which is a distributed file system. This gives HBase the ability to store large amounts of data and perform analytics in a short period of time. …

The HBase Data Model is made up of the following logical components:

  • Tables: HBase tables are collections of rows and columns. Basic CRUD operations, i.e., Create, Read, Update and Delete, can be performed on tables using HBase shell commands or the API.
  • Rows: Rows are a collection of column families…
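
As a rough illustration of this model, the sketch below keeps a table in memory as nested dictionaries: row key, then column family, then column, then value. It is a toy written for this explanation, not the HBase shell or client API. The class name `ToyTable`, its methods and the `qualifier` parameter (the column name within a family) are assumptions.

```python
class ToyTable:
    """In-memory stand-in for a table: row key -> column family -> qualifier -> value."""

    def __init__(self, column_families):
        # Column families are fixed when the toy table is created.
        self.column_families = set(column_families)
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        """Create or update a single cell."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key):
        """Read a whole row; an absent row reads as an empty dict."""
        return self.rows.get(row_key, {})

    def delete(self, row_key):
        """Delete a whole row if it exists."""
        self.rows.pop(row_key, None)


users = ToyTable(column_families=["info"])
users.put("row1", "info", "name", "Asha")    # create
users.put("row1", "info", "name", "Asha K")  # update
print(users.get("row1"))                     # read
users.delete("row1")                         # delete
print(users.get("row1"))
```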

Apache HBase is a non-relational distributed data store that is built on top of the HDFS and is also a part of the Hadoop ecosystem. Thus, HBase can leverage all the features that are offered by the HDFS and are available in the Hadoop ecosystem. …

NoSQL databases are of the following four types:

  • Key-value stores: Data is stored as a key along with its value. A pointer and a unique identifier are associated with every data element. Arbitrary strings are used as keys, and the value could be a document or an image. Key-value data…
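
A key-value store can be pictured as a dictionary with string keys and opaque values. The sketch below is a minimal in-memory version. The class `KeyValueStore` and the example keys are hypothetical, and a real store would add persistence and distribution on top.

```python
class KeyValueStore:
    """Minimal in-memory key-value store with arbitrary string keys."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store does not inspect the value; it may be a document, image bytes, etc.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42", {"name": "Asha", "city": "Pune"})  # a document as the value
store.put("avatar:42", b"\x89PNG")                      # image bytes as the value
print(store.get("user:42"))
```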

The CAP theorem states that a distributed database can fulfil at most two out of three guarantees, that is, Consistency, Availability and Partition Tolerance.

Since all three requirements cannot be fulfilled at once, you must decide which two to prioritise when choosing a technology. …

The three basic characteristics of a distributed database are as follows:

  • Consistency guarantees that all the nodes of the system return the same data, reflecting the most recent successful write.
  • Availability guarantees that every request receives a non-error response, though not necessarily one that reflects the most recent successful write.
  • A partition-tolerant system continues to work despite network partitions, i.e., when messages between nodes are lost or delayed.
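
The trade-off between these properties can be sketched with a toy two-replica store. Everything below is an illustrative assumption (the class `TwoNodeStore`, the `mode` flag and the `partitioned` switch) and not how any particular database implements it. During a partition, the "CP" mode refuses writes to stay consistent, while the "AP" mode accepts them and lets the replicas diverge.

```python
class TwoNodeStore:
    """Toy store with two replicas, a and b, holding one value each."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = None             # value held by replica a
        self.b = None             # value held by replica b
        self.partitioned = False  # True when a and b cannot communicate

    def write(self, value):
        """Send a write to replica a; return True if it was accepted."""
        if self.partitioned:
            if self.mode == "CP":
                return False  # reject: consistency is kept, availability is lost
            self.a = value    # accept locally: available, but b is now stale
            return True
        # No partition: the write is applied to both replicas.
        self.a = value
        self.b = value
        return True

    def read_from_b(self):
        return self.b


ap = TwoNodeStore("AP")
ap.write("v1")
ap.partitioned = True
ap.write("v2")
print(ap.a, ap.read_from_b())  # the replicas disagree during the partition
```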

Relational database management systems (RDBMSes) have traditionally been the one-stop solution for storage needs. They support Structured Query Language (SQL) for querying and changing the database. Data in an RDBMS is stored in the form of tables with rows and columns. Data in RDBMSes also tends to be more secure…

Types of data

Data that is generated today is of three basic types: structured, semi-structured and unstructured.

Structured data

Structured data is often considered quantitative. It has the following features:

  • The data fields are arranged in fixed-length formats.
  • It comprises fixed data types.
  • It has the advantage of being easily stored, searched, processed and analysed.
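
As an illustration of fixed-length fields and fixed data types, the sketch below uses Python's standard `struct` module. The record layout (a 4-byte integer id, a 10-byte name and an 8-byte float amount) is a made-up example. Because every record has the same size, the n-th record in a file starts at byte offset `n * RECORD.size`, which is part of why such data is easy to search and process.

```python
import struct

# Hypothetical layout: little-endian, no padding.
# i = 4-byte int, 10s = 10-byte string, d = 8-byte float
RECORD = struct.Struct("<i10sd")

def pack_record(rec_id, name, amount):
    """Encode one record; short names are padded with null bytes to 10 bytes."""
    return RECORD.pack(rec_id, name.encode("ascii"), amount)

def unpack_record(blob):
    """Decode one record and strip the null padding from the name."""
    rec_id, raw_name, amount = RECORD.unpack(blob)
    return rec_id, raw_name.rstrip(b"\x00").decode("ascii"), amount


blob = pack_record(7, "alice", 19.5)
print(RECORD.size)          # every record occupies the same number of bytes
print(unpack_record(blob))
```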

A NoSQL database, where NoSQL stands for ‘Not only SQL’, is a distributed database. In a NoSQL database, data, which is often unstructured, is stored across multiple servers (a cluster of machines).

NoSQL databases have the following features:

  • NoSQL datastores can store and handle huge volumes of data, whether structured, semi-structured or unstructured.
  • They provide horizontal scalability, i.e., additional storage can be created easily by adding new nodes to a cluster, without taking the cluster offline.
  • They follow a flexible schema rather than a strict one, and the schema can be changed dynamically.
  • They generally use commodity machines for the servers. This lowers the processing and storage cost per gigabyte compared with SQL databases.
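
Two of these features, horizontal scalability and a flexible schema, can be sketched with a toy cluster that spreads documents across nodes by hashing their keys. The names `ToyCluster`, `stable_hash` and `add_node` are hypothetical. Redistributing every key when a node is added is a simplification, and real systems typically use schemes such as consistent hashing to limit data movement.

```python
import hashlib

def stable_hash(key):
    """Deterministic hash of a string key (Python's built-in hash() varies per process)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class ToyCluster:
    """Toy cluster: each node is a dict, and keys are assigned to nodes by hash."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        return self.nodes[stable_hash(key) % len(self.nodes)]

    def put(self, key, document):
        # Documents are plain dicts, so each one may carry different fields.
        self._node_for(key)[key] = document

    def get(self, key):
        return self._node_for(key).get(key)

    def add_node(self):
        """Scale out by one node and redistribute the existing documents."""
        items = [(k, v) for node in self.nodes for k, v in node.items()]
        self.nodes = [dict() for _ in range(len(self.nodes) + 1)]
        for key, document in items:
            self.put(key, document)


cluster = ToyCluster(node_count=2)
cluster.put("user:1", {"name": "Asha"})
cluster.put("user:2", {"name": "Ravi", "tags": ["admin"], "age": 31})  # extra fields
cluster.add_node()
print(len(cluster.nodes), cluster.get("user:2"))
```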


