Big Data and its Various Aspects
Before exploring big data, let's clarify what data means. According to Wikipedia, data refers to the “fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing”.
Big data fits the same definition; what sets it apart is its size. It is not only massive but can also keep growing exponentially over time, to the point where traditional techniques such as relational database management systems (RDBMSs) can no longer manage or process it.
There is no size benchmark that determines whether a given dataset counts as big data. One definition of big data is:
“If the data has expanded to such an extent that now a single computing system is unable to store and process it, then we can call this data, big data.”
Today, internet access has become affordable and internet usage is widespread. Online activities such as sending emails, booking movie tickets, posting on blogs, and posting reviews on an e-commerce portal together generate massive volumes of data. This growing use of the internet facilitates both data generation and easy access to the generated data. As a result, we have entered a data-driven world in which organisations across industries leverage big data to make important decisions.
The following are some important aspects of big data. They are very similar to the 4Vs associated with big data, which we will cover later.
- Volume: The size of the data is huge, typically in the range of terabytes or more.
- Rate of change: The nature of the data changes as the underlying transactions change, for example because of a change in business logic or in requirements.
- Variety: Based on the form of data, it can be broadly divided into three categories:
  - Structured: Data stored in a tabular format.
  - Unstructured: Data that does not have a well-defined structure, e.g. videos and images.
  - Semi-structured: Data that is partially structured, or a combination of structured and unstructured data, e.g. email. An email is semi-structured because it has a well-defined structure containing the sender address, receiver addresses, subject, message body, attachments, etc., but the content of the subject or message body is entirely unstructured.
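
To make the three categories of variety more concrete, here is a minimal Python sketch that handles one tiny sample of each. The order records, email addresses, and byte values are made up purely for illustration, and the sketch says nothing about scale; it only shows how the form of the data changes what a program can do with it.

```python
import csv
import io
from email import message_from_string

# Structured: rows and columns with a fixed schema (hypothetical order records).
csv_text = "order_id,customer,amount\n1,alice,20.50\n2,bob,13.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: an email has well-defined header fields,
# but the subject and body themselves are free text.
raw_email = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Weekend plans\n"
    "\n"
    "Hi Bob, are we still on for the movie on Saturday?\n"
)
msg = message_from_string(raw_email)
sender = msg["From"]        # addressable by field name
subject = msg["Subject"]    # addressable by field name
body = msg.get_payload()    # free text: no further structure to query

# Unstructured: raw bytes with no named fields (a stand-in for image data).
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

print("Total order amount:", total)
print("Email from", sender, "about", repr(subject))
print("Body length in characters:", len(body))
print("Unstructured payload size in bytes:", len(image_bytes))
```

The structured sample can be aggregated directly by column name, the email can be queried by header field but its body needs text processing, and the raw bytes offer nothing to query without a specialised decoder.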