Gathering and storing large amounts of information for later analysis is an age-old practice. Some years ago, however, the term “Big Data” was coined to describe data sets so large or complex that traditional applications are not enough to process them.
The term covers a wide variety of challenges, such as information gathering, analysis, storage and search: in general, every task related to the organization, management and analysis of the data.
We are living in a connected world; there is no question about that. Almost everything nowadays is connected to the Internet, and whatever is not will probably be connected soon. That is the trend. We are not talking just about computers or laptops, but also about mobile devices, wearables, cars, home appliances… In addition, we make extensive use of the Internet, for example social networks, where we keep our friends and our favourite things. We use different devices to monitor our health parameters or our activities. Calendars, contacts, schedules, searches, the newspapers we read and online shopping are a few examples of the information we have online, reachable by some or all of the companies out there. They have our profile as individuals and as a group: what we like or dislike, what we want, what we do. It is true that this is mixed with a lot of noise, a lot of useless information, and this is where Big Data comes in: a way to do something productive with this information, a way to find a ROI (Return on Investment) for the time and money companies spend analysing the data.
Nowadays, data growth is driven by unstructured data. There are no standard formats, there are thousands of devices generating information and their number is growing fast (IoT), and we ourselves generate huge amounts of unstructured data. This is precisely one of the Big Data challenges.
For a more formal definition: the concept gained momentum in the early 2000s, when industry analyst Doug Laney articulated the now-mainstream definition of Big Data as the three Vs:
- Volume: Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. In the past, storing it would’ve been a problem – but new technologies (such as Hadoop) have eased the burden.
- Velocity: Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time.
- Variety: Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.
A couple more Vs can be added:
- Variability: Inconsistency of the data set can hamper processes to handle and manage it. Daily, seasonal and event-triggered peak data loads can be challenging to manage. Even more so with unstructured data.
- Veracity: The quality of captured data can vary greatly, affecting accurate analysis.
From a business point of view, any company involved in Big Data needs to:
- Collect the information: Probably from multiple sources.
- Integrate the information: Combine all the collected unstructured information with the company's own information.
- Analyze the information: Extract concrete tendencies, spot business trends, find patterns, or draw whatever conclusions the company is looking for.
- Take actions or decisions: Based on the analysis.
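
The four steps above can be sketched as a tiny pipeline. This is only a toy illustration, not how a real Big Data system is built: the data, the topic list, the threshold and every function name (`collect`, `integrate`, `analyze`, `decide`) are hypothetical and invented for this example, and everything runs in memory on a handful of records.

```python
from collections import Counter

# Hypothetical inputs: one structured source (shop purchases)
# and one unstructured source (free-text social posts).
PURCHASES = [
    {"user": "ana", "item": "running shoes"},
    {"user": "luis", "item": "coffee maker"},
]
POSTS = [
    "ana: just finished a 10k, need new running shoes!",
    "luis: best coffee this morning",
    "ana: coffee before running",
]
TOPICS = ("running", "coffee")  # assumed topics of interest


def collect():
    """Collect: gather records from every source, tagged with their origin."""
    records = [{"source": "shop", **purchase} for purchase in PURCHASES]
    records += [{"source": "social", "raw": text} for text in POSTS]
    return records


def integrate(records):
    """Integrate: normalize both kinds of record into (user, topic) pairs."""
    pairs = []
    for record in records:
        if record["source"] == "shop":
            user, text = record["user"], record["item"]
        else:
            # Unstructured text: assume a "user: message" layout.
            user, _, text = record["raw"].partition(": ")
        for topic in TOPICS:
            if topic in text.lower():
                pairs.append((user, topic))
    return pairs


def analyze(pairs):
    """Analyze: count how often each user is associated with each topic."""
    return Counter(pairs)


def decide(counts, threshold=2):
    """Act: keep the (user, topic) pairs seen at least `threshold` times."""
    return sorted(pair for pair, n in counts.items() if n >= threshold)


def run_pipeline():
    return decide(analyze(integrate(collect())))


if __name__ == "__main__":
    print(run_pipeline())
```

With this sample data, the pipeline keeps only the interests that show up repeatedly across both sources, which is the kind of signal a company might act on. At real scale each step would be handled by dedicated storage and processing technologies rather than plain Python lists.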
For all these reasons, the systems involved need to be real-time, scalable and high-performance; traditional systems are not enough.
This article is just a brief introduction to what Big Data is. The plan is to go deeper into this topic and the technologies related to it.
See you.