Demystifying Big Data

in #technology, 7 years ago (edited)

Having attended several conferences and trade fairs this year, I came to realize that a lot of people have quite an arbitrary understanding of what is meant by Big Data. The term has become a buzzword, and lots of people use it in their pitches when they explain their business models.

Some talk about Big Data as all the different customer data that they collect. Others use the term when talking about streaming real-time data into their platforms. In less sophisticated environments, Big Data is reduced to a specific number like 100 terabytes or 1-2 petabytes. So what exactly is Big Data?

Data is creating new jobs and changing our economy. A study by the McKinsey Global Institute (Manyika et al., 2011) predicts that by 2018 the U.S. alone could face a shortage of nearly 200,000 people with the analytical skills this sector needs. Hence, the need to understand Big Data is increasing.

This article shares a more scientific definition and understanding of Big Data. In general, we could say that the term Big Data is evolutionary. With the constant increase in computational capabilities, the amount of data that can be handled keeps changing as well.

Historically, a one-terabyte data warehouse used to count as Big Data. Nowadays, however, we have data warehouses that are able to store petabytes of data, and analytic tools that can handle huge amounts of it.
In general, Big Data has always described some sort of overwhelming amount of data, something that cannot be handled by common information technologies. During my studies I came across a definition I am more fond of.

The 3-5 Vs of Big Data

The 5 Vs are more accurate in describing this overwhelming amount of data. They characterize Big Data in a way that makes it easier to understand when data becomes big.

Variety, Volume and Velocity

The main three are variety, volume, and velocity. Variety describes the different forms this data can take. Unstructured and semi-structured data have become as strategic as traditional structured data.
Volume refers to the sheer amount of data, which arrives from many different sources. All of these streams need to be captured, and they need to be stored over longer periods.
Finally, velocity describes the speed at which this data comes in. If we think about machine data, for example sensor data from cars, we get new inputs every millisecond.
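
To make the link between velocity and volume a bit more concrete, here is a rough back-of-the-envelope sketch in Python. Only the rate of one reading per millisecond comes from the car example above. The size of a reading, the number of sensors per car, and the fleet size are assumptions I picked purely for illustration.

```python
# Back-of-the-envelope sketch: how velocity turns into volume.
# All constants except the reading rate are illustrative assumptions.

READINGS_PER_SECOND = 1_000   # one reading per millisecond, as in the text
BYTES_PER_READING = 16        # assumed size of a single sensor reading
SENSORS_PER_CAR = 100         # assumed number of sensors in one car
CARS = 10_000                 # assumed fleet size
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(cars: int = CARS) -> int:
    """Raw bytes produced in one day by the given number of cars."""
    return (READINGS_PER_SECOND * BYTES_PER_READING
            * SENSORS_PER_CAR * cars * SECONDS_PER_DAY)


def to_terabytes(n_bytes: int) -> float:
    """Convert bytes to terabytes (1 TB = 10**12 bytes)."""
    return n_bytes / 10**12


if __name__ == "__main__":
    print(f"One car:   {to_terabytes(bytes_per_day(1)):.3f} TB per day")
    print(f"The fleet: {to_terabytes(bytes_per_day()):.1f} TB per day")
```

Under these assumed numbers a single car produces roughly 0.14 TB per day, while the whole fleet lands at about 1.4 petabytes per day. The exact figures do not matter much. The point is that a high velocity multiplied across many sources quickly becomes a volume that common tools struggle with.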

The three Vs are explained by Hugh Watson in his “Tutorial: Big Data Analytics: Concepts, Technologies, and Applications.”

The two additional Vs

While the three previous Vs describe the basic characteristics of this data, the following two Vs explain the importance of filtering and handling the data appropriately.
Even if we want to capture everything we can, we still need to ensure that we can trust the data source. Hence, the Veracity of the data needs to be ensured.
Additionally, we need to be able to extract strategic information from this data. Ultimately we need to provide Value.

Please leave a comment with your thoughts and ideas.
My series of posts is about making you think a little deeper about everyday concepts. I look forward to having you follow along and to reading what you throw at me.

Peace!


Twitter: @tkronsbein

Instagram: @tizian_kronsbein

Website: www.tiziankronsbein.com

References:

Watson, Hugh J. (2014) “Tutorial: Big Data Analytics: Concepts, Technologies, and Applications,” Communications of the Association for Information Systems: Vol. 34, Article 65.
Available at: http://aisel.aisnet.org/cais/vol34/iss1/65

Manyika, J., M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A.H. Byers (2011) “Big Data: The Next Frontier of Innovation, Competition, and Productivity,” McKinsey Global Institute, May.
Available at: http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation


So it is more about the usefulness of data than its size.

Hey @cron
Absolutely. I mean, everyone can collect data. However, retrieving information and building or improving products and services based on that information is the holy grail!

great and informative post
thanks for sharing and keep up the hard work

Thank you @hauntedbrain
I will. I hope you follow my journey. I will check out your profile as well

Hi, thanks, trying to get my head around some of this... do you see the jobs created in big data as separate from analytics, or do they crossover?

The spectrum is huge. I think there are several intersections. If you go for analytics of traffic data or anything real-time based, it is likely you will have to deal with big data.
However, it really comes down to what you want to do and what happens under the hood. You can do analytics without getting in touch with it. Rather, it would be part of the steps that come before you start your analytics.

You are welcome. If you have more questions or if I can help you with anything related to that topic please let me know.
