- What is big data? A workable definition.
- Do big data help us gain insight?
Keywords: Alexa data fail, big data, communication, ROI, return on investment, strategy
Defining the term: Big data
Big data is difficult to define because everybody seems to look at things differently.
Cox and Ellsworth (1997) distinguished between big data collections and big data objects. The latter are data sets that were too large to be processed by standard algorithms and software on the hardware available at the time. Under this definition, several data sites may have to be used to do the job.
Cox and Ellsworth defined big data collections as aggregates of several data sets that may be multi-source, multi-disciplinary, or stored on different sites and in disparate types of data repositories. While a single data object or data set might be manageable by itself, aggregating several of these makes data analysis a challenge.
More than 12 years later, Jacobs (2009) pointed out that any definition of big data is a moving target. Thanks to ever faster memory chips, what was not easily processable only a year ago might well be today. In other words, what counted as big data for a mainframe computer in 1981 might be analysed and processed with ease on a MacBook Pro in 2012.
Much of the data we collect accrues during a transaction, such as a user logging into an e-commerce site. Here account data is retrieved and session information is added to a log. This allows the user to search for a product and possibly purchase something. When a purchase is made, payment details are added and the user data is updated. Such databases have been maintained for years through customer loyalty programs.
Jacobs pointed out that the challenge was not transaction processing or data storage. His reasoning was that few companies acquire such data in volumes large enough for processing and storing them to pose a challenge. The challenge starts when we want to get answers to all kinds of questions from these data within seconds or minutes.
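The difference between recording transactions and asking questions of the accumulated data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the account records, the `log_in`, `purchase` and `revenue_by_product` helpers and all sample values are hypothetical and do not come from Jacobs (2009).

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for an account database and a session log.
accounts = {
    "u1": {"name": "Customer 1", "payment": None},
    "u2": {"name": "Customer 2", "payment": None},
}
session_log = []


def log_in(user_id):
    """Transaction step: retrieve one account by key and log the session."""
    account = accounts[user_id]  # a single keyed lookup, cheap at any scale
    session_log.append({"user": user_id, "event": "login"})
    return account


def purchase(user_id, product, amount, payment):
    """Transaction step: update the user's payment details and log the sale."""
    accounts[user_id]["payment"] = payment
    session_log.append(
        {"user": user_id, "event": "purchase", "product": product, "amount": amount}
    )


def revenue_by_product(log):
    """Analytical question: must read every log entry to produce an answer."""
    totals = defaultdict(float)
    for entry in log:
        if entry["event"] == "purchase":
            totals[entry["product"]] += entry["amount"]
    return dict(totals)


log_in("u1")
purchase("u1", "toothpaste", 3.5, "card")
log_in("u2")
purchase("u2", "toothpaste", 3.5, "invoice")
purchase("u2", "soap", 2.0, "invoice")

print(revenue_by_product(session_log))
```

Each transaction touches one account and appends one log entry, while the analytical question scans the whole log. With a handful of entries that scan is trivial. With years of accumulated entries, answering it within seconds is where the difficulty begins.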
Based on the above (see also below), it is probably safe to define big data using three main features:
- Velocity: Data is produced at high speed, for example by thousands of closed-circuit television (CCTV) cameras across London, UK, that monitor activity to help prevent crime.
- Volume: All those CCTV cameras across London produce vast amounts of visual data that must be interpreted, for instance to help solve crime (see y-axis below).
- Variety: The types of data sources. Examples are structured or non-structured data, machine vs. non-machine readable data and so forth (see x-axis below).
The biggest challenge is to match structured, machine-readable data with unstructured text or images (e.g., video feed). Examples include pictures that were not tagged with keywords.
This challenge is not new, however. Security services have tried for ages to match different data sources (e.g., telephone conversations and email) to gain valuable insights. All this is being done to better manage risks society faces when it comes to terrorist threats (e.g., Boston Marathon bombings) or hacker attacks.
References
Jacobs, Adam (August, 2009). The pathologies of big data. Communications of the ACM 52(8), 36-44. DOI=10.1145/1536616.1536632 Retrieved December 23, 2013 from http://doi.acm.org/10.1145/1536616.1536632
Cox, Michael, Ellsworth, David (May, 1997). Managing big data for scientific visualization. Paper presented at SIGGRAPH 97, August 3-5, Los Angeles, CA. Retrieved December 23, 2013 from http://www.dcs.ed.ac.uk/teaching/cs4/www/visualisation/lectures/98-99/lect16ref.ps.gz
Interesting read: Typhoon Haiyan: Twitter and Flickr to the rescue?
Bottom Line
Big data are great for measuring correlations. Unfortunately, they may not allow us to gain insight regarding:
- what factors could help explain a certain outcome, and / or
- what caused something to happen.
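A minimal sketch of this limitation, assuming simulated data: two metrics that never influence each other can still correlate strongly when a third, unobserved factor drives both. The variable names, coefficients and the `pearson` and `partial_corr` helpers below are hypothetical and chosen for illustration only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs (first-order partial)."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Assumption: a hidden factor drives both metrics; neither metric affects the other.
hidden_factor = [rng.uniform(0, 35) for _ in range(2000)]
metric_a = [2.0 * h + rng.gauss(0, 5) for h in hidden_factor]
metric_b = [0.5 * h + rng.gauss(0, 2) for h in hidden_factor]

raw = pearson(metric_a, metric_b)
controlled = partial_corr(metric_a, metric_b, hidden_factor)

print(f"raw correlation: {raw:.2f}")
print(f"correlation after controlling for the hidden factor: {controlled:.2f}")
```

The raw correlation is high, yet it largely disappears once the hidden factor is taken into account. In real data the hidden factor is often not recorded at all, which is why a large volume of data by itself does not tell us what explains or causes an outcome.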
For instance, forecasting influenza trends using various data points, including Google Flu Trends, is interesting. But it is unclear how much Google’s search data as a whole add to our ability to forecast trends for next winter. If forecasting fails, how will this help health policy makers put the right strategy in place? And how would it help us reduce the risk of a flu epidemic next winter (see below)?
Customer loyalty programs such as those offered by Safeway, Tesco or Migros may give us lots of data, but unfortunately they fail to give us the insights we need for marketing. This is especially true if a client does 60 percent or less of their total household shopping at your store. Giving them a toothpaste discount coupon on the back of the cashier receipt sounds great. But what if they just purchased enough supplies for the next 12 months? Worse, what if they did so by taking advantage of a special at the store down the road last week? Of course, it is impossible for you to know, but the result is that your marketing efforts may come across as simply a nuisance.
Customer loyalty to shops, airlines, hotels and so forth is low these days. Big data may not give us a true picture of what is happening because, for starters, the underlying data may be inaccurate (see the example above: our records show that a customer buys this brand and is due for a refill, when in fact they are not). As a result, big data predictions about people’s needs (e.g., the toothpaste example) can be wrong. Likewise, predictions about their behavior (e.g., risk of joining a terrorist cell) may lead to inaccurate inferences and punish people before they have acted. Besides being a nuisance, this negates ideas of fairness and justice.
Big data can be as fickle or useless as trying to predict which fashion trends will excite consumers in stores next spring and which are sure to flop. Unless we gain more insight (are these data accurate and valid?), we are unable to manage this risk better.
Fact is, gaining such insights may not require tons of data. Talking to a few clients to better understand why they do certain things a certain way may, however, be critical for a thoughtful analysis. And no, doing a telephone marketing survey will not get answers from those customers you want replies from (e.g., successful professionals).
Reference
Dugas, Andrea Freyer, Jalalpour, Mehdi, Gel, Yulia, Levin, Scott, Torcaso, Fred, Igusa, Takeru and Rothman, Richard E. (February, 2013). Influenza forecasting with Google Flu Trends. PLOS ONE 8(2): e56176. doi:10.1371/journal.pone.0056176. Retrieved October 28, 2013 from http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0056176
Interesting read: Business analytics: Dealing with the data deluge
Source: Big data: The latest fad?
Do you agree with my definition of big data?
Do you know about a big data fail?
What is a great big data case study that benefitted you? Thanks again for sharing your thoughts and insight – I appreciate it, as always.
The author: This post was written by social media marketing and strategy expert Urs E. Gattiker, who also writes about issues that connect social media with compliance, and thrives on the challenge of measuring how it all affects your bottom line.
His latest book, Social Media Audit: Measure for Impact, appeared in 2013 from Springer Science Publishers. His latest about social media fashion with passion will appear in early 2014.