February 14, 2012, 9:47 am
Big Data in the House
By Henry Woodbury
The New York Times Sunday Review highlights Big Data. Big Data is that rapidly backfilling reservoir of web analytics and real-world sensor data. It is also a million rivulets of meandering incident, logged at its portages and tracked by its jetsam. The projected revolution starts with the ability to find meaning in it all:
Most of the Big Data surge is data in the wild — unruly stuff like words, images and video on the Web and those streams of sensor data. It is called unstructured data and is not typically grist for traditional databases.
But the computer tools for gleaning knowledge and insights from the Internet era’s vast trove of unstructured data are fast gaining ground. At the forefront are the rapidly advancing techniques of artificial intelligence like natural-language processing, pattern recognition and machine learning.
Unfortunately, the examples in the article are not inspiring. There is a difference between real scientific discovery and arbitrage opportunities, and, other than engineering-driven examples such as Google's robot-driven cars, most of the article's focus is on arbitrage.
Case in point is the invocation of Moneyball. I am a big fan of baseball sabermetrics, and, among those paying attention, the work of Bill James and other analysts has revolutionized the way people evaluate baseball players. But this is work on the margins. It doesn’t trump the expression of true talent that anyone can spot, and it doesn’t void the enormous impact of chance. Hubris may be more dangerous than confusion:
Big Data has its perils, to be sure. With huge data sets and fine-grained measurement, statisticians and computer scientists note, there is increased risk of “false discoveries.” The trouble with seeking a meaningful needle in massive haystacks of data, says Trevor Hastie, a statistics professor at Stanford, is that “many bits of straw look like needles.”
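The "false discoveries" problem is easy to reproduce. Below is a minimal Python sketch under stated assumptions: every feature is pure random noise with no relationship to the outcome, yet a fixed correlation cutoff still flags some of them as "needles." The function names, the 100-sample size, and the 0.2 cutoff (roughly the p < 0.05 level for 100 samples) are hypothetical choices for illustration, not anything from the article.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def count_false_discoveries(n_features, n_samples=100, threshold=0.2, seed=0):
    """Count noise features whose |correlation| with a noise outcome exceeds threshold.

    Hypothetical illustration: nothing here is truly related to the outcome,
    so every "hit" is a false discovery (straw that looks like a needle).
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(feature, outcome)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for n in (20, 200, 2000):
        print(n, "features ->", count_false_discoveries(n), "false discoveries")
```

With 20 features you will likely see one hit or none; with 2,000 you should see dozens. The data never got more meaningful, only bigger, which is exactly Hastie's point about haystacks.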
Although "Big Data" is today's hot topic in the analytics world, its real value comes from its marriage with "Small Data." Big Data tells you what happened; Small Data tells you why it happened. Analyzing Big Data may let you identify the ripples in the pool and figure out if they came from a rock or a snake, but Small Data can tell you why the rock was thrown or how hungry the snake is.
Small Data (a.k.a. Qualitative Data) used to be looked down on by the Quant Jocks (no one ever talked about Qual Jocks). But Small Data can be more important than Big Data. Small Data is what you hear when you talk to real people and keep asking "why?": "Why is that better?" "What will that do for you?" "Why is that a problem?" These and similar questions add insight and meaning to Big Data. Small Data is what makes the insights of Big Data actionable.
Investing in Big Data without including the Small Data that explains what it really means results in dramatically reduced ROI and actionability.
Posted by Bob Klein on February 24, 2012 at 4:51 pm
This has been very nicely described for the clinical research domain as the “incidentalome”. See article here:
Posted by Jean Stanford on March 24, 2012 at 8:31 am