Rules of thumb for identifying and prioritizing big data applications

Big Data Evangelist, IBM

It would be nice to have a crisp algorithm to pick and choose among potential big-data applications. But there isn't one.

What enterprises have instead are plenty of handy heuristics ("rules of thumb") to help prioritize among potential applications, based on factors such as business impact, feasibility, cost and the like.
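To make the idea concrete, here is one way such a rule of thumb might be written down. This is only a sketch under stated assumptions, not a recommended method: the 1-to-5 rating scale, the weights, the `Candidate` fields and the sample applications are all hypothetical, chosen purely for illustration. It ranks candidates by a weighted sum of business impact, feasibility and (inverted) cost.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A potential big data application, rated by hand on a 1-5 scale (assumed scale)."""
    name: str
    business_impact: int  # 1 = low impact, 5 = high impact
    feasibility: int      # 1 = hard to deliver, 5 = easy to deliver
    cost: int             # 1 = cheap, 5 = expensive


# Hypothetical weights; each organization would pick its own.
WEIGHTS = {"business_impact": 0.5, "feasibility": 0.3, "cost": 0.2}


def score(c: Candidate) -> float:
    """Weighted sum of the ratings; cost is inverted so cheaper scores higher."""
    return (
        WEIGHTS["business_impact"] * c.business_impact
        + WEIGHTS["feasibility"] * c.feasibility
        + WEIGHTS["cost"] * (6 - c.cost)
    )


def prioritize(candidates: list[Candidate]) -> list[Candidate]:
    """Return the candidates ordered from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)


# Illustrative, made-up candidates and ratings.
candidates = [
    Candidate("sensor log analysis", business_impact=4, feasibility=2, cost=5),
    Candidate("customer micro-segmentation", business_impact=5, feasibility=4, cost=2),
    Candidate("clickstream reporting", business_impact=3, feasibility=5, cost=1),
]

for c in prioritize(candidates):
    print(f"{c.name}: {score(c):.1f}")
```

The ratings themselves remain judgment calls, so the output is only as good as the discussion that produced them. A score like this is a way to structure that discussion, not a substitute for it.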

One heuristic is simply to produce a brain dump of all the core big data applications. One year ago, I blogged my thoughts on hardcore big data use cases. However, at that time, I didn't propose any specific procedure for brainstorming some exhaustive master list of use cases, nor was there any decision tree implicit in my discussion. It was simply an encapsulation of the core big data use cases I saw in the world around us at that time.

Another heuristic is to start your analysis from the 4 Vs (volume, velocity, variety and veracity) point of view on big data. A few months after I wrote my blog on big data use cases, I blogged my thinking on how to measure the bottom-line contribution of big data. That wasn't an algorithm or decision tree either, but it laid out criteria for assessing the contributions of each of big data's primary dimensions to business value.

Yet another heuristic is to work from a business-value definition of big data that distinguishes it from other data-centric applications. The definition I've been using for a while is that big data is all about deriving differentiated value from advanced analytics on trustworthy data at any scale. What's useful about this definition is that it distinguishes big data from traditional business intelligence, performance management and transactional computing, while alluding to a broad spectrum of applications that includes them all. It is also useful because you can parse each component of the definition to help frame the value discussion for big data use cases. It is most useful to do this in reverse order:

  • Any scale. Big data is all about keeping the entire population of relevant information at your fingertips, rather than just convenience samples and subsets. It's also about unifying all decision-support time-horizons (past, present and future) through statistically derived insights into deep data sets in all those dimensions.
  • Trustworthy data. Big data is all about deriving valid insights either from a single-version-of-truth consolidation and cleansing of deep data, or from statistical models that sift haystacks of "dirty" data to find the needles of valid insight.
  • Advanced analytics. Big data is all about speeding insights through any or all of the following: detailed, interactive, multidimensional statistical analysis; aggregation, correlation and analysis of historical and current data; modeling and simulation; what-if analysis and forecasting of alternative future states; natural language processing; and interactive exploration of unstructured data, streaming information and multimedia.
  • Differentiated value. Big data is all about deriving fresh business insights from data patterns (such as long tail analyses, micro-segmentations and unsupervised machine learning) that are not feasible if you're constrained to smaller volumes, slower velocities, narrower varieties and cloudier veracities.

Then there's the approach that Vincent Granville took in his recent blog, "17 areas to benefit from big data analytics in next 10 years." He doesn't make any pretense about being exhaustive. Rather, he simply provides a list of value scenarios with no common thread other than reliance on big data and analytics.

Pulling use case ideas out of thin air is also a valid heuristic. Identifying valuable applications of big data is often more of an inductive, exploratory process than a deductive exercise.