Hardcore Big-Data Use Cases: Better Results at Extreme Scale

Big Data Evangelist, IBM

Big data is not a religion. Rather, it’s an analytics paradigm that enables business outcomes beyond what can normally be achieved at lower volumes, velocities and/or varieties of data.

Scale is everything. Big data means many things to many people, but at heart most of us agree that the “3 Vs” of extreme scale—petabyte volumes, real-time velocities and multistructured varieties—describe the paradigm’s core focus. To justify your investment in big-data technologies, you must have a clear sense of which analytical use cases can best achieve their objectives at greater scale.

So what are the hardcore scale-intensive use cases for big data? I’d like to propose several categories of use cases that are in big data’s sweet spot, and for which “small data” approaches, such as traditional business intelligence, are not well-suited. At a high level, I will characterize each of these in a way that does not necessarily constrain them to a specific data domain (e.g., customers, finances, products) or specific industry (e.g., finance, telco, energy), though I offer a bit more detail on applications in customer analytics than in other domains. You may notice some practical overlap in the analytics applications that each of these use cases supports, owing to the fact that many big-data initiatives require multiple approaches. For the sake of this discussion, I’ll gloss over the question of which big-data approaches—MPP EDW, Hadoop, NoSQL, stream computing, in-memory, etc.—are best for each use case.

Without further ado, the hardcore big-data use cases are:

  • Whole-population analytics: This refers to any application that requires interactive access to the entire population of analytical data, rather than just to convenience samples, subsets or slices. Until big data came our way, few data scientists had the luxury of amassing petabytes of data on every relevant variable of every entity in the population under study. As the prices of storage, processing and bandwidth continue their inexorable decline, computational analysts will be able to keep the entire population of all relevant data under their algorithmic microscopes. Over time, as the world evolves toward massively parallel approaches such as Hadoop, we will be able to do true 360-degree whole-population analysis. For example, as more of the world’s population takes to social networking and conducts more of its lives in public online forums, we will all have comprehensive, current and detailed market intelligence on every demographic available as if it were a public resource.
  • Microsegmentation analytics: This refers to any application requiring fine-grained segmentation of entities described in the underlying data sets. For example, when you have whole-population customer data sets, you can do fine-grained micro-niche segmentation. Being able to drill into the entire aggregated population of, say, customer data, including rich real-time behavioral data, enables more fine-grained target marketing, nuanced customer experience optimization and context-sensitive next-best-action decisioning. Storing petabytes of data and having it accessible in real time means you can gain an “X-ray view” of what’s going on inside customers’ heads, thereby supporting segmentation by sentiments, propensities and experiences. Also, if you have ample detail on all the inventory you carry and everything that customers have requested, no matter how seemingly unpopular, you can do powerful long-tail analysis on overlooked product niches of keen interest to specific customer segments.
  • Behavioral analytics: This refers to any application requiring deep data on the behavior of entities (e.g., humans, groups, system components) and the relationships among them. Social graph analysis is the best-known example of behavioral analytics. In the enterprise, social graph analysis powers anti-fraud, influence analysis, sentiment monitoring, market segmentation, engagement optimization, experience optimization and other applications where complex behavioral patterns must be rapidly identified. Graph models are powerful enablers for fine-grained predictive modeling of human behaviors because they help identify the likely behaviors of individuals in the fuller context of their groups, relationships and influence. These models offer microscopically detailed views of the customer experience by focusing on human actions and interactions.
  • Unstructured analytics: This refers to any application that analyzes a deep store of data sourced from enterprise content management systems, social media, text, blogs, log data, sensor data, event data, RFID data, imaging, video, speech, geospatial and more. The sheer volume of unstructured data, compared with structured relational data, makes managing it a core big-data use case from the word “go.” To the extent that data is unstructured, data scientists must rely on some combination of manual tagging, natural language processing, text mining, machine learning and other approaches to extract the semantics of the content.
  • Multistructured analytics: This refers to any application that requires unified discovery, acquisition, storage, management and analysis of all data types, ranging from structured to unstructured. For example, customer influence analysis often needs to mine unstructured social media alongside semi-structured call-center logs, structured transaction data and various geospatial coordinates. These and other data sources can help you build a more powerful relationship graph model for behavioral segmentation. They can also help you gain a deeper appreciation for customer awareness, sentiments and propensities.
  • Temporal analytics: This refers to any application that requires a converged view across one or more time horizons: historical, current and predictive. Queryable archives and longitudinal, time-series analysis fall under this heading, as do complex event processing and predictive analysis. These applications require a big-data platform with the storage and horsepower to process compute- and data-intensive workloads. A common big-data platform can provide a correlated rollup of past, present and future for powerful decision support and automation scenarios. For example, businesses require a 360-degree view of the world through the customer’s eyes that is updated moment to moment. Ideally, you’ll roll up a unified view that combines everything you already know about the customer with everything new that you can glean from their real-time online behavior, plus everything that you can predict about their likely behavior under various future scenarios. Also, multichannel customer experience optimization applications require decision automation infrastructure that leverages historical transactions, real-time portal clickstreams and predictive behavioral models to support continuous tuning of customer interfaces and interactions.
  • Multivariate analytics: This refers to any application requiring detailed, interactive, multidimensional statistical analysis and correlation, executed on a big-data platform in a massively parallel manner. Regression analysis, market basket analysis and other mainstays of advanced analytics all fall into this category, as does any business problem requiring aggregation, correlation and analysis of historical and current data; modeling and simulation, what-if analysis and forecasting of alternative future states; or semantic exploration of unstructured data, streaming information and multimedia.
  • Multi-scenario analytics: This refers to any application requiring you to model and simulate alternate scenarios, engage in free-form what-if analysis, and forecast alternative future states. These require a big-data platform that supports fluid exploration without needing to define data models up front. Some of the more sophisticated data-science initiatives involve building complex models of multiple, linked business scenarios across different business, process and subject-area domains, using such key features as strategy maps, ensemble modeling and champion-challenger modeling. You may need to develop models against multiple information types, including unstructured content and real-time event streams, while leveraging state-of-the-art algorithms in sentiment analysis and social network analysis. You may also build models for multiple scenarios in a real-world experimentation program that requires continuous A/B testing.
  • Sensor analytics: This refers to any application that requires automated sensors to take measurements and feed them back to centralized points at which the data are aggregated, correlated and analyzed. This use case often goes by the name “Internet of Things,” referring to the embedding of continuous Internet connectivity and addressability into a growing range of human artifacts, into the natural world, and even into our smartphones, appliances and physical persons. Some call it the “RFID Internet,” referring to the increasingly prevalent embedding of wirelessly accessible digital identities into every component, subassembly and product within online supply chains. Still others call it “machine-to-machine (M2M) Internet,” referring to the new world of never-ending process optimization that leverages real-time sensor grids, telemetry, automated feedback remediation loops, embedded rules engines, self-healing network-computing platforms, and other analytic-driven systems. We find sensor analytics in medical monitoring, traffic management, hazard protection, emergency response, security incident and event monitoring, and many other critical real-world applications.
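To make the microsegmentation idea above concrete, here is a minimal sketch of recency/frequency/monetary (RFM) scoring, one common way to carve a customer population into fine-grained segments. The customer records, thresholds and segment names are all hypothetical, chosen purely for illustration; a production system would derive them from the whole-population data itself.

```python
from dataclasses import dataclass

# Hypothetical customer records; fields and thresholds are illustrative only.
@dataclass
class Customer:
    cid: str
    days_since_last_purchase: int
    purchases_per_year: int
    annual_spend: float

def rfm_segment(c: Customer) -> str:
    """Assign a coarse micro-segment from recency/frequency/monetary flags."""
    r = c.days_since_last_purchase <= 30   # recently active?
    f = c.purchases_per_year >= 12         # buys at least monthly?
    m = c.annual_spend >= 1000.0           # high spender?
    if r and f and m:
        return "champion"
    if not r and (f or m):
        return "at-risk high value"        # valuable but going quiet
    if r and not f and not m:
        return "new or casual"
    return "general"

customers = [
    Customer("c1", 10, 24, 5000.0),
    Customer("c2", 200, 15, 2000.0),
    Customer("c3", 5, 2, 100.0),
]
segments = {c.cid: rfm_segment(c) for c in customers}
# → {'c1': 'champion', 'c2': 'at-risk high value', 'c3': 'new or casual'}
```

At real scale the same scoring logic would be pushed down into a massively parallel platform rather than run in a single-process loop.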
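The behavioral-analytics bullet above centers on social graph analysis. One of the simplest influence measures is in-degree: how many other people point at (mention, follow, cite) a given person. The toy edge list below is hypothetical; real graph analytics would run over billions of edges on a parallel platform.

```python
from collections import defaultdict

# Toy social graph; each edge is a hypothetical "mention" from one user to another.
edges = [("ann", "bob"), ("ann", "carol"), ("bob", "carol"),
         ("dave", "carol"), ("eve", "carol")]

def influence_by_in_degree(edge_list):
    """Rank users by in-degree: the count of edges pointing at them."""
    indeg = defaultdict(int)
    for src, dst in edge_list:
        indeg[src] += 0   # ensure every node appears, even with no inbound edges
        indeg[dst] += 1
    return sorted(indeg.items(), key=lambda kv: kv[1], reverse=True)

ranking = influence_by_in_degree(edges)
# carol receives four mentions, so she ranks first as the likely influencer
```

In-degree is only a starting point; richer influence measures (e.g., PageRank-style propagation) follow the same pattern of aggregating over the relationship graph.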
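The temporal-analytics bullet describes correlating the recent past with the present. A minimal sketch of that idea is a sliding-window rollup over an event stream, where each new reading is judged against the window average; the readings and window size here are invented for illustration.

```python
from collections import deque

def rolling_mean(stream, window):
    """Emit the mean of the last `window` readings as each new one arrives."""
    buf = deque(maxlen=window)   # automatically drops the oldest reading
    out = []
    for x in stream:
        buf.append(x)
        out.append(sum(buf) / len(buf))
    return out

# Hypothetical sensor readings; the spike at 50 lifts the rolling mean,
# flagging a candidate anomaly for downstream correlation and response.
readings = [10, 12, 11, 50, 13]
means = rolling_mean(readings, window=3)
```

Complex event processing engines generalize this single-metric window into correlated windows across many streams at once.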

What have I left out? I’d love to get your feedback so we can build a more exhaustive taxonomy of big-data use cases.

To find out more about managing big data, join IBM for a free Big Data Event.

To learn about high-level business use cases for big data, listen to this podcast, "Top 5 Big Data Use Cases." Eric Sall, vice president of product marketing at IBM, describes the key use cases that hold high potential value for many organizations.