How Data Insight–Driven Enterprises Manage Data Onslaught

Organizations need to harness the right data to successfully move toward enhanced data-driven decisions

Program Director - API Economy, IoT and Connected Cloud solutions, IBM

Connecting everything to the Internet—the Internet of Everything—brings an interesting problem to the forefront: data onslaught. One example of data onslaught in today’s data economy is the 2.5 quintillion bytes of new data collected every single day.1 Another example is the 2.5 PB of data collected by a major retailer every hour.2 And by 2015, 1 trillion devices are expected to be generating data.3

A key point that almost every organization seems to miss in the data economy is that just because they are collecting so much data doesn’t mean they are collecting the right data, or even enough data. They may be either collecting very little of something very important or not collecting the right data at all. Even more appalling are situations in which organizations collect huge amounts of data and do absolutely nothing with it. People often make the mistake of connecting value with voluminous data.

A major reason for these situations is that the digital society has figured out how to collect massive amounts of data effectively (from the Internet of Things, devices, sensors, and so on) and how to process and store it efficiently as big data. However, many organizations are still trying to figure out how to get meaningful insights from that immense amount of data.

As organizations move on to build data-driven enterprises, they need insights into the data they collect. To effectively predict the future based on past events, they need to undertake deep learning of their data and events. If they don’t, they will end up in a garbage-in, garbage-out (GIGO) situation.

Scaling to the data onslaught

Essentially, avoiding this GIGO situation means organizations need to analyze the data both when it is collected (stream analysis) and at a later time (data mining) to identify patterns based on circumstances, that is, pattern recognition. They also need a solution that can scale to the data onslaught. This point is where many in-house business intelligence (BI) systems tend to fail. They are not only unable to scale to this volume of data, but they also cannot do dynamic analysis of stream data. If an organization waits 24 hours or a week to analyze its data, it will be too late.
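The difference between the two kinds of analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sensor readings, the `threshold` and `warmup` parameters, and the function names are hypothetical, and a running mean with a standard-deviation check stands in for whatever stream analysis a real platform would perform.

```python
import math
import statistics


class RunningStats:
    """Running mean and variance (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stdev(self):
        # Sample standard deviation; 0.0 until at least two readings exist.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def stream_alerts(readings, threshold=3.0, warmup=5):
    """Stream analysis: check each reading as it arrives, one at a time."""
    stats = RunningStats()
    for index, value in enumerate(readings):
        if stats.n >= warmup and stats.stdev > 0:
            if abs(value - stats.mean) > threshold * stats.stdev:
                yield index, value
        stats.update(value)


def batch_summary(readings):
    """Later analysis: summarize the full stored history in one pass."""
    return {
        "mean": statistics.fmean(readings),
        "stdev": statistics.stdev(readings),
    }


# Hypothetical temperature readings with one obvious spike.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.0, 20.1, 19.7]
alerts = list(stream_alerts(readings))
summary = batch_summary(readings)
print("flagged on arrival:", alerts)
print("batch summary:", summary)
```

The streaming check flags the spike the moment it arrives, while the batch summary is available only after the data has been stored and revisited. Both views are useful, which is the point of combining them.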

A related issue is that the data is often too disparate, and many BI solutions analyze it only within a narrow scope. These solutions cannot see the big picture, and they often cannot handle machine data that is too diversified, sequenced, or in multiple formats.

Cognitive analytics alternatives such as those provided by the IBM Watson™4 platform and/or machine-learning solutions such as BigML5 help organizations face these challenges. They are complementary solutions that can be deployed to help solve the big puzzle that data onslaught creates.

One problem with machine data or sensor data is that it is normally difficult to interpret, because it is highly cryptic and very voluminous; plus, analyzing machine data out of sequence can produce results that are out of whack. The lack of industry standards only exacerbates the situation, and this scenario makes it difficult for humans and for existing systems to make any sense of such data. However, when it comes to machine learning, if a machine can produce it, then a machine can learn from it, analyze it, and get insights out of it. Making sense of such data is where human data scientists tend to fail.

Matchmaking based on action and data insights

Consider an interesting use case from the recent Gigaom Structure 2014 conference. An online matchmaking company’s senior vice president of analytics spoke about how the organization used analytics and machine learning for matching people. The company has approximately 15 years of very sensitive and deep personal data, with millions of samples to choose from.

First, it tried to apply basic psychology for matches, such as the concept of human behavior presented in John Gray’s book, Men Are from Mars, Women Are from Venus (HarperCollins, 1992). Then it tried the concept of Pavlov’s classical conditioning. But the organization didn’t have much success. It then went back, re-analyzed successful relationships, and conducted a deep-learning exercise to find the formula behind successful outcomes. Specifically, the company figured out that building models based on what people say (their wants) is not enough. In other words, it determined that users’ actions and their actual needs are totally different from their wants. Ultimately, the company was able to predict user behavior based on users’ actions on its website with an enhanced success rate.

For example, people rated the income criterion as their number-one choice in a perfect match. However, when it came down to selecting a partner, many accepted someone who made far less money, essentially showing that they were not gold diggers but were there to find that perfect match. Conversely, even though they ranked nonsmoking as their last criterion, people drew a line in the sand for that parameter when it came to picking a partner. They rejected an otherwise perfect match every single time if that match was a smoker.

After the company adjusted its algorithms based on its customers’ actions instead of basing them on their profile listings for a perfect match, the predictiveness of its algorithms increased two to three times.6 This result is an incredibly powerful testament to the strength of machine learning. It shows that when it comes to matchmaking, historical data can be more successful in choosing a match than trained relationship counselors, PhDs, or even one’s own preferences.
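The gap between stated wants and observed actions can be made concrete with a small Python sketch. Everything here is hypothetical: the attribute names, the stated rankings, and the handful of interactions are made up for illustration, and the simple difference in acceptance rates is a stand-in, not the company's actual algorithm.

```python
from collections import defaultdict


def revealed_importance(interactions, attributes):
    """For each attribute, return the acceptance rate when the criterion is
    met minus the acceptance rate when it is not met."""
    # counts[attr][met] = [accepted, shown]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for candidate, accepted in interactions:
        for attr in attributes:
            met = bool(candidate[attr])
            counts[attr][met][1] += 1
            counts[attr][met][0] += int(accepted)

    scores = {}
    for attr in attributes:
        rates = {}
        for met in (True, False):
            accepted, shown = counts[attr][met]
            rates[met] = accepted / shown if shown else 0.0
        scores[attr] = rates[True] - rates[False]
    return scores


# Hypothetical stated preferences: 1 = most important on the profile.
stated_rank = {"high_income": 1, "nonsmoker": 2}

# Hypothetical observed actions: (candidate shown, was the candidate accepted?)
interactions = [
    ({"high_income": True, "nonsmoker": True}, True),
    ({"high_income": False, "nonsmoker": True}, True),
    ({"high_income": True, "nonsmoker": False}, False),
    ({"high_income": False, "nonsmoker": False}, False),
    ({"high_income": False, "nonsmoker": True}, True),
    ({"high_income": True, "nonsmoker": True}, True),
]

scores = revealed_importance(interactions, list(stated_rank))
stated_order = sorted(stated_rank, key=stated_rank.get)
revealed_order = sorted(scores, key=scores.get, reverse=True)
print("stated order:  ", stated_order)
print("revealed order:", revealed_order)
```

In this toy data, income tops the stated list but makes no difference to who gets accepted, while the last-ranked nonsmoker criterion decides every outcome. A model trained on the actions rather than the profile would pick up exactly that reversal.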

Now consider bringing this premise to the business world. If an organization applies machine learning to what its customers say alone, the organization may not get the perfect results it is going for. However, organizations that apply machine learning to customers’ actions—rather than just their spoken words—can successfully predict their business future and customer needs.

Visualizing successful outcomes

Thomas J. Watson Sr. of IBM once said, “Analyze the past, consider the present, and visualize the future.” Approximately one century later, the world is now in a position to do exactly that. Organizations can mine big data, dynamically update predictive models with the stream of current and live data, and foresee the future. These times are certainly good times.
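As a rough illustration of what dynamically updating a predictive model can look like, the following Python sketch first trains a small logistic model on historical events and then keeps adjusting it one live event at a time. The feature meanings, the data, and the learning rate are assumptions chosen for the example, and plain stochastic gradient descent is used here only because it is easy to show.

```python
import math


class OnlineLogisticModel:
    """Logistic regression updated one event at a time with gradient steps."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label):
        # Gradient of the log loss for a single (features, label) event.
        error = self.predict_proba(features) - label
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


# Hypothetical features: [viewed_pricing_page, opened_support_ticket]
# Hypothetical label: 1 if the customer went on to purchase, else 0.
historical_events = [
    ([1, 0], 1),
    ([0, 1], 0),
    ([1, 1], 1),
    ([0, 0], 0),
]

model = OnlineLogisticModel(n_features=2)

# "Analyze the past": several passes over stored history.
for _ in range(50):
    for features, label in historical_events:
        model.update(features, label)

# "Consider the present": fold in each live event as it arrives.
live_event = ([0, 1], 1)
before = model.predict_proba(live_event[0])
model.update(*live_event)
after = model.predict_proba(live_event[0])

# "Visualize the future": score a customer who has not acted yet.
print("purchase probability, pricing page only:", model.predict_proba([1, 0]))
print("support-ticket estimate before and after live event:", before, after)
```

The model never has to be rebuilt from scratch. Each live event nudges the estimate, so predictions reflect both the mined history and the most recent behavior.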


1 “What Is Big Data?” Big Data at the Speed of Business website.
2 “Big Data and Walmart,” D.A.T.A. blog, August 2014.
3 “Analytics for the Modern Digital Enterprise,” Enterprise 2014 presentation, Doug Balog and Nils Brauckmann, October 2014.
4 IBM Watson page on the IBM Smarter Planet website.
5 BigML machine learning and predictive analytics applications website.
6 “Machine Learning’s Impact on Business Models and Industry Structures,” Gigaom conference panel discussion, including Amarnath Thombre, senior vice president of analytics, YouTube video, September 2013.