Defending Big Data Insight

Use integration and governance to establish trust

Product Marketing, Information Integration and Governance, IBM

It’s Monday morning. The executive team is sitting around the conference room table, and you are presenting the groundbreaking findings of a customer sentiment analysis using Twitter, Facebook, and a host of other new big data sources. Suddenly, the vice president at the end of the table launches into an interrogation, and the questions come fast and furious: “Where did we get this data? How can we be sure it’s accurate? We didn’t use any sensitive personal information here, did we?” And then those questions are followed by the inevitable, “I don’t trust your conclusions.”

If you’ve ever been in this position without the right answers, you’re not alone. The following account of an IBM executive's experience captures the challenge of defending an analysis:

“Prior to my role at IBM, I worked for a market research company. Our organization and comparable companies, like ACNielsen, purchased data from supermarkets, pharmacies, convenience stores, and other sources; aggregated and analyzed big data; and sold it to consumer packaged goods companies like Anheuser-Busch, PepsiCo, and Procter & Gamble. I had the good fortune of leading the company’s next-generation business intelligence (BI) software.

“One day, I did analysis for a major alcoholic beverage company and was ready to fly down to corporate and present my findings. In summary, I intended to recommend that they build, launch, and market a new beer category called Beer Brand Lemon. After setting up the meeting, I met with the internal team that had worked on the account for years and told them my exciting news. The lead on the account said to me, ‘Don’t do it. I know them, and this recommendation is completely wrong. You’ll be thrown out.’

“I froze. I thought about what the account lead had said. Up until that point I assumed that the cube of data I analyzed was good, but I did not have a lot of knowledge of how it was manufactured—that’s the term they used. I took a much more active role, and the more I uncovered, the more I was concerned about my overall analyses. Did I want to bet my career on this analysis? Ultimately, I did not present Beer Brand Lemon, but given the current prevalence of citrus-marketed beers, I have to come to my own conclusions for the confidence I had in the underlying data.”

Big data does not necessarily mean good data

While that situation isn’t wholly unexpected, many who are charged with generating big data insights are not prepared with answers. Instead of using objective data to deliver radical insights destined to alter the trajectory of corporate strategy, many executives make critical decisions based on pure intuition. In the meantime, big data teams are sent back to the drawing board—heads spinning—to determine how to restore the trust of their leadership in themselves and their analyses.

But why such caution, when big data promises so much untapped insight? There have been several very hard, and very public, lessons learned in the recent past, such as the following:

  • An organization tracking influenza trends combined algorithms with massive amounts of influenza-related search data to predict peak flu levels in 2013. Its estimate of US population occurrences was nearly twice that of the Centers for Disease Control and Prevention, and is now widely accepted as a gross overestimate of the actual occurrence rates. Although the data was timely, it was far from accurate. Decisions based on such a narrow set of parameters, even within big data, can have disastrous results.
  • In April 2013, shortly after a news organization’s Twitter account was hacked, a tweet was sent claiming an attack on the White House. This event triggered a sell-off of stocks on Wall Street. In mere moments, almost USD200 billion in market value was erased from the books. In this example, what was once a good source, the organization’s Twitter feed, quickly became the trigger for a huge financial loss for many investment firms.

As clearly demonstrated in both of these scenarios, big data is not synonymous with good data. As organizations begin to rely on external big data, special care must be taken to validate its accuracy. These examples provide stark evidence that blind reliance on big data and the resulting analysis can have a devastating effect on the business—and even on health.

Given the potential downside of poor analysis and the proliferation of both good and bad data, it should come as no surprise that business leaders challenge big data insight. Yet the potential of big data cannot be ignored. So how do you balance the use of big data against its risks, including incomplete, inaccurate, or even misleading information?

Veracity as the key to success in big data analytics

Protecting ourselves from embarrassment when our analysis is called into question requires trust in both the underlying data and the methodologies used to understand and manage it. A systematic approach to information governance is the single most comprehensive framework for firmly securing trust in analytics and data. Most importantly, it gives business leaders the confidence to act on the analysis.

As defined by IBM Information Integration and Governance, “information governance is a holistic approach to managing, improving, and leveraging information to increase an organization’s confidence in decisions made—within big data and analytics, and within operational business processes.”

Information governance is critical to successful big data projects. Given the tremendous volume, variety, and velocity of data now available to organizations, determining the veracity of that information creates data requirements that only governance can meet.

What does success look like when governance and big data combine? Consider the following scenarios in which big data analytics teams had all the right answers:

  • A chain of grocery stores in Latin America leverages IBM® Information Integration and Governance technology to integrate and standardize more than 6 TB of product and customer data. This integration enables corporate personnel to share trusted information and gain enhanced insight into operations, ultimately increasing annual revenues by 30 percent and profits by USD7 million.
  • A diamond jeweler, unable to process vast amounts of data during business spikes, deployed IBM Information Integration and Governance technologies in combination with a new data warehouse. The company now provides operational BI applications, including real-time, point-of-sale (POS) reporting, that aid customer relationship management (CRM) applications. This insight has allowed it to deliver consistent customer service that has so far resulted in a 7 to 12 percent revenue increase.

Employing governance at the start of any big data initiative enables analytic teams to avoid the pitfalls of the Monday morning executive meeting, when analysis is inevitably called into question. But it’s also much more than that. A recent study by the Aberdeen Group* reveals the following insights about organizations using governance for trusted information:

  • They are improving data accuracy at three times the rate of their competition.
  • Nearly four out of five companies with high-quality data rated their decision making as 7 or higher on a scale of 1 to 10.
  • Seventy-seven percent of companies showing high or very high levels of trust in their information had implemented information governance tools.

The IBM InfoSphere® platform is well positioned to help organizations derive business value from big data. With leading-edge technologies in both big data and governance, supported by experienced resources and best practices, IBM helps ensure that organizations are successful in their big data journey.

How much confidence do you have in your big data analytics? Share your thoughts in the comments.

*“The Big Data Imperative: Why Information Governance Must Be Addressed Now,” Aberdeen Group, Inc., December 2012.
