Trusting Big Data

Make business-critical decisions confidently without delay

Paula Wiles Sigmon
Program Director, Analytics Platform Marketing, IBM

Richard Hackathorn, industry analyst for business intelligence (BI) and data warehousing, first wrote about the concept of decision latency. He defined it as the time that elapses after a business event has occurred, and after data about the event has been stored and analyzed, but before any action is taken.1 A lot has happened in the world of information management since Hackathorn introduced this concept. There have been major advances in technology for data warehousing and analysis, and the era of big data has emerged. Now that there is orders of magnitude more data to consider, arriving with greater speed and complexity than ever before, the potential impact of decision latency is bigger than ever.

Latency cost in paradise

Decision latency wouldn’t be a particularly interesting topic outside academic circles if it didn’t have a cost. But imagine a relaxed world in which leisurely consideration of all options and slow decisions were the order of the day. For example, vacationers dining in an island paradise restaurant may take time to consider all the options on the menu, perhaps asking a few questions about ingredients and preparation methods for some of the dishes. The result may be perfectly satisfactory, no matter how long the decision making takes.

However, today’s organizations do not typically follow the patterns of this island paradise approach. Even in an idyllic setting, a delay in placing the order carries risk: the restaurant may run out of the key ingredient in the selected dish, or the kitchen might close before the order is taken. And in business, the gap between when information and analysis become available and when decisive action is taken almost always has a cost, such as the loss of an opportunity to a competitor or an increase in the cost of materials for a project.

In the world of electronic trading, latency is measured in milliseconds,2 and the cost of latency can escalate dramatically in an instant. In other types of endeavors, latency might be measured in seconds, minutes, hours, or even days. But whatever the timing, delay has its costs. So why delay action?

A common cause of delay is lack of confidence in the available information. Despite major advances in the ability to understand and consume big data, the challenge of confidence remains. In so many cases, decision makers simply do not have confidence in the information required to take action.

Important innovations to the IBM® InfoSphere® Information Integration and Governance (IIG) portfolio, announced in September 2013, are poised to help organizations build confidence in their data and reduce decision latency. The innovations fall into three categories: increased automation of data integration, visual context to assist those responsible for governing data, and increased agility in governance. Look for other IBM Data magazine articles and columns that take deeper dives into some of these advances; for now, an overview of some key capabilities is in order.

Self-service as an accelerator

The time required to define and access the right data for any project is often a cause of project delay, ultimately affecting decision latency. In 2012, IBM introduced IBM InfoSphere Data Click, a self-service data provisioning capability included in the IIG portfolio. That capability has since expanded, making it easy for business users to get data from more big data sources than they could previously, including JavaScript Object Notation (JSON), NoSQL, Apache Hadoop, and Java Database Connectivity (JDBC) sources. This expansion helps accelerate both insight and action.
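To put self-service provisioning in perspective, the sketch below shows what hand-coded access to a single JDBC source typically involves, using only the standard java.sql API. The connection URL, credentials, and table are hypothetical placeholders, and the code is meant only to illustrate the per-source plumbing that self-service tooling aims to spare business users; it is not an example of the InfoSphere Data Click interface.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hand-coded extraction from one JDBC source (hypothetical URL, credentials,
// and table). Every additional source means more of this kind of plumbing,
// which is what self-service provisioning is intended to eliminate.
public class JdbcExtract {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:db2://dbhost:50000/SALESDB"; // hypothetical source
        try (Connection conn = DriverManager.getConnection(url, "appuser", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT order_id, amount FROM orders")) {
            while (rs.next()) {
                System.out.println(rs.getLong("order_id") + "," + rs.getBigDecimal("amount"));
            }
        }
    }
}
```

Multiply this by every relational, NoSQL, and Hadoop source a project touches, and the appeal of provisioning data through a few clicks rather than custom code becomes clear.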

Seeing is believing

Could a pilot fly an airplane without an instrument panel? Maybe, but it wouldn’t be easy. Similarly, a lack of visual context for what’s happening with data—in areas such as quality, security, and privacy—can decrease confidence in data and increase decision latency. Visual context can make a big difference in giving a chief data officer or a data steward the confidence required to make decisions or recommendations to the business. New IBM tools make it easy to create a custom dashboard that reflects the organization’s own policies and key performance indicators (KPIs).

One size for all?

A fundamental assumption of big data is that the data should be shared so the organization can gain new insights. Yet not all data should be shared, and not all data requires the same level of governance. Being able to recognize confidential information and to provide levels of protection appropriate to the data and its intended use is important. Applying appropriate governance to different types of data for different purposes requires an advanced level of agility in information management.

IBM’s recently announced capabilities help increase agility by extending data activity monitoring to additional Hadoop, relational, and NoSQL systems, including the Apache Cassandra distributed database management system, the Greenplum big data analytics platform, the Hortonworks Hadoop distribution, and the MongoDB document database. InfoSphere IIG now also allows masking of sensitive, personally identifiable information (PII) in multiple data stores that feed data into Hadoop projects. The data remains fully usable for purposes such as analytics and testing, but the sensitive information is protected.
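To make the masking idea concrete, here is a minimal, generic sketch of two common techniques, consistent tokenization and partial redaction, written against the standard Java library. It is an illustration of the concept only, not the InfoSphere masking engine, and the sample values are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Minimal sketch of PII masking: replace sensitive values with substitutes
// that keep records usable for analytics and testing. Illustration only.
public class PiiMaskingSketch {

    // Replace an identifier with a stable, irreversible token (the same input
    // always yields the same token), so joins and distinct counts still work
    // on the masked data.
    static String tokenize(String value) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(value.getBytes(StandardCharsets.UTF_8));
        StringBuilder token = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            token.append(String.format("%02x", digest[i] & 0xff));
        }
        return token.toString();
    }

    // Redact all but the last four digits of an account or card number.
    static String maskNumber(String number) {
        String digits = number.replaceAll("\\D", "");
        return "****" + digits.substring(Math.max(0, digits.length() - 4));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tokenize("jane.doe@example.com"));  // stable 16-character token
        System.out.println(maskNumber("4111-1111-1111-1234")); // ****1234
    }
}
```

Tokenization preserves referential consistency across data stores, while redaction preserves format; a production masking tool adds policy-driven rules for which fields receive which treatment.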

Bold business decisions

Building confidence in big data can go a long way toward reducing decision latency and accelerating the business benefits of good decision making. Recent IBM innovations help organizations increase confidence in big data in several key areas:

  • Streamlining access to critical information for analysis and action
  • Clarifying the source and history of the data
  • Providing a clear picture of the quality of data
  • Making sure private information is secured and protected

To learn more about these IBM innovations for building confidence in big data, take a look at other September 2013 articles in IBM Data magazine. And if you have a question or thought to share, please post it in the comments.

1 “The BI Watch: Real-Time to Real-Value,” by Richard Hackathorn, published in DM Review, January 2004.
2 “The Cost of Latency in High-Frequency Trading,” by Ciamac C. Moallemi and Mehmet Sağlam, February 2013.
