What if every piece of data had a confidence score?

Director, Product Marketing, InfoSphere, IBM

Every time you apply for credit, the decision is made quickly, easily and consistently based on a key piece of data: a credit score. Have you ever thought about credit scores, and what happened before they existed? Granting credit was a time-intensive process plagued by bias, data collection errors and inconsistent decisions. Does that sound like your big data and analytics programs today? According to research, up to 80 percent of time is spent finding, fixing and integrating data. What’s more, 12 percent of the time is spent defending data and re-validating it. That doesn’t leave much time for analyzing and using data to make better decisions. Is there a way this process could be expedited?

The need for confidence has never been greater. With the explosion of big data, organizations are exponentially increasing the confidence problem. (More data) x (uncertainty) = a greater level of uncertainty. By 2015, 80 percent of all data will be uncertain. You can already see the cracks in the foundation today. One in three business leaders don’t trust the information they use to make important decisions.

What if every piece of data had a confidence score? Some progressive organizations have attempted to address data uncertainty already. But even the most advanced organizations tend to address only one aspect of confidence. Most commonly, organizations may understand data lineage and use that to approximate confidence. “Where did you get that data?” is still the most common way of saying “Should I have any confidence in that data?” But there are many more factors in determining confidence.

  • System integrity. How many systems had the same data value versus being in conflict?
  • Governance. Were policies followed?
  • Correctness. Is the data validated, verified and standardized?
  • Completeness. Are records complete, and do we have a common view of master data records?
  • Secure and protected. Is the data safe from breach and data loss?
  • Currency. Is the data up to date?

In fact, lineage and the six factors above, seven in all, play a role in data confidence. And research has revealed that they aren’t just qualitative: you can put a number to each of them and to how they affect each other. In other words, you can calculate a data confidence score.

Just like a credit score, a data confidence score is the first step toward developing clear confidence levels for different usages of data, such as running social media sentiment analysis (no problem: your confidence needs to be at 590) or making decisions for long-term company strategy (for this usage, your confidence level needs to be 670). For any decision you need to make, you can determine how confident you need to be, and how confident you are. This is a significant breakthrough for organizations.
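To make the idea concrete, here is a minimal sketch of how such a score might be computed and compared with what a given usage requires. Only the factor names and the two thresholds (590 and 670) come from this article. The weights, the 300 to 850 range borrowed from credit scores, and the names `WEIGHTS`, `REQUIRED`, `confidence_score` and `fit_for` are illustrative assumptions, not the method behind any actual calculator.

```python
# Hypothetical weights in percent. The article names the seven factors
# but does not say how they combine, so these numbers are made up.
WEIGHTS = {
    "lineage": 20,
    "system_integrity": 15,
    "governance": 15,
    "correctness": 20,
    "completeness": 10,
    "security": 10,
    "currency": 10,
}

# Assumed score range, borrowed from the credit score analogy.
SCORE_MIN, SCORE_MAX = 300, 850

# Required confidence per usage, taken from the examples in the article.
REQUIRED = {
    "social_media_sentiment": 590,
    "long_term_strategy": 670,
}


def confidence_score(ratings):
    """Combine per-factor ratings (each 0.0 to 1.0) into a single score."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name} must be between 0 and 1")
    # Weighted average of the ratings, still on a 0.0 to 1.0 scale.
    weighted = sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS) / 100
    # Stretch the average onto the assumed score range.
    return round(SCORE_MIN + weighted * (SCORE_MAX - SCORE_MIN))


def fit_for(usage, score):
    """Return True if the score meets the confidence the usage requires."""
    return score >= REQUIRED[usage]
```

With these assumed weights, data rated 0.6 on every factor scores 630: enough for social media sentiment analysis, but short of the 670 needed for long-term strategy decisions.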

The black magic and art of determining confidence are gone. It’s a science. And the days of saying “We don’t even know how to begin determining data confidence, so we won’t even bother” are over. A new era of data transparency and confidence has begun.

Calculating scores is one thing, but making them part of your day-to-day business is quite another. So how do you put this into effect? Data confidence should be calculated when you first encounter data, whether that is via integration or through placement in a landing zone to assess its value. The score should then be stored with the data itself; in other words, it is metadata. When this is done as part of an information integration and governance technology, it happens seamlessly as data is integrated and governed. Once data confidence is integrated into your overall data fabric, it becomes part of the applications and processes that already use the data.
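As a rough illustration of storing confidence as metadata, the sketch below scores each record at the moment it is ingested and keeps the score alongside the payload, so that downstream processes can filter on it. Everything here is hypothetical: `ScoredRecord`, `ingest`, `usable` and the toy `completeness_scorer` (which looks only at how many fields are filled in, on an assumed 300 to 850 scale) stand in for whatever an integration and governance platform would actually provide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class ScoredRecord:
    """A record together with its confidence metadata."""
    payload: Mapping[str, Any]
    source: str
    confidence: int
    scored_at: datetime


def completeness_scorer(payload):
    """Toy scorer (an assumption): share of non-empty fields, scaled 300-850."""
    if not payload:
        return 300
    filled = sum(1 for value in payload.values() if value not in (None, ""))
    return round(300 + 550 * filled / len(payload))


def ingest(payload, source, scorer: Callable[[Mapping[str, Any]], int]):
    """Score a record as it arrives and store the score with the data."""
    return ScoredRecord(
        payload=dict(payload),
        source=source,
        confidence=scorer(payload),
        scored_at=datetime.now(timezone.utc),
    )


def usable(records, required):
    """Keep only the records whose stored confidence meets the requirement."""
    return [record for record in records if record.confidence >= required]


if __name__ == "__main__":
    landing_zone = [
        ingest({"name": "Ada", "email": ""}, "crm", completeness_scorer),
        ingest({"name": "Ada", "email": "ada@example.com"}, "billing", completeness_scorer),
    ]
    for record in usable(landing_zone, required=670):
        print(record.source, record.confidence)
```

Because the score travels with the record, an application that already reads the data can apply its own threshold without recalculating anything.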

Who owns data in your organization? If you’re like most, the answer may be “Many people” or “I’m not sure.” While some organizations are investing in chief data officers (CDOs), the majority have not yet done so. And so the burden falls to you.

To determine data confidence for your organization, use this Data Confidence Calculator. Take the results and show them to any business leader who cares about or “owns” customer data in your organization.

Related resources

White paper: Integrating and Governing Big Data