
Bad data need not apply

Program Director, Analytics Platform Marketing, IBM

“Giving business and technical users data they trust is key to good business intelligence,” according to Philip Russom of The Data Warehousing Institute (TDWI). A new e-book from TDWI, Why Enterprises Need Trustworthy Data, explores issues like why we need trusted data, how to get it and what issues arise when we try to run a business based on data that doesn’t deserve our confidence.

We should be rolling out the welcome mat to invite good data into our data warehouses, and at the same time making it clear that bad data need not apply.

We’ve all seen the scenario where businesspeople simply don’t trust the reports they receive. In that scenario, what can the users do? They can base their decisions on the questionable reports anyway; ignore the reports and make decisions that are not based on fact; or try to create their own systems to deliver data they can trust, creating more data silos and perhaps more of those “shadow IT” systems that keep cropping up around the enterprise. The options aren’t pretty.

The many data warehousing challenges

In an environment where lack of trust is common, the data warehouse challenge is not just finding the best warehouse technology or the critical data that lives in silos around the organization and determining what should move to the warehouse, although those are certainly important. It isn’t even just determining how to start leveraging unstructured information as part of the analytics environment, although that is a challenge that organizations face today as they tap into the new opportunities presented by big data. 

A key challenge in the current environment is determining how to create a warehouse that instills confidence among the business users who receive the output of analysis based on the warehouse. So what does it take to build that confidence? Is it consistent data quality? Yes, that’s part of the answer.  How about transparency into the lineage of the data: where it originated, who or what has changed it, and when it was last updated? Yes, absolutely. No one should have confidence in information from an unknown or unreliable source, or information that is woefully out of date.
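To make the lineage questions above concrete, here is a minimal Python sketch of a check that flags records whose origin, last editor or update time is missing or too old. The `Lineage` class, its field names and the 30-day freshness threshold are all illustrative assumptions, not part of any particular warehouse or governance product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Lineage:
    """Hypothetical lineage metadata attached to a warehouse record."""
    source_system: Optional[str]      # where the data originated
    last_changed_by: Optional[str]    # who or what last changed it
    last_updated: Optional[datetime]  # when it was last updated


def trust_issues(lineage: Lineage, now: datetime,
                 max_age: timedelta = timedelta(days=30)) -> List[str]:
    """Return the reasons a record should not be trusted (empty if none)."""
    issues = []
    if not lineage.source_system:
        issues.append("unknown source")
    if not lineage.last_changed_by:
        issues.append("unknown last editor")
    if lineage.last_updated is None:
        issues.append("no update timestamp")
    elif now - lineage.last_updated > max_age:
        # The 30-day default is an assumed business rule.
        issues.append("stale")
    return issues
```

A record with a known source, a known editor and a recent timestamp returns an empty list; anything else returns the specific reasons, which could be surfaced next to a report so users can see why a figure is flagged.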

In fact, some information is so far out of date that it doesn’t belong in an active data warehouse at all. Instead, it should be archived for safekeeping or, at a point determined by business rules, removed from enterprise systems altogether. So it’s logical to ask another question about data in the warehouse: have we kept all the information that’s needed for compliance or for business operations, but not the data that has passed its useful life and become a liability rather than an asset? A yes to that question can increase user confidence as well.

Instilling confidence in business users can also mean providing the best 360-degree view of customers, products and other key entities, despite conflicting information flowing toward the warehouse from multiple sources. An extended 360-degree view, incorporating unstructured information from within or beyond the enterprise, would be even better in building confidence.

Did you lock the door?

And then there’s the question about the security of data in the warehouse. How is it being protected from both accidental leaks and intentional breaches? Is sensitive information masked from eyes that don’t need to see it? Is access to the data monitored so that deviations from business rules can be detected? Protected data instills user confidence.
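As a rough illustration of masking sensitive information from eyes that don’t need to see it, the following Python sketch hides all but the last few characters of designated fields unless the viewer is cleared to see them. The function names, the `sensitive_fields` set and the “last four characters” rule are assumptions made for this example; real masking policies would come from an organization’s own governance rules.

```python
from typing import Any, Dict, Set


def mask_value(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        # Short values are masked entirely so nothing leaks.
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def mask_record(record: Dict[str, Any], sensitive_fields: Set[str],
                viewer_can_see_sensitive: bool) -> Dict[str, Any]:
    """Return a copy of `record` with sensitive fields masked for this viewer."""
    if viewer_can_see_sensitive:
        return dict(record)
    return {
        key: mask_value(str(value)) if key in sensitive_fields else value
        for key, value in record.items()
    }
```

For example, calling `mask_record({"name": "Ann", "ssn": "123456789"}, {"ssn"}, False)` leaves the name untouched and returns the `ssn` field as `*****6789`.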

All these questions are critical to the design of a modern data warehouse ecosystem, and they all point to the importance of information integration and governance.

“If users don’t have confidence in the data of a data warehouse and other BI data stores, they may argue over the data’s accuracy, refuse to use the reports and analyses fed from the data or build their own data stores,” says TDWI’s Russom. To increase the success rate of data warehousing and BI projects, design the environment to make bad data unwelcome but to make good data feel right at home.
