Instill Confidence Through Solid Data Quality

A holistic information governance initiative can lay a foundation for confidence in big data

Product Marketing Manager, IBM

Data in today’s world seemingly comes from everywhere—sensors, social media, videos, audio, transaction records, and global positioning system (GPS) signals, to name a few—and 90 percent of it has been created in the last two years.1 This rather sudden influx of so much data—big data—is stretching information supply chains and making them more complex and dynamic.

At the same time, big data is offering new opportunities to enhance our knowledge of customers, societies, and employees, and it is changing the way business is conducted to benefit all its stakeholders. To take on these challenges and leverage data as a strategic asset, enterprises need holistic information governance solutions to help increase their confidence in the decisions they make to grow the business and remain competitive. Otherwise, “if the data in question cannot be trusted, its value drops dramatically.”2

Holistic information governance has several aspects, including data integration, data lifecycle management, master data management (MDM), and data security and protection. Data quality is the critical foundation for all these aspects because it determines the reliability of data. Poor data quality leads to suboptimal business decisions and ultimately results in a loss of confidence in the data and the insights gleaned from it.

Poor data quality is an age-old challenge, and many enterprises have been successful in implementing data quality processes to manage traditional data. However, this challenge is growing in magnitude and taking on new dimensions in the context of big data: larger volumes, greater variety, the presence of existing data warehouses that already hold high-quality, structured enterprise data, and regulatory demands on data quality. To ensure high-quality data for data warehousing and analytics applications, and to comply with these regulatory demands, enterprises now need to rethink the scope of their data quality initiatives.

New capabilities for data quality

Data is evolving so quickly in terms of volume, variety, and velocity that enterprises need to carefully anticipate future processing and performance needs when planning a data quality program. A data quality function should be scalable and evolve with changing analytical needs. Also, there should be synergy between different data quality components and broad information governance capabilities—the greater the synergy, the lower the total cost of ownership (TCO).

Big data demands advanced capabilities in addition to cleansing, standardizing, and matching. Here are a few capabilities to consider before beginning a big data project:

  • Discovering data relationships: Big data is uncharted territory for many organizations with scores of data sources. In this context, being able to discover new sources of data and the hidden links between data that is spread across heterogeneous sources is critical.
  • Assessing data quality: Studies have shown that the cost to fix data defects rises dramatically over time. To quickly uncover data quality problems and fix them in the initial stages, organizations need to analyze and validate high volumes of data for proactive data quality assessment.
  • Ongoing analysis and monitoring of data quality: Quality of data in the data warehouse deteriorates when unattended, and there is also the risk of data quality in existing systems degrading when data from new sources is integrated into an enterprise data warehouse. For these reasons, data should be continuously monitored and tracked.
  • Connecting Apache Hadoop and other NoSQL sources: The Hadoop framework offers a cost-effective means to store large volumes of data. When data residing on Hadoop is similar to the data in enterprise data warehouses, it is important to take advantage of that inexpensive storage while still applying the same data quality support to it.
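As a rough illustration of the assessment and monitoring capabilities listed above, the following minimal Python sketch profiles a small batch of records for completeness and duplicate keys. It is not drawn from any IBM product; the names QualityReport, assess_quality, and passes, the sample records, and the 95 percent completeness threshold are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    total: int            # number of records profiled
    completeness: dict    # field name -> fraction of non-empty values
    duplicate_keys: int   # records whose key repeats an earlier record


def assess_quality(records, required_fields, key_field):
    """Profile a batch of dict records (hypothetical helper, not a product API)."""
    total = len(records)

    # Completeness: share of records with a non-empty value for each field.
    completeness = {}
    for field in required_fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        completeness[field] = filled / total if total else 0.0

    # Uniqueness: count records whose key was already seen in this batch.
    seen = set()
    duplicates = 0
    for r in records:
        key = r.get(key_field)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return QualityReport(total, completeness, duplicates)


def passes(report, min_completeness=0.95):
    """Assumed acceptance rule: no duplicate keys, every field above threshold."""
    return report.duplicate_keys == 0 and all(
        share >= min_completeness for share in report.completeness.values()
    )


# Made-up sample data: one missing email and one repeated customer_id.
sample = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": ""},
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C3", "email": "c@example.com"},
]

report = assess_quality(sample, ["customer_id", "email"], "customer_id")
print(report)
print("Batch passes:", passes(report))
```

Running the same check on each new batch and recording the results over time is one simple way to approximate ongoing monitoring; a production system would apply far richer rules at much larger scale.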

Harnessing big data is as much about changing how non-IT stakeholders participate in the data quality function as it is about building technical capabilities. Having an enterprise vocabulary that promotes a shared understanding of data between business and IT is critical when aligning technology with business and governance objectives. The better the collaboration between IT and non-IT business users, the better the return on investment (ROI) for data quality.

Data quality initiatives should offer self-service capabilities that are intuitive and easy to use to engage business users in the data quality effort. Equally important, organizations should create enterprise-level visibility for data quality and its impact through metrics, reports, and dashboards that ultimately help business users understand the ROI of data quality investments.

Renewed confidence in big data

The IBM® InfoSphere® Information Server for Data Quality solution helps organizations establish a data quality process for their traditional data sources and discover insights from big data. Whether businesses are getting started with information governance and data quality or want to expand to include big data, InfoSphere Information Server for Data Quality offers the flexibility to address today’s high-priority data quality problems. And it can readily scale to support big data requirements.

Big data presents a tremendous business opportunity for organizations, but paradoxically the confidence in data—and insights gleaned from it—is extremely low because so much data flows in from outside the traditional enterprise boundaries. To build confidence in big data and the insights line-of-business professionals can glean from it, enterprises should have an information governance solution with quality and governance standards for big data that parallel solutions they’ve achieved for their traditional data sources. Investing in the right solution helps organizations maximize ROI and remain competitive by turning data into a strategic asset.

1 The claim, “90 percent of the data in the world today has been created in the last two years alone,” has been widely reported in a number of sources since early 2012—including the article, “Big Data Market to Grow to USD16.9 Billion by 2015: IDC,” by Darryl K. Taft, March 2012—which attributes the statement to IBM.
2 “Magic Quadrant for Data Quality Tools,” by Ted Friedman, Gartner, ID: G00252509, October 2013.


For information on a framework offering a comprehensive data quality solution, see “Getting started with a data quality program,” IBM Information Management white paper, March 2012.
