Shine a Light on Big Data
Understand big data context for decision making that vastly enhances business performance
The potential of big data is a huge draw for organizations looking for insights from fresh sources of information such as social media, automated meter readings, and other channels. Big data can be synthesized into information that supports intelligent business decisions, and it also offers plenty of chances to apply creativity to interpreting the data and to move the organization in new directions. But the value of big data is diminished if business users lack confidence in the data. If organizations do not address this challenge head-on, much of the promise of big data can be lost.
Business users who consider relying on new data for critical decisions and actions begin with questions such as the following:
- Do I really understand the source of this information?
- Does that context make sense given the way I intend to use the information?
- Are we properly protecting sensitive personal information in the data?
To address those concerns and increase confidence in data, organizations need to apply information integration along with a level of governance that is appropriate to the data itself and its intended use.
Information integration and governance (IIG) should therefore become a natural part of big data analytics projects. It should provide automated discovery, profiling, and understanding of diverse data sets to offer a comprehensive context for making informed decisions. In addition, IIG should provide the agility to accommodate a wide variety of data and seamlessly integrate data with diverse technologies, from data marts to Apache Hadoop systems. IIG also needs to automatically discover, protect, and monitor sensitive information as an essential part of big data applications.
Visualizing data context
Given the high volume, velocity, and variety of big data, a key reason for a lack of confidence in the data is often the absence of a clear understanding of its context. The source of the data and its history are often unclear, and how the data measures up to the organization’s own data quality rules can be ambiguous. Obtaining a comprehensive view of information governance policies and an indication of how the current data is performing against those policies can also be difficult. Advanced IBM® InfoSphere® IIG portfolio capabilities—including some announced in September 2013—have been developed to address the uncertainty that can challenge organizations and to help increase end-user confidence in big data by bringing data context out of the shadows and making it clearly visible.
A flexible information governance dashboard is one tool organizations can use to achieve enhanced understanding of big data through clarification of context. It can clearly display business-driven governance policies and rules along with current and historical results from various sources—in a view tailored to meet the specific requirements of the organization. For example, end users may be able to view data integration, data quality, master data management, and data lifecycle–related metrics in a single view. In addition, users can easily drill down into further detail and decide on appropriate action. Taking advantage of InfoSphere capabilities to build customized dashboards and views, business partners as well as individual user organizations are designing dashboards to address specific needs.
The ability to find the right data is a key challenge when collecting data from a big data repository. Even when the big data is stored in one system, such as an Apache Hadoop landing zone, the sheer volume and variety can make finding specific data highly challenging.
The Big Data Catalog planned by IBM is designed to make it simpler for end users, data scientists, and other business analysts to peruse data. It is expected to ingest and store metadata from every available source and to classify data by such factors as origin, lineage, and potential value. The Big Data Catalog is also planned to make it easy to search and find data through user interfaces or service-oriented architecture (SOA) application programming interfaces (APIs). In effect, it is designed to let users shop for the data they need for various projects, so they can search, find, and leverage big data more quickly than ever before.
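To make the catalog idea concrete, here is a minimal sketch in Python of a metadata catalog that stores entries classified by origin, lineage, and potential value and lets users filter on them. It is not the IBM Big Data Catalog or its API. The class names, field names, and sample data sets are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class CatalogEntry:
    """Metadata describing one data set (all field names are hypothetical)."""
    name: str
    origin: str                 # where the data came from
    lineage: List[str]          # ordered processing steps applied so far
    potential_value: str        # coarse rating such as "high" or "medium"
    tags: Set[str] = field(default_factory=set)


class MetadataCatalog:
    """Toy in-memory catalog; a real one would persist and index metadata."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def ingest(self, entry: CatalogEntry) -> None:
        """Store (or overwrite) the metadata for one data set."""
        self._entries[entry.name] = entry

    def search(self, origin: Optional[str] = None, tag: Optional[str] = None,
               potential_value: Optional[str] = None) -> List[CatalogEntry]:
        """Return entries matching every supplied filter, sorted by name."""
        matches = [
            e for e in self._entries.values()
            if (origin is None or e.origin == origin)
            and (tag is None or tag in e.tags)
            and (potential_value is None or e.potential_value == potential_value)
        ]
        return sorted(matches, key=lambda e: e.name)


# Hypothetical sample data sets sitting in a landing zone.
catalog = MetadataCatalog()
catalog.ingest(CatalogEntry("tweets_raw", "social media",
                            ["landed in Hadoop"], "high", {"customer"}))
catalog.ingest(CatalogEntry("meter_readings", "automated meters",
                            ["landed in Hadoop", "deduplicated"], "medium", {"usage"}))

for entry in catalog.search(origin="social media"):
    print(entry.name, "->", " > ".join(entry.lineage))
```

The point of the sketch is that users search the metadata, not the underlying data, which is why a catalog can stay responsive even when the repository itself is very large.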
These newly announced and planned capabilities are not the only ones that help organizations build confidence by understanding the context of their data. The following InfoSphere IIG capabilities can also build confidence with enriched data context:
- Metadata, business glossary, and policy management: Defining both metadata and governance policies with a common component used by all integration and governance engines is a critical task. InfoSphere Business Information Exchange contains capabilities for data discovery, metadata management, governance policy definition and management, and governance project blueprint design, as well as a business glossary of terms and definitions.
- Data integration: The InfoSphere IIG portfolio offers multiple integration capabilities for batch data transformation and movement, along with operation performance monitoring to help end users quickly identify any problems with data processing jobs as they are running.
- Data quality: InfoSphere Information Server for Data Quality parses, standardizes, validates, and matches enterprise data, and provides visualization tools to capture and manage data quality problems when they arise.
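The standardize, validate, and match steps named in the data quality item above can be illustrated with a short, generic Python sketch. It does not represent InfoSphere Information Server for Data Quality or any IBM interface. The record fields, the simplified email pattern, the 10-digit phone rule, and the matching rule are all assumptions chosen for the example.

```python
import re
from typing import Dict, List

# Deliberately simple pattern for illustration; real email validation is stricter.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def standardize(record: Dict[str, str]) -> Dict[str, str]:
    """Collapse whitespace, normalize case, and keep only digits in the phone."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
        "phone": re.sub(r"\D", "", record.get("phone", "")),
    }


def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of rule violations for an already standardized record."""
    problems = []
    if not record["name"]:
        problems.append("missing name")
    if not EMAIL_PATTERN.match(record["email"]):
        problems.append("invalid email")
    if len(record["phone"]) != 10:  # assumed rule: 10-digit phone numbers
        problems.append("phone is not 10 digits")
    return problems


def is_match(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """Treat two standardized records as the same entity if their non-empty
    emails agree, or if both name and phone agree."""
    same_email = bool(a["email"]) and a["email"] == b["email"]
    same_name_phone = a["name"] == b["name"] and a["phone"] == b["phone"]
    return same_email or same_name_phone


# Two hypothetical raw records that describe the same person.
first = standardize({"name": "  jane   DOE ", "email": " Jane.Doe@Example.com ",
                     "phone": "(555) 123-4567"})
second = standardize({"name": "Jane Doe", "email": "jane.doe@example.com",
                      "phone": "555.123.4567"})

print(validate(first))          # rule violations found, if any
print(is_match(first, second))  # whether the two records refer to one entity
```

Standardizing before validating and matching is the key design choice here: records that differ only in formatting become directly comparable, so the later steps can use simple rules.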
Visit the IBM Information Integration and Governance website for more information on InfoSphere and the IBM approach to IIG, and be sure to read the other September 2013 articles in IBM Data magazine for more details on recent InfoSphere capability enhancements.
“Understanding big data so you can act with confidence,” IBM Software e-book, IBM Corporation, July 2013.