Confidence in Big Data
Confidence (also known as Veracity) in Big Data is one of the central themes to emerge recently in the Information Governance community. It stems from a growing awareness that Big Data may not carry the same Confidence in its quality, lineage or accuracy as more conventional data sources. Those sources have been managed and conditioned throughout their lifecycle before being made available in a Data Warehouse, Data Mart or other repository (an Operational Data Store (ODS), etc.).
To overcome this emerging “Crisis of Confidence”, organizations must embrace a more robust Information Governance approach. To be successful, that approach must include the following core components:
Automated Integration: Well-defined integration patterns and business-driven tools that quickly make Big Data streams available for analysis and decision support. These automated integration schemes embed comprehensive metadata and appropriate privacy protection constructs. “Point & Integrate” has become the preferred approach for most of the Analytical and Decision Management teams I encounter these days.
End-to-end Lineage: Business users must be confident that the Big Data streams they work with are well characterized. They need to know where the data was sourced from, what transformations or changes (if any) occurred before it was made available for analysis, and what it represents in terms of potential insights. Visual Analysis tools deepen this understanding for analysts and business users. This line of sight into lineage is a major confidence booster regarding the veracity of a data stream. It also allows Analysts and Decision Managers to adjust their models and outcomes based on their knowledge of information lineage.
Agile Governance: Information Governance programs must adapt to the needs of Big Data as the two mature together. Governance must become more Agile in its approach and raise its game if it is to provide effective oversight and enablement to the Business in its use of Big Data alongside Conventional Data Sources. Many Information Governance organizations today are still in the early stages of maturity. They need to accelerate this process to serve their organizations' emerging needs. They must also become “Agile to the core” in how they approach each activity within the Information Governance domain, and they must deliver value more quickly to those they support in the business and IT domains. Adopting an Agile approach will benefit all Consumers and Analysts of information, regardless of size.
In my view, Big Data should neither be separated from your core Information Governance program nor treated as an outlier within it. Successful Information Governance should accommodate all forms of information (Big or Small; Structured or Unstructured; Digital or Paper). Information Assets will always come in all shapes and sizes. The mark of a mature, well-thought-out Information Governance program is that it treats these assets according to both the value and the risk(s) they represent. Information Governance is hard enough to make successful and pervasive; we do not need splintered activities for one type of information asset versus another.