Context Influences Confidence in Big Data
Data managers can learn a lesson in confidence from a can of tuna
When reviewing your organization’s financial results for the previous quarter, what is your level of confidence in the information? When reading a projection of economic growth posted on a social media site by someone known as Angry Dog, what is your level of confidence in that information? If you knew that Angry Dog was the pseudonym for a well-known economist who accurately predicted the last market downturn, would you feel differently about the information? How would you use any of that information if you needed to make a recommendation regarding a significant investment for your company?
We know instinctively that not all information is alike. It doesn’t all have the same value, and it isn’t all appropriate for the same uses. And when working on plans for taking advantage of new data sources, whether they are machine data from continuous meter readings, social media comments, or data from new business applications, we need to assess the value of that data and determine which uses are appropriate.
Is it good enough to eat?
During a recent discussion of the value of metadata—the information about data that helps in assessing its value—the conversation took an unexpected turn in the direction of canned fish.
Concerns about sustainable fishing practices have presented a challenge to those in the business of catching and selling seafood. If they are following sustainable practices, how can they make that information known to consumers? How can a shopper know that the can of tuna on the grocery shelf meets his or her own standards for sustainability?
One approach might be to consult a buyer’s guide offered by an organization focused on sustainability. But one UK-based seafood marketing company, John West, decided to take matters into its own hands. It made information about its canned fish available directly to consumers—not only through advertising and social media, but also through specific details on individual cans of seafood.
John West rolled out its Can Tracker program for tuna in 2011 and subsequently added other fish to the program. Each can has some lineage data printed right on it, indicating, for example, that it contains Atlantic Ocean skipjack tuna from Ghana. Cans in the program also carry individual codes that consumers can enter at the John West website to get additional details, right down to the name of the fishing boat that caught the fish. The result of this new transparency has been not just increased traffic to the John West website but also, according to the company’s own research, a significant improvement in customer perception of the organization’s sustainable practices since the launch of the Can Tracker program.
Why all this talk about canned fish in a column about information governance? First, the fish story shows how one company recognized its information as a critical asset and took steps to share that information in a well-governed manner—by sharing previously protected information and creating a self-service tool for consumers.
Second, the story illustrates the critical linkage between lineage and confidence. The details about the history of the fish in the can give consumers increased confidence in the food they are about to eat and enable informed decision making. Similarly, the lineage of information that finds its way into an organization helps decision makers to assess its value. Confronting information from a new source, a data steward should ask where and how the information should be used within key business applications. And data lineage provides part of the answer.
How should the information be used?
What are some of the important things to know about information? The following questions are worth asking:
- Where did it originate?
- How old is it?
- Has it changed over time?
- Who or what has touched it or used it?
- Does it contain sensitive, personal details?
The answers to these and other questions matter to an analyst preparing a corporate financial report, and they may lead to the exclusion of certain data from the analysis. But the same answers may yield a different result when a marketing practitioner is preparing a customer sentiment analysis. In that context, inaccurate information that comes from questionable sources may still be appropriate for inclusion in the report as a valid indication of customer sentiment.
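To make that contrast concrete, here is a minimal Python sketch of how the answers to the questions above might be recorded and then judged differently depending on the intended use. Everything in it is an assumption made for illustration: the `SourceMetadata` fields, the `fit_for_use` function, the 90-day freshness threshold, the rule that personal details disqualify a source from sentiment analysis, and the sample dates. Real rules would come from an organization's own governance policies.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class SourceMetadata:
    """Hypothetical answers to the lineage questions for one data source."""
    origin: str                    # Where did it originate?
    created: date                  # How old is it?
    modified_since_capture: bool   # Has it changed over time?
    touched_by: List[str]          # Who or what has touched it or used it?
    contains_personal_data: bool   # Does it contain sensitive, personal details?
    origin_verified: bool          # Illustrative flag: is the origin trusted?


def fit_for_use(meta: SourceMetadata, use: str, today: date) -> bool:
    """Apply illustrative, made-up rules for two different contexts."""
    age_days = (today - meta.created).days
    if use == "financial_report":
        # Strict context: only verified, unmodified, recent data qualifies.
        return (meta.origin_verified
                and not meta.modified_since_capture
                and age_days <= 90)
    if use == "sentiment_analysis":
        # Lenient on provenance, but personal details are kept out.
        return not meta.contains_personal_data
    raise ValueError(f"unknown use: {use}")


# A social media post from an unverified author (sample values only).
post = SourceMetadata(
    origin="social media post by 'Angry Dog'",
    created=date(2013, 1, 10),
    modified_since_capture=False,
    touched_by=["social feed collector"],
    contains_personal_data=False,
    origin_verified=False,
)
today = date(2013, 2, 1)

print(fit_for_use(post, "financial_report", today))    # False
print(fit_for_use(post, "sentiment_analysis", today))  # True
```

The same metadata record produces two different answers. The data has not changed; only the context in which it is being used has.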
For data warehousing, big data exploration, application consolidation and retirement, and a wide range of other information-intensive projects, the key to answering questions like these lies in the metadata. IBM offers an easy way to access that information and put it to use. IBM® InfoSphere® Metadata Workbench capabilities in IBM InfoSphere Information Server provide a transparent window into the data that is available through a unified data integration platform. The platform offers insight into data source analysis; transformation processes, whether extract-transform-load (ETL) or extract-load-transform (ELT); data quality rules; business terminology; data models; and business intelligence reports.
InfoSphere Information Server also increases understanding and trust in information by showing the complete lineage, indicating where the data originated and what happens to it as it moves across data integration processes. Not only does the InfoSphere platform support compliance and regulatory reporting mandates by displaying a complete audit trail that shows how information is generated, but it also quickly determines and displays how changes to data affect processes, applications, and end users. For example, at the University of Arizona, this capability helps the institution manage approximately 22,000 transformation jobs running every night and make changes with a full understanding of potential impact across the environment.*
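The general idea behind lineage and impact analysis can be sketched as a walk over a graph of data assets. The Python example below is a generic illustration only, not the InfoSphere implementation or its API. The asset names in `LINEAGE` and the functions `downstream_impact` and `upstream_lineage` are hypothetical.

```python
from collections import deque

# Hypothetical lineage graph: each key feeds the assets listed as its value.
LINEAGE = {
    "meter_readings": ["etl_clean_readings"],
    "etl_clean_readings": ["warehouse.usage"],
    "warehouse.usage": ["billing_report", "usage_dashboard"],
    "crm_export": ["warehouse.customers"],
    "warehouse.customers": ["billing_report"],
}


def _reachable(graph, start):
    """Breadth-first walk returning every node reachable from `start`."""
    seen, order, queue = set(), [], deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


def downstream_impact(graph, changed):
    """Impact analysis: what is affected if `changed` is modified?"""
    return _reachable(graph, changed)


def upstream_lineage(graph, asset):
    """Lineage: which sources and jobs contributed to `asset`?"""
    reverse = {}
    for source, targets in graph.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return _reachable(reverse, asset)


print(downstream_impact(LINEAGE, "meter_readings"))
print(upstream_lineage(LINEAGE, "billing_report"))
```

Walking the graph forward answers the impact question (what breaks if this source changes?), and walking it backward answers the lineage question (where did this report's data come from?).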
Data lineage is clearly a critical factor in providing the context that helps determine the appropriate level of confidence in data. What other confidence factors are critical in your business? Please share your thoughts in the comments.
* Case study: “University of Arizona Video, IBM Helps University of Arizona Deliver 90% Faster Access to Data,” June 2012.