Big Data: How Can We Measure the Risks?

Big Data Evangelist, IBM

Big data has value, for sure, which we often measure in proportion to its magnitudes: volume, velocity and variety. But big data also has a “disvalue” in roughly the same proportion: the more rapidly we collect more data of different types, the more likely we are to be intensifying the business, legal and compliance risks associated with our stewardship of that data.

To the extent that we often don’t know exactly what it contains, big data carries its own special risks, as discussed in this recent article. Though the article focuses only on unstructured data, you can generalize its observations to more complex data sets that include relational and other data types.

Where risk mitigation is concerned, you don’t have perfect knowledge of what specifically from your big-data collection might be subpoenaed in future litigation. You don’t know whether all or just a tiny piece of it might be essential to corroborate your compliance with some government regulation. You don’t know what future mandates, regulations and hot-button business sensitivities will suddenly emerge and thereby expose you to new risks (e.g., based on your historical failure to manage your data effectively to address these concerns). And you don’t know whether it might contain the golden intelligence that will power some groundbreaking future product innovation.

The chief risk factor surrounding big data is not knowing the potential future downsides associated with your failure to manage it all effectively. Making the risk factors transparent for all of your big-data sets (unstructured, structured and all gray areas in between) should be a top business priority. The above-referenced article presents a rigorous approach for modeling and measuring the risks associated with unstructured data. Actually, the advice its author provides could be applied to any data set: document what’s known about the data, evaluate the data’s risk factors, and quantify the dollar value of potential downside risks.
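To make those three steps concrete, here is a minimal Python sketch. It is not taken from the article; it assumes the simplest possible quantification, in which each risk factor's dollar exposure is its estimated annual probability times its estimated impact. The class names, risk factors and figures are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RiskFactor:
    """One evaluated risk factor for a data set (step 2)."""
    name: str
    annual_probability: float  # assumed estimate, between 0.0 and 1.0
    impact_usd: float          # assumed cost if the risk materializes

    def expected_loss(self) -> float:
        # Step 3: quantify the downside as probability times impact.
        return self.annual_probability * self.impact_usd


@dataclass
class DataSetProfile:
    """What is known about a data set (step 1) plus its risk factors."""
    name: str
    description: str
    risk_factors: List[RiskFactor] = field(default_factory=list)

    def total_expected_loss(self) -> float:
        return sum(f.expected_loss() for f in self.risk_factors)


# Hypothetical example: an unstructured email archive.
archive = DataSetProfile(
    name="email_archive",
    description="Ten years of employee email, contents largely unreviewed",
    risk_factors=[
        RiskFactor("litigation_discovery", 0.05, 2_000_000),
        RiskFactor("privacy_breach", 0.01, 5_000_000),
    ],
)

for factor in archive.risk_factors:
    print(f"{factor.name}: ${factor.expected_loss():,.0f} per year")
print(f"total: ${archive.total_expected_loss():,.0f} per year")
```

A real program would likely replace the single probability and impact figures with ranges or distributions, but even point estimates like these force the undocumented assumptions about a data set into the open.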

My favorite discussion in the article concerns the “damned if you do, damned if you don’t” risks surrounding data stewardship. Undiscriminating retention carries the risks associated with what the data says, while mass deletion might put you in serious legal, regulatory, or business peril. As the author, John Montaña, puts it: “Each piece of data a company stores is interconnected with many others, and the value of each is interconnected to its risk: while eliminating a data repository might have a very low risk in terms of regulatory compliance, it might have a very high risk in terms of value to the business.”

What’s a data risk-mitigation specialist to do? The Catch-22 that Montaña describes sounds like a case of analysis paralysis waiting to happen.

But it also feels like a call to action. You can’t have a perfect predictive model of how the future business and regulatory environments are going to evolve. But you can have a comprehensive data-risk mitigation program that helps you deal with new challenges as they emerge.

Risk measurement should be integral to your data discovery, governance and information lifecycle management (ILM) practices. Ideally, you should have an operational model of data risks under which every storage volume, database, file, document, join, table and record is tagged with the appropriate risk and value factors (e.g., business, compliance, archival and litigation). This metadata, either generated automatically by ILM infrastructure or manually by data analysts, should follow each datum throughout its lifecycle and be modified automatically as its linkages with other data, its usage requirements and other contextual variables evolve.
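As a rough illustration of risk metadata that travels with the data, the Python sketch below tags each data object with a score per risk factor and recomputes the tag when objects are linked. The 0-to-5 scale, the names and the rule that a joined object inherits the highest score of its sources are all assumptions made for this example, not features of any particular ILM product.

```python
from dataclasses import dataclass, field
from typing import Dict

# Risk and value factors named in the text above.
FACTORS = ("business", "compliance", "archival", "litigation")


@dataclass
class RiskTag:
    """Risk metadata attached to one data object (table, file, record...)."""
    # Hypothetical scale: 0 (no known risk) to 5 (severe).
    scores: Dict[str, int] = field(default_factory=dict)

    def score(self, factor: str) -> int:
        # Factors that were never assessed default to 0.
        return self.scores.get(factor, 0)


def merge_tags(*tags: RiskTag) -> RiskTag:
    """Tag for data derived by linking several sources.

    Assumed rule: the result is at least as risky as its riskiest
    source, so take the maximum score for each factor.
    """
    return RiskTag({f: max(t.score(f) for t in tags) for f in FACTORS})


# Hypothetical example: joining customer records with clickstream logs.
customer_table = RiskTag({"compliance": 4, "business": 2})
clickstream = RiskTag({"business": 3, "litigation": 1})

joined = merge_tags(customer_table, clickstream)
print(joined.scores)
```

Taking the maximum is a deliberately conservative choice. Linking data sets can also create risks that none of the sources carried alone, which is one reason the tags would need review by analysts as well as automatic propagation.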

Of course, what I just sketched out is a tall order, especially as big data floods our databases with more data of more types, more linkages, more usage patterns, more applications, more compliance mandates and more risk factors than any human alone can track. But that doesn’t mean that it’s infeasible. ILM, data governance and risk assessment are, by themselves, mature disciplines.

We simply need, as an industry, to converge and rethink our risk mitigation approaches to keep pace with the explosion of big data.