The Challenges of Transparent Accountability in Big Data Analytics

Big Data Evangelist, IBM

Many of my waking hours are spent explaining to people that “big data” is not as opaque and mysterious a concept as they’ve been led to believe. To the extent that I can hold their attention for a detailed technical discussion, I can alleviate their concerns that it might all be smoke and mirrors or, even worse, some arcane black magic.

In the past year, big data has entered the popular consciousness in an amazing way. Many terms that practitioners have used for decades now trip off the tongues of people who literally overheard them for the first time yesterday. In the process of being acquired by new speakers, many big data terms of art have picked up new connotations that may or may not be grounded in reality. Increasingly, we see people imbuing otherwise neutral technical terminology with undeservedly negative associations.

The word “algorithm” is a perfect example. Depending on who you ask, the word may have a positive, neutral or negative connotation. For computer scientists and data analytics professionals, algorithms are simply tools for engineering the controlling logic that informs their creations. However, for those people who wax poetic about the magic of modern information technologies, algorithms are often portrayed as some sort of secret sauce that sustains it all. Or, if you’re one of those who suspect that so-called “technocrats” are pulling all our strings behind the scenes, an algorithm might feel slightly sinister.

Most educated people know that an algorithm is simply any stepwise computational procedure. Most computer programs are algorithms of one sort or another. Humanity’s embrace of computerization makes algorithms as ubiquitous as the air we breathe. Embedded in operational applications, many algorithms are engineered to make decisions, take actions, and deliver results continuously, reliably and silently. Many of the most complex algorithms are “authored” by an ever-changing, seemingly anonymous cavalcade of programmers over many years.

People naturally grow nervous when some human artifact—such as an algorithm—seems to have taken on a life of its own. Algorithms’ seeming anonymity—coupled with their daunting size, complexity and obscurity—presents the human race with a seemingly intractable problem: who, if anybody, can personally be held accountable for the decisions they make? Put another way: how transparent is the authorship of the data and rules that drive algorithmic decision-making processes? Or, put yet another way, how can public and private institutions in a democratic society establish procedures for effective oversight of algorithmic decisions?

From a management theory perspective, the accountability issues of algorithmic decision making are related to the issues of personal accountability—or lack thereof—in complex modern organizations. Much as complex bureaucracies tend to shield the instigators of unwise decisions, convoluted algorithms can obscure the specific factors that drove a specific piece of software to operate in a specific way under specific circumstances.

Lack of transparent accountability for algorithm-driven decision making tends to breed conspiracy theories. These sorts of concerns underpin the recent article “Rage Against the Algorithms.” That particular discussion focuses on the algorithms that filter, rank and display online reviews (of restaurants, movies, etc.), which often have direct monetary impacts (positive and negative) on the subjects of those reviews. The author, Nicholas Diakopoulos, expressed the broader concern succinctly:

“This is just one example of how algorithms are becoming ever more important in society, for everything from search engine personalization, discrimination, defamation, and censorship online, to how teachers are evaluated, how markets work, how political campaigns are run, and even how something like immigration is policed. Algorithms, driven by vast troves of data, are the new power brokers in society, both in the corporate world as well as in government.”

These balance-of-power concerns also inform the concepts of “transparency paradox” and “technological due process” as described in this recent Stanford Law Review article.

And they’re akin to the concerns I discussed in my recent LinkedIn post on the increasingly opaque statistical models that drive decision automation by recommendation engines. Per that discussion:

“the more dimensions of individual preference/taste that your data scientist attempts to capture in their decision-automation model, the more complex the model grows. As it grows more complex, the model becomes more opaque, in terms of any human (including the data scientists and subject matter experts who built it) being able to specifically attribute any resultant recommendation, decision, or action that it drives to any particular variable. Therefore, in spite of the fact those ‘values’ (aka ‘numbers’) ultimately yield accurate recommendations, the number-driven outcomes become more difficult to understand or explain.”

At a very high level, I attempted to address this conundrum through the principle of “open governance,” as discussed in this recent IBM Big Data Hub blog. As one approach for giving this principle some practical teeth, I’m impressed with this idea put forward by Diakopoulos:

“Given the challenges to employing transparency as a check on algorithmic power, a new and complementary alternative is emerging. I call it algorithmic accountability reporting. At its core it’s really about reverse engineering—articulating the specifications of a system through a rigorous examination drawing on domain knowledge, observation and deduction to unearth a model of how that system works.”

That’s really provocative, but I have practical reservations about it. Many algorithmic applications are an unfathomably complex assemblage of human- and machine-authored artifacts: programming code, data structures, business rules, statistical models, etc. Exactly how “transparent” can millions of lines of code truly be? If “accountability reporting,” per Diakopoulos’ definition, requires a seven-nation army of techno-accountants to devote thousands of person-hours per algorithmic system, how feasible can it be? And who exactly has the financial resources needed to tackle this mind-bogglingly tedious project?
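To make the reverse-engineering idea a bit more concrete, here is a toy sketch in Python of what probing a black box through observation might look like. Everything in it is hypothetical: `opaque_score` is an invented stand-in for a proprietary review-ranking algorithm, and the field names and weights are assumptions made up for illustration, not anything drawn from Diakopoulos or from a real system. The `probe` function changes one input at a time and records how the output moves.

```python
from typing import Callable, Dict

Review = Dict[str, float]


def opaque_score(review: Review) -> float:
    """Hypothetical stand-in for a proprietary ranking algorithm.

    An outside auditor would not see this source; they could only call it.
    """
    score = 2.0 * review["stars"] + 0.5 * review["reviewer_history"]
    if review["account_age_days"] < 30:
        score -= 3.0  # hidden rule: penalize very new accounts
    return score


def probe(
    score_fn: Callable[[Review], float],
    baseline: Review,
    candidates: Review,
) -> Dict[str, float]:
    """Vary one input at a time and record the change in output."""
    base = score_fn(baseline)
    effects: Dict[str, float] = {}
    for field, value in candidates.items():
        trial = dict(baseline)  # copy so the baseline is never mutated
        trial[field] = value
        effects[field] = score_fn(trial) - base
    return effects


baseline = {"stars": 4.0, "reviewer_history": 10.0, "account_age_days": 365.0}
effects = probe(
    opaque_score,
    baseline,
    {"stars": 5.0, "reviewer_history": 11.0, "account_age_days": 10.0},
)
print(effects)
```

Even in this three-input toy, one-at-a-time probing would miss any rule that only fires when two inputs change together, and the number of probes needed grows quickly with the number of inputs. That is the feasibility worry in miniature.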

Oh, yes, and as Diakopoulos himself discusses, there’s the tiny issue of proprietary intellectual property. Many real-world algorithms are indeed the secret sauce in somebody’s business operations. Or in some country’s national-security and anti-terrorism apparatus.

Good luck bringing “transparency” to the latter. Clearly, there are many legitimate concerns that will constrain the degree of transparency that organizations of any sort, public or private, can bring to their big-data-powered decisioning processes.

Likewise, every organization, no matter how ostensibly open it is on such matters, must deal with the transparency challenges that come from running the business on extraordinarily complex big data stores, analytic models and embedded algorithms.

None of this is a black art. But the practical challenges of documenting this sometimes-convoluted process logic can make it seem otherwise.