Single Version of the Truth: Ground it in Data Science, Not Data Faith

Big Data Evangelist, IBM

Is data a religion?

I think that’s a ridiculous notion, but it has recently gained credence in the popular mind. Some people seem to believe that a powerful elite regards data-driven management as an absolute faith. Here, for example, is a Washington Post article arguing that the current president of the United States pays homage at the altar of data. What caught my eye was this passage: “Belief in the clarifying power of data is its own kind of faith, and it is one Obama has embraced, even before winning the presidency.”

I doubt that Obama’s that naive, but this “data faith” meme is worth examining. If you consider data a religion of sorts, then the concept of a “single version of the truth” as a holy grail follows naturally. So does the notion that there’s a high priesthood of data, whom some refer to as “technocrats.” This latter term, usually a pejorative, typically refers to influential individuals who would rather deal with numbers, statistical abstractions, and advanced technologies than with real people.

Increasingly, I’m seeing the term “data scientist” tarred with the “technocrat” brush. The potential for confusion certainly exists. A data scientist is a professional who explores big data, builds and tests statistical models, and wields advanced analytic tools for a living. In other words, a data scientist is a practitioner of the analytical modeling arts. By contrast, the so-called technocrats use the work of data scientists to justify the policies, programs and practices of some powerful elite.

Let’s not gloss over this distinction. Data science is a profession that can be used for good or bad aims, or anywhere on the moral spectrum between those extremes. But data scientists rarely exert direct power. And data scientists generally don’t subscribe to some elitist conceit that they’re smarter or more virtuous than anybody else (unless they were bitten by the Ayn Rand tick). However, in the popular mind, the so-called technocrats (who never call themselves that) are seen as having an arrogant faith in data, never mind the human consequences.

Another recent article took a slightly different tack, equating these so-called technocrats—and data scientists by association—with dictators (to its credit, at least it didn’t bring in the hackneyed Orwellian Big Brother analogy). In the piece, the authors joined the long line of commentators who have reproached the late defense secretary Robert McNamara for his data-driven management of the USA’s Vietnam War fiasco.

The following excerpt is typical of the longtime rap against “technocrat” McNamara: “Only by applying statistical rigor, [McNamara] believed, could decision makers understand a complex situation and make the right choices. The world in his view was a mass of unruly information that—if delineated, denoted, demarcated and quantified—could be tamed by human hand and fall under human will. McNamara sought Truth, and that Truth could be found in data. Among the numbers that came back to him was the ‘body count.’”

The article goes on to rehash McNamara’s prior experience as a statistics-infatuated “Whiz Kid” executive in the U.S. auto industry and brand him as a data “fetishist.” It then widens the indictment to the breaking point by accusing today’s Google-infested business world of worshipping the false god of big data. Clearly, the recent controversy about government surveillance, involving big data, has built the hysteria on this topic to a fever pitch in popular culture.

The article’s slam against McNamara is a bit over-the-top. But more fundamentally, it misses the point about the basic data issue that underlay McNamara’s failed efforts to manage the Vietnam involvement. McNamara’s problem wasn’t that he was necessarily a data maniac. Rather, as the excerpt indicates, he and his analysts at DoD placed far too much strategic emphasis on one specific data metric: “body count” (i.e., enemy casualties). And they did this in spite of the fact that this very metric was opposed by most U.S. generals at that time.

So, contrary to what the authors assert, the core issue wasn’t that the data being used to manage the war effort was of poor quality. Instead, the central data issues were twofold. First, one influential manager (McNamara) skewed the chosen metrics toward one that had little value in assessing U.S. progress toward realizing its strategy. Second, the geopolitical strategy that this metric supported was screwed up in ways that could not be analyzed effectively with quantitative methods, then or now.

Qualitative methods of the sort employed by military analysts, political scientists and world historians are far more useful if you’re, say, the president of the United States and aren’t sure whether to keep on deepening the country’s involvement in a foreign war or get out while the getting’s good. See my recent blog post on quantitative vs. qualitative methods for additional context on this issue. Policymakers at that strategic level need informed gut feel to assess the right course of action.

Let’s get real. Data is neither a dictatorship nor an orthodoxy. It’s simply a resource and tool that can drive better decisions if we know its limits—or can exacerbate bad management strategies if we treat it like a holy scripture. There can only be a “dictatorship” of data if we accept the dictates of people who insist on skewed metrics in support of misbegotten strategies. And to prevent that from happening, we must always question any authority—in government, business, or our personal life—that demands we live our lives under a regime of performance numbers that are divorced from reality.

The bottom line is that there are no false data gods, just false data and faulty interpretations.

Even the so-called “single version of the truth” data that we load into our big-data clusters is not some holy scripture. Truth evolves, and you can interpret the exact same “gospel” data in many different ways. Different data scientists might find different correlations within the same data set. Fresh data might invalidate the “truth” (in other words, the causal relationships) that we had previously assumed. New aggregations involving formerly distinct data sources might open our eyes to patterns that we’d never even suspected before. New subject-matter experts might use advanced visualization to identify trends in existing data sets that their predecessors and peers had completely overlooked.
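
Here is a toy illustration of that point, not anything drawn from a real data set. The Python sketch below uses invented records (the segment labels, the values, and the helper name pearson are all assumptions made for the example) to show how an analyst who pools everything can see a positive correlation, while another who splits the same records by segment sees a negative one in every segment.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical records of the form (segment, x, y), invented for illustration.
records = [
    ("A", 1, 8), ("A", 2, 7), ("A", 3, 6), ("A", 4, 5),
    ("B", 6, 13), ("B", 7, 12), ("B", 8, 11), ("B", 9, 10),
]

# Analyst 1: ignore the segments and correlate x with y across all records.
pooled = pearson([x for _, x, _ in records], [y for _, _, y in records])

# Analyst 2: correlate x with y separately inside each segment.
by_segment = {}
for segment in sorted({s for s, _, _ in records}):
    rows = [(x, y) for s, x, y in records if s == segment]
    by_segment[segment] = pearson([x for x, _ in rows], [y for _, y in rows])

print("pooled correlation:", round(pooled, 3))
for segment, r in by_segment.items():
    print("segment", segment, "correlation:", round(r, 3))
```

Neither reading is wrong arithmetic. Which one is meaningful depends on what the segments actually represent, and that is a judgment the numbers alone cannot make.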

Putting your faith in data alone is foolish. It’s better to put your faith in the critical thinking skills and probing tools of data scientists. But you must continue to challenge their methodologies, techniques and findings. If you don’t, even the most humble data scientist in the world may start to believe that he or she walks on water.
