Data Journalism: Big Data, Data Science, & the Art of Non-Fictional Storytelling

Big Data Evangelist, IBM

Data journalist? Something about that nouveau term feels a bit pretentious—and unnecessary.

Every journalist is a “data journalist” of one sort or another, in the same way that every scientist is at heart a “data scientist” (see this blog for my take on the latter). After all, the core function of journalists and scientists, and of analysts in general, is to get the data needed to tell a (non-fictional) story about something ostensibly happening in the real world (as opposed to the fictional process of “making stuff up”).

Every (competent) journalist must be an analyst at heart, gathering the facts and organizing them into the classic “who,” “what,” “when,” “where,” “why” and “how” that frame the story for quick comprehension. Every journalist must have enough analytical smarts to understand how to sequence the presentation by descending order of importance (i.e., the fabled “inverted pyramid”). Every journalist must vet, correlate and analyze one or more sources of factual data to make sure they’re telling an accurate story that’s not inadvertently biased to one point of view. And many journalists are in fact true analysts—aka columnists, commentators and pundits—who specialize in core focus areas, do in-depth research, become well-regarded subject-matter experts, and even know how to craft compelling visualizations of their topics.

Understood in this broader context, synonyms such as “data-driven journalism,” “evidence-based journalism,” and “analytic journalism” are not particularly enlightening. They don’t say what, if anything, is new in journalism that warrants a new job classification.

By the way, I have a master’s in journalism, take a keen interest in the historical development of mass-media institutions, and was for many years a freelance pundit in the business-technology press. So please understand these comments as an observation on how the practice of journalism is evolving in the current age, in which the Internet is the digital channel into which all media are converging.

So, the question becomes, what recent twist in the practice of journalism does the coinage “data journalist” refer to? From what I can see, a data journalist is someone who uses the new tools of big data, data science, statistical modeling and advanced visualization organically in their work. Data journalists may themselves be a type of data scientist, or simply be an analyst—a subject-matter expert—who leverages and presents the work performed by data scientists in the telling of non-fictional stories. In other words, this new breed uses big data’s “3 Vs” in the service of old-school journalism’s “5 Ws and an H.”

Typically, data journalists publish their work (stories, visuals, models and data) primarily or exclusively through online channels. Data journalists might allow the more analytically inclined readers to download the underlying models and data for further exploration. They might provide user-friendly infographics and simulations that illuminate the end-to-end scenario described in their stories and models. Or they might simply produce static outputs (reports, charts, graphs, trends, etc.) that readers can look at but not alter. And by producing hardcopy versions of their online stories that hide some of the underlying analytical complexity, data journalists can still present a compelling narrative through traditional journalism channels.

Clearly, most data journalists are not mainstream data scientists in the sense of specializing in the building of statistical models that drive business applications, such as customer-churn mitigation and loyalty marketing. But they’re joined at the hip with business data scientists in their core function: non-fictional storytelling. In fact, business-oriented data scientists must often articulate structured narratives—i.e., scenarios—that explain the patterns called out in their model-driven insights.

Whether you’re a data journalist or data scientist, visual storytelling becomes critical when the patterns you’re illuminating are largely invisible to the naked eye. A data journalist may be illustrating larger patterns in stories on social, environmental and economic issues, and a key data-driven visual can make all the difference in getting their point across. Indeed, statistical patterns often feel ghostly and unreal until we can actually see them, on some level, with our eyes and relate them to some narrative of who or what is interacting with what toward what result. Visual patterns serve the core narrative-building functions (framing the opportunity, problem and solution) in both data science and data journalism.

Depending on the complexity of the story the data journalist is telling, the key visuals might be as simple as a quick PowerPoint chart with graphical builds. Or the data journalist may generate dynamic visualizations from advanced analytic tools. Or, if they need to present a more complex narrative, data journalists might develop elegant infographics (such as this one) that combine these visuals, plus detailed in-context annotations and a dynamic process flow.

Compelling data-driven narratives are the raison d’être of the data journalist. If I were to go to J-school these days, I would probably pair it with a second major in statistics. Young people going into the journalism field these days need to be quantitatively savvy and visually adept if they want to report the world’s events in their full analytical context.

Join us on April 30, when IBM will hold a live broadcast event announcing new big data product offerings in detail and featuring actual customer stories.

Photo by Yan Arief under Creative Commons license