Data visualization playbook: The importance of excluding unnecessary details

As the big data revolution gathers momentum, data scientists are working with larger data sets than ever before—a trend that shows no sign of abating. But with ever larger data sets comes the temptation to include ever more information, representing the data in all its glorious detail. Who, after all, can resist the temptation to flex some intellectual muscles by mastering truly complex data?

But as tempting as visualizing every last detail can be, doing so can erect barriers to understanding. A visualization that includes unnecessary information can overwhelm readers, obscuring the message and leaving its audience confused. Let’s explore a real-world scenario, stepping through the thought process that goes into designing an effective data visualization.

Break down the data

A foundation set up to fund environmental projects published an overview of the funding it provided during 2013. In its original form, shown in Figure 1, the report included a pie chart showing the distribution of grants across a range of environmental issues.

Figure 1: The share of funding distributed to 17 environmental issues during 2013.

Evaluate your visualization’s usability

A first glance at the pie chart reveals nothing wildly amiss. The chart represents the data simply and directly, breaking down the distribution of funding in its legend. But a closer look reveals certain flaws that can impede understanding:

  • Excluded information
    The chart supplies an exact figure for only 7 of the 17 issues—specifically, only for issues that received at least 7 percent of the overall funding.
  • Cumbersome design
    The many slices in the pie chart distract from the larger facts, requiring readers to match the color of each slice with a color in the legend to identify the issue described.
  • Confusing presentation
    The choice of colors does little to differentiate issue areas—for example, a reader could easily mistake “Air Quality” for “Rivers and Lakes” or fail to differentiate “Populations” from “Wildlife Biodiversity.”
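
To make the first flaw concrete, here is a minimal sketch of how a labeling cutoff hides figures. The percentages below are hypothetical placeholders covering only four of the issues, not the foundation's data, and the helper name split_by_label_threshold is invented for illustration. Only the 7 percent cutoff comes from the report described above.

```python
# Hypothetical percentage shares for a few issues; placeholders only,
# NOT the foundation's actual figures (and they do not sum to 100).
issue_shares = {
    "Air Quality": 12.0,
    "Rivers and Lakes": 7.0,
    "Populations": 4.5,
    "Wildlife Biodiversity": 2.5,
}

LABEL_THRESHOLD = 7.0  # percent; the cutoff described in the article


def split_by_label_threshold(shares, threshold=LABEL_THRESHOLD):
    """Return (labeled, unlabeled) issue names for a given percentage cutoff."""
    labeled = [issue for issue, share in shares.items() if share >= threshold]
    unlabeled = [issue for issue, share in shares.items() if share < threshold]
    return labeled, unlabeled


labeled, unlabeled = split_by_label_threshold(issue_shares)
print("Exact figure shown:", labeled)
print("Exact figure missing:", unlabeled)
```

Every issue in the second list forces readers to estimate its share from the size of a slice.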

Figure 2: The level of funding distributed to each environmental issue during 2013, both in dollars and as a percentage of total funding.

Find the forest in the trees

The organization intended the visualization to provide an overview of issue areas funded during 2013. To boost the overview’s effectiveness, the designer grouped environmental issues into five categories, as depicted in Figure 2: “Environmental Policy,” “Climate and Energy,” “Natural Resources,” “Preservation and Biodiversity” and “Sustainable Development.” The designer then redesigned the pie chart around the new categories, grouping slices as shown in Figure 3.

Figure 3: The share of funding distributed to each issue during 2013, with individual issues delineated but grouped into colored categories.
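
The grouping step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the designer's actual process: the dollar amounts are hypothetical placeholders, and the assignment of issues to categories is assumed, because the article names the five categories but does not list which issues belong to each.

```python
from collections import defaultdict

# Hypothetical grant amounts in dollars; placeholders, NOT the foundation's figures.
grants = {
    "Air Quality": 120_000,
    "Rivers and Lakes": 80_000,
    "Populations": 50_000,
    "Wildlife Biodiversity": 150_000,
}

# Assumed issue-to-category mapping, for illustration only.
category_of = {
    "Air Quality": "Climate and Energy",
    "Rivers and Lakes": "Natural Resources",
    "Populations": "Preservation and Biodiversity",
    "Wildlife Biodiversity": "Preservation and Biodiversity",
}


def group_by_category(grants, category_of):
    """Sum per-issue amounts into per-category totals."""
    totals = defaultdict(int)
    for issue, amount in grants.items():
        totals[category_of[issue]] += amount
    return dict(totals)


category_totals = group_by_category(grants, category_of)
for category, total in category_totals.items():
    print(f"{category}: ${total:,}")
```

Keeping the mapping in its own dictionary means the categories can be revised without touching the underlying data.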

Provide a quantitative overview

But the visualization still contained unnecessary information. The designer streamlined the legend as shown in Figure 4a, emphasizing the categories and dispensing with a complete list of issues. To further emphasize the categories, the designer removed the lines demarcating individual issues and supplied the percentage of funding distributed to each category, as shown in Figure 4b. By categorizing and unifying individual issues, the new visualization provided an effective overview featuring quantitative information.

Figure 4a: The share of funding distributed to each category during 2013, with individual issues delineated but grouped into colored categories.

Figure 4b: The percentage of funding distributed to each category during 2013.
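
The percentages printed on each category slice follow from simple arithmetic: divide each category total by the grand total. The sketch below assumes hypothetical category totals (placeholders, not the foundation's figures) and a helper name, percentage_shares, chosen for illustration.

```python
# Hypothetical category totals in dollars; placeholders, NOT the foundation's figures.
category_totals = {
    "Climate and Energy": 120_000,
    "Natural Resources": 80_000,
    "Preservation and Biodiversity": 200_000,
}


def percentage_shares(totals):
    """Convert absolute totals into percentage shares, rounded to one decimal."""
    grand_total = sum(totals.values())
    return {
        name: round(100 * amount / grand_total, 1)
        for name, amount in totals.items()
    }


shares = percentage_shares(category_totals)

# Build slice labels, largest category first, so the chart needs no issue list.
labels = [
    f"{name}: {share:.0f}%"
    for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
]
print(labels)
```

With real data, rounded shares may not add up to exactly 100, so it is worth checking the sum before publishing the labels.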

Create new levels of insight

After streamlining the pie chart, the designer introduced a new level of analysis, segmenting the data by global region and creating pie charts to show the worldwide distribution of grants. To obviate the need for another legend, the designer superimposed the pie charts on a world map, as shown in Figure 5.

Figure 5: The share of funding distributed to each category within each global region during 2013.
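
Segmenting by region amounts to repeating the same share calculation once per region, which yields one small pie chart per map location. The following sketch assumes hypothetical region names and amounts, since the article does not list the regions or their totals. The function name shares_by_region is likewise invented for illustration.

```python
from collections import defaultdict

# Hypothetical (region, category, dollars) records; placeholders,
# NOT the foundation's figures. Region names are assumed.
records = [
    ("Africa", "Natural Resources", 30_000),
    ("Africa", "Climate and Energy", 10_000),
    ("Asia", "Climate and Energy", 45_000),
    ("Asia", "Sustainable Development", 15_000),
]


def shares_by_region(records):
    """Return {region: {category: percent of that region's funding}}."""
    totals = defaultdict(lambda: defaultdict(int))
    for region, category, amount in records:
        totals[region][category] += amount

    result = {}
    for region, by_category in totals.items():
        region_total = sum(by_category.values())
        result[region] = {
            category: round(100 * amount / region_total, 1)
            for category, amount in by_category.items()
        }
    return result


regional_shares = shares_by_region(records)
for region, shares in regional_shares.items():
    print(region, shares)
```

Each inner dictionary holds the slices for one regional pie chart. Because the categories keep the same colors everywhere, the map needs no additional legend.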

Design for your audience

Before you create a data visualization, tailor your message to your audience. Don’t overwhelm your audience with data, but also take care not to render the data useless through oversimplification. You’ll want to create one kind of visualization when presenting to experts in the field, for example, but another when giving a high-level overview to a general audience. To learn more, discover how the IBM advanced analytics portfolio can help you find patterns in and derive insights from your data through visual exploration.