Visualizing big science like never before

Big Data Evangelist, IBM

Scientists are pushing the boundaries of their fields more aggressively now that they have big data at their disposal. It’s not just that more data from more sources is flooding into their research efforts; they can also analyze it, visualize it and test hypotheses against it far more rapidly. And they can engage in audacious new experiments that were never technologically within reach of the legends of their disciplines. Today’s scientists can see further than ever before because they’re standing on big data’s broad shoulders.

What’s most fascinating is that scientists are increasingly using computational methods to analyze larger samples of the populations of interest, and deeper correlations within those populations, than ever before. I have referred to this as the new paradigm of “whole-population analytics.” In that prior context, I noted that “as storage costs plummet and processing power becomes cheap and ubiquitous, you can do deep, continuous analysis against the entire population of data, rather than just the traditional capacity-constrained samples/subsets.” Or, as a researcher stated it in this recent article: “Suddenly, we don’t have to be afraid of measuring lots and lots of things, about humans, about oceans, about the Universe, because we know we can be confident that we can collect that data and extract some knowledge from it.”

What this points to is the rise of whole-population analytics as the new frontier in the computational sciences. The article discusses several case studies of ongoing scientific efforts that depend on having a much larger profile of the target population than ever before.

  • Whole-body analytics: The sheer complexity of bodily interactions, ranging from the genomic and molecular to the cellular, muscular, and other macro-level systems, is staggering. Profiling these interactions for even a single individual is a daunting task, and profiling how one person’s physiological health stacks up against an entire population of others is even more of a challenge. The article discusses a research effort underway in the UK to digitize detailed 3D video scans of the hearts of 1,600 living volunteers, while also correlating each person’s cardiovascular functioning against their genomic makeup. The goal of the study is to identify the complex and sometimes subtle links between cardiovascular disease and genetics so that preventive treatments can be developed and fit more precisely to each individual’s risk profile. One can easily foresee the day when such profiling is done routinely for entire populations, from birth and throughout people’s lives, to assess the extent to which their genetic susceptibility to conditions is beginning to expose them to serious risk.
  • Whole-mind analytics: The body and mind interact in ways that are becoming far less mysterious thanks to ongoing advances in computational neuroscience and the correlation of brain data against data from the fields of biology, physiology, psychology, and the cognitive sciences. The article discusses an initiative by researchers in the US who are storing detailed 3D brain scans from 30,000 individuals. Clearly, this data will prove pivotal in population studies of the risk factors, onset, and progression of neurological, psychological, and developmental issues afflicting the human mind.
  • Whole-ecosystem analytics: Individuals and societies cannot long survive without a clear understanding of their status within Earth’s interconnected ecosystems. Our continued survival may depend on scientific efforts to understand the biosphere in all its complexity and in its dependence on the physical world around us. Through such knowledge, we may better control our own adverse impacts on the ecosystem while fostering a more sustainable world in which biodiversity thrives. The article reports on a project in the UK in which researchers are digitally storing the entire genetic code of tens of thousands of different plants and animals. This data clearly has many scientific, commercial, pharmaceutical and other potential applications, and if it can be extended to a wider range of species, both living and extinct, it can help us better understand how precious and endangered life truly is.
  • Whole-cosmos analytics: Worlds are born, live and die all the time. This is a lifecycle that astronomers have observed for centuries and which is becoming even more apparent now that we have a growing view of exoplanets and the celestial forces that may imperil our own planet. In a universe with billions of galaxies and stars, and in a solar system teeming with asteroids and comets, we probably can’t track every last object that could potentially jeopardize our survival, but there is clear benefit in having greater knowledge of what’s out there. The article discusses the Square Kilometre Array, a radio telescope being built in Africa and Australia. It is expected to collect, each year, an amount of data equal to 150 times the current total annual global internet traffic, and that will give humanity a teeny, tiny view into a cosmos that is unimaginably crowded.

Considering how mind-blowingly vast these populations are, it’s probably a bit hubristic for the article to refer to what we can make of it all as “approaching omniscience.” Fortunately, the author, Pallab Ghosh, brings it down to earth. He notes the considerable challenges of storing, organizing, labeling and otherwise managing scientific data at unprecedented scale. An ongoing challenge that future scientists will need to confront head-on is how to profile, cleanse, correlate, analyze and visualize it all.

Actually, scratch that thought. Humanity will never truly be able to get our collective heads around the “it all” of the world around us. But as long as we can analyze the entire population of whatever entities (customers, citizens, species, stars) we can model in our data, we’re doing all right.