Geospatial Analytics: Evolving to Address Mindblowing Big-Data Challenges

Big Data Evangelist, IBM

The unfathomable scale of the universe we inhabit strains everybody’s powers of imagination. Trying to analyze the forces that shape it all, across all scales, is probably the greatest scientific challenge that humanity will ever face.

Geospatial analytics is usually focused on the nearest nook of that cosmic tableau. It stresses the “geo” because that terrain—sweet planet Earth—is the focus of our lives. So we map the face of this fair orb in increasing detail, and even extend our cartography to the interiors of buildings and to the depths of the oceans. These are the contours of the world we inhabit and upon which our survival depends.

But mapping the “spatial” of space, both outer and inner, is growing in importance to the human race. Though our survival may not depend on having a precise map of all the exoplanets in the Crab Nebula, we certainly need to track moving objects that are closer to home. As the scientific community and Hollywood have spared no expense to remind us, asteroids and comets can send us all to extinction. On a smaller scale, meteorites can streak in out of the clear blue to wreak considerable damage on the ground. And as the hit movie “Gravity” illustrates, humanity’s safe colonization of Earth orbit depends on tracking the fast-moving, dangerous debris that our species has inadvertently dumped overhead. Conceivably, 3-D geospatial predictive analytics can help us move our satellites, shuttles, space stations and astronauts out of harm’s way. Even the typical bullet-speed micrometeorite can rip clear through the outer shells of our orbiting assets.

Obviously, the big-data resources needed to map a sprawling universe are themselves astronomical. This recent article detailed the data collection, storage and processing requirements of leading-edge sky surveys, past, present and predicted. For example, the largest sky survey to date, the Sloan Digital Sky Survey, produces 100 TB of stored data per year. However, the Large Synoptic Survey Telescope (LSST), an optical survey telescope to be built in Chile in the coming decade, is expected to store 10 PB of fresh observational data per year. Coming soon thereafter, the Square Kilometer Array, a radio telescope with antenna arrays in South Africa and Australia, will pull in a whopping 365 PB per year of new data.
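To get a feel for how steep that climb is, here is a quick back-of-the-envelope sketch in Python. It uses only the three annual volumes quoted above; the decimal conversion (1 PB = 1,000 TB), the 365-day year and the function names are my own assumptions for illustration, not anything the survey teams publish.

```python
# Annual stored-data volumes as quoted in the article, in terabytes.
# ASSUMPTION: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1000

surveys_tb_per_year = {
    "Sloan Digital Sky Survey": 100,
    "LSST": 10 * TB_PER_PB,
    "Square Kilometer Array": 365 * TB_PER_PB,
}


def growth_factor(name, baseline="Sloan Digital Sky Survey"):
    """How many times more data per year than the baseline survey."""
    return surveys_tb_per_year[name] / surveys_tb_per_year[baseline]


def tb_per_day(name, days_per_year=365):
    """Average daily volume, ASSUMING data arrives evenly over a 365-day year."""
    return surveys_tb_per_year[name] / days_per_year


for name in surveys_tb_per_year:
    print(f"{name}: {growth_factor(name):,.0f}x Sloan, "
          f"{tb_per_day(name):,.1f} TB per day")
```

Under those assumptions, the LSST stores 100 times what Sloan does each year, and the Square Kilometer Array averages about a petabyte every single day.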

Though none of these projects is specifically tasked with tracking moving objects in space, they will provide a critical early warning system for detecting asteroid-scale threats. As regards space junk, it’s probably asking too much for ground-based assets to map the moving location of every nut and bolt floating on high.

Or is it? One thing that jumped out at me from the cited article is this passage, referring to the forthcoming LSST:

“Tony Tyson, an experimental cosmologist now at the University of California, Davis, ... envisioned a telescope project on a truly grand scale, one that could survey hundreds of attributes of billions of cosmological objects as they changed over time. It would record, Tyson said, ‘a digital, color movie of the universe.’ Tyson’s vision has come to life as the LSST project, a joint endeavor of more than 40 research institutions and national laboratories that has been ranked by the National Academy of Sciences as its top priority for the next ground-based astronomical facility.”

Wow! But rather than exaggerate their cosmos-mapping powers, the astronomers pointed out that even the technical resources that will be available to the LSST are puny next to the more critical scaling challenge its scientists face: the complexity of the data. Per the article:

“In combining repeat exposures of the same cosmological objects and logging hundreds rather than a handful of attributes of each one, the LSST will have a whole new set of problems to solve. ‘It’s the complexity of the LSST data that’s a challenge,’ Tyson said. ‘You’re swimming around in this 500-dimensional space.’ From color to shape, roughly 500 attributes will be recorded for every one of the 20 billion objects surveyed, and each attribute is treated as a separate dimension in the database. Merely cataloguing these attributes consistently from one exposure of a patch of the sky to the next poses a huge challenge. ‘In one exposure, the scene might be clear enough that you could resolve two different galaxies in the same spot, but in another one, they might be blurred together,’ [according to Jeff Kantor, the LSST data management project manager]. ‘You have to figure out if it’s one galaxy or two or N.’”
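The numbers in that passage reward a moment of arithmetic. Here is a rough sketch in Python that multiplies out the 20 billion objects and roughly 500 attributes the article cites. The 4 bytes per value is purely my assumption (a single-precision number per attribute), as are the decimal terabytes and the function names; the article says nothing about how the LSST actually stores its catalog, and this counts one snapshot only, not the repeat exposures.

```python
# Figures quoted in the article.
OBJECTS = 20_000_000_000   # 20 billion surveyed objects
ATTRIBUTES = 500           # roughly 500 attributes (dimensions) per object

# ASSUMPTION: each attribute value takes 4 bytes (e.g. a 32-bit float).
# The real storage format is not described in the article.
BYTES_PER_VALUE = 4
BYTES_PER_TB = 10**12      # ASSUMPTION: decimal terabytes


def catalog_values(objects=OBJECTS, attributes=ATTRIBUTES):
    """Total attribute values in one snapshot of the catalog."""
    return objects * attributes


def catalog_terabytes(objects=OBJECTS, attributes=ATTRIBUTES,
                      bytes_per_value=BYTES_PER_VALUE):
    """Raw size of one snapshot under the assumed value width."""
    return catalog_values(objects, attributes) * bytes_per_value / BYTES_PER_TB


print(f"{catalog_values():,} values per snapshot")
print(f"about {catalog_terabytes():,.0f} TB per snapshot at "
      f"{BYTES_PER_VALUE} bytes per value")
```

That works out to 10 trillion values in a single pass over the catalog. The raw size under my assumption is modest next to 10 PB per year, which fits Tyson's point that the hard part is the 500 dimensions rather than the byte count.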

Note that this discussion is still focused on mapping relatively static celestial objects, such as galaxies, rather than dynamic ones, such as comets streaking toward our solar system. Imagine the order-of-magnitude greater resource requirements (storage, processing, memory, bandwidth) that would be needed for a cosmic “radar” that tracks threats to our existence from all angles of attack. Yikes!

Though we’re in no immediate danger of rogue subatomic particles ending life on Earth (other than those in human-engineered chain reactions), the article notes that the same scaling challenges confront data-driven scientists mapping dynamic phenomena on this smallest of all possible scales. If anything, the probabilistic complexities of mapping the subatomic create an even greater reliance on the multidimensional graph analytics upon which the LSST’s celestial survey depends. Interactions among invisible probability waves are the very fabric of the subatomic, in stark contrast to the more visible macro interactions (e.g., galactic superclusters, solar flares, comet trajectories) that astronomical telescopes track. Already, the Large Hadron Collider collects, stores and processes 25 PB of data per year, with the amount sure to grow as it tackles fresh challenges.

But as the article makes clear, the practical distinctions between astronomy and particle physics investigations are increasingly blurring. Mapping creation at any and all mind-boggling scales—from cosmic to subatomic—requires analytic approaches that can “see” phenomena that are otherwise invisible:

“The LSST tests the limits of scientists’ data-handling abilities. It will be capable of tracking the effects of dark energy, which is thought to make up a whopping 68 percent of the total contents of the universe, and mapping the distribution of dark matter, an invisible substance that accounts for an additional 27 percent. And the telescope will cast such a wide and deep net that scientists say it is bound to snag unforeseen objects and phenomena too. But many of the tools for disentangling them from the rest of the data don’t yet exist.”

In our lifetime, I expect that Nobel Prizes will go to those computational-physics researchers who craft the algorithms to peel away the darkness in our data sets. Indeed, many of the best data scientists in the big-data arena have backgrounds in physics and astronomy. Many of these professionals are already working on geospatial analytics projects in the commercial arena and in applied sciences. Over time, we will undoubtedly see them incorporate the tangled celestial and subatomic dynamics that bear on earthly concerns (e.g., solar flares’ impact on radio communications) into their geospatial mapping projects.

Should we call this new frontier “spatio-spatial” mapping?