SETI: Separating the signals from the noise

Marketing Technologist/Evangelist, IBM Emerging Technologies, IBM
There are trillions upon trillions of data points in the universe, and the SETI Institute is hard at work deciphering all of them. And what they learn about the cosmos may have real-world applications for industry.

SETI gathers millions of complex rows of relational data from the Allen Telescope Array (ATA), which are then recorded, sifted, analyzed and visualized. Signals from the universe, captured through the telescope array, take form as a volume of data that isn’t wholly fathomable, even when we talk in terabytes. The ATA pumps out over 60 gigabits of data per second, 24 hours a day, and Apache Spark queries run over 200 million rows of that data in less than 3 minutes of wall-clock time. Radar, satellite and other radio signals have signatures with consistent characteristics, and the ATA picks those up as well. Consuming all of that data at scale, in real time, requires a superhuman effort to sort the signals from the noise.
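To put those figures in perspective, here is a back-of-envelope sketch in Python. It takes the quoted numbers at face value and treats them as bounds ("over 60 gigabits" and "less than 3 minutes"). The function names are made up for illustration, and decimal units (1 TB = 1,000 GB) are an assumption.

```python
# Back-of-envelope arithmetic for the figures quoted above.
# Assumptions: a sustained rate of exactly 60 Gbit/s and decimal units.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_terabytes(gigabits_per_second: float) -> float:
    """Convert a sustained rate in Gbit/s to decimal terabytes per day."""
    gigabytes_per_second = gigabits_per_second / 8  # 8 bits per byte
    gigabytes_per_day = gigabytes_per_second * SECONDS_PER_DAY
    return gigabytes_per_day / 1000  # 1 TB = 1000 GB


def rows_per_second(rows: int, wall_clock_seconds: float) -> float:
    """Average query throughput in rows per second of wall-clock time."""
    return rows / wall_clock_seconds


if __name__ == "__main__":
    # 60 Gbit/s around the clock is at least this many terabytes per day.
    print(daily_volume_terabytes(60))  # 648.0
    # 200 million rows in under 3 minutes is at least this many rows/second.
    print(round(rows_per_second(200_000_000, 180)))  # 1111111
```

In other words, the array produces at least 648 terabytes a day at the quoted rate, and the quoted query time works out to more than a million rows per second.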

Digital signal processing and cognitive computing are used to sift out satellite traffic, planes, echoes from space junk and other Earth-generated signals. The work is progressing more rapidly than ever, with Apache Spark distilling the stream down to the data that matters. Cognitive computing allows for the processing of multilayered data arrays, from varied sources and time spectrums. Supervised learning lets us apply our own classifications to the data. Unsupervised learning, applied in the absence of our preconceived notions, reveals outliers and forces us to ask questions we never knew to ask.
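The difference between the two learning modes can be shown with a toy sketch. This is not the project's actual method: the single "drift" feature, the labels, and every value below are made up for illustration. A nearest-centroid classifier stands in for supervised learning, and a median-based outlier test (the modified z-score) stands in for unsupervised learning.

```python
from statistics import mean, median

# Hypothetical labelled examples: one numeric feature per detected signal.
# Labels and values are invented purely for illustration.
LABELLED = {
    "satellite": [0.9, 1.1, 1.0, 1.2],
    "aircraft": [4.8, 5.1, 5.0, 5.3],
}


def train_centroids(examples):
    """Supervised step: learn one mean feature value per human-assigned label."""
    return {label: mean(values) for label, values in examples.items()}


def classify(value, centroids):
    """Assign the label whose centroid lies closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def find_outliers(values, threshold=3.5):
    """Unsupervised step: flag values far from the median, using no labels.

    Uses the modified z-score, 0.6745 * |x - median| / MAD, where MAD is
    the median absolute deviation of the values.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


if __name__ == "__main__":
    centroids = train_centroids(LABELLED)
    print(classify(1.3, centroids))  # satellite
    print(classify(4.7, centroids))  # aircraft

    unlabelled = [1.0, 1.1, 0.9, 1.2, 1.0, 1.1, 0.95, 9.7]
    print(find_outliers(unlabelled))  # [9.7]
```

The classifier can only ever answer with a label it was taught, while the outlier test surfaces the one reading that fits nothing seen so far. That reading is the kind that prompts a new question.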

What else happens with that data?

The brightest minds in the world interpret it, drawn to this mother lode of unstructured (or “not-yet-structured”) information. Hackers, astronomers, physicists and data scientists all ponder the never-before-clarified signals and the possibilities for further developing what they’ve discovered. The IPython Notebooks being developed in this project are useful for more than just radio signal collection projects. They can be used to decipher any collection of data points, including those collected from Internet of Things (IoT) devices. Always evolving, these sleek algorithms are designed for repeated, ever-improving and opportunistic use across other industry applications. Applied to data collected from factories, manufacturing plants, food facilities and biometrics, these algorithms will help separate the wheat from the chaff.

IBM jStart and the SETI Institute have made the machine-learning algorithms developed in this “higher-purpose” deep-space project available to everyone. Packaged in an IPython Notebook, machine learning and Apache Spark are available together as an open source offering that can be applied to many different problems across various industries.

In our quest to understand and explain the origin and nature of life in the universe, we’ve gained insights that impact us every day. We’ve gained knowledge that could contribute to advancements in antibiotic-resistant superbug tracking, smarter cancer treatment plans based on real cohort success, improved manufacturing processes driven by demand and availability of resources (such as weather-reliant shipping routes) or better seismic activity predictions that take into account the noise generated by oil and gas production. The real-world applications are endless, and we find ourselves on the frontier of the next big technological revolution. We learn as we go, and we evolve. And what we learn may very well have relevance beyond the cosmos: we could hold in our hands the ability to transform entire industries and the way they capitalize on their data.

To learn more about IBM and SETI's collaborative research, follow @IBMjStart on Twitter. And for more insights into the aerospace industry and beyond, check out @IBMIndustry.