Lost in a sea of data? Let advanced analytics be your guide
Did you ever stop to think that the poet Samuel Taylor Coleridge foreshadowed the challenges of big data? Or perhaps I’m just reminded of it because the spring term has ended, and I happened to come across a copy of “The Rime of the Ancient Mariner” buried in a backpack. It tells the story of a man who is challenged by the sea and who doesn’t rise to the challenge. In fact, he kills that which helps him. The poem treats plenty of moral themes, but I’m going to ignore those to focus on how this famous tale of man versus nature relates to big data.
We are surrounded by data. We are drowning in data. And yet we just can’t seem to get data to work for us. We compromise by taking a sample or by calling things “good enough” when they clearly aren’t—or we hire the expert “albatross” to lead us in the right direction and then are frustrated when that albatross can’t seem to get us what we want, either.
Data at rest versus data in motion
Data, data everywhere? I’m certainly not the first to equate data with water. The data lake is no new concept, and it’s a great metaphor. But aside from data’s size, we also need to think about data’s motion. A lake may seem calm to me, yet Coleridge’s sea is constantly in motion. And data is the same: some at rest, some in motion.
The concept of data at rest is an obvious one, bringing to mind big enterprise stores of data. Having been built up, they now continue to grow exponentially in both size and complexity.
But the idea of data in motion may not seem as obvious. If it will eventually come to rest, why worry about it? Consider that instantaneous responses are required for stock trading, national security, disease detection and so much more, and that a fast response without powerful analytics to back it up is worthless. To address the challenge, you must continuously analyze data streams. That means you need the ability to apply analytics to data in motion before storing it, and even to data that will never be stored, because by the time data comes to rest, it may have changed multiple times.
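To make the idea of analyzing data before it is stored a little more concrete, here is a minimal Python sketch. It is only an illustration, not how any particular streaming product works: the function name `flag_anomalies`, the window size, the threshold and the sample pressure readings are all assumptions made up for this example. It keeps just the last few readings in memory and flags any value that strays far from the recent average.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, Tuple


def flag_anomalies(
    readings: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings far from the recent rolling mean."""
    recent = deque(maxlen=window)  # only the last `window` values are kept
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Flag values more than `threshold` standard deviations away.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        recent.append(value)


# Hypothetical pump pressure readings arriving one at a time.
stream = iter([50.0, 51.0, 49.0, 50.0, 51.0, 50.0, 80.0, 50.0])
alerts = list(flag_anomalies(stream))
print(alerts)
```

Because the function is a generator that holds only a small window, each reading is examined as it arrives and nothing needs to be written to disk first.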
Consider the problem of trying to signal maintenance crews in real time that a pump is just about to break down. Think of the repair headaches that doing so would avoid, not to mention the customer service issues it would forestall. As someone who came home with a newborn to a house that had no running water—thanks to a water main that broke and that stayed broken for three days—I certainly would have benefited from a more proactive approach to water pumps.
It all begins with the data. The water utility has years and years of history of water use from millions of pumps, and within that history lie indicators of failure. This rich source of data provides the raw material that our analyst can use to build a model. In this case, the model will predict whether a pump is in danger of failing. It can even predict the severity of the failure.
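For readers who do like to see code, the following is a rough sketch of what building such a model from history could look like. It uses a plain logistic regression written with the Python standard library. Everything in it is an assumption for illustration: the two features (normalized vibration and normalized age), the six historical records and the training settings are invented, and a real utility would have far more data and a richer model.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 2000,
    lr: float = 0.5,
) -> Model:
    """Fit a logistic regression with full-batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the average gradient.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def failure_probability(model: Model, x: Sequence[float]) -> float:
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical history: (normalized vibration, normalized age) and
# whether the pump later failed (1) or not (0).
history = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.7, 0.8), (0.8, 0.6), (0.9, 0.9)]
failed = [0, 0, 0, 1, 1, 1]

model = train_logistic(history, failed)
print(failure_probability(model, (0.85, 0.8)))   # an old, shaky pump
print(failure_probability(model, (0.15, 0.15)))  # a young, steady pump
```

The model returns a probability rather than a yes or no, which leaves the choice of how cautious to be to the people responding to the alert.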
Analysis can use programming languages such as R and Python but doesn’t have to. An analyst can point and click the way to a model that has validation, visualization, partitioning, sampling, balancing—anything needed to be an effective data miner—already built in. This allows an analyst to focus on analysis and on how to interpret findings instead of scouring code for snippets and samples.
After a model is created, it can be applied to data at rest to find any pumps that are at risk. It can also be applied to data being generated by the pumps in real time—that is, to data in motion. That information can be provided to field crews in conjunction with other data, such as whether a pump is near a hospital or a school or whether a scared new mom just came home from the hospital desperately needing a shower.
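The same scoring logic serving both kinds of data can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `risk_score` is a hypothetical hand-set formula standing in for a trained model, and the field names, the 0.7 cutoff, the pump IDs and the set of pumps near a hospital are all invented for the example.

```python
from typing import Iterable, Iterator, List, Set

RISK_THRESHOLD = 0.7  # assumed cutoff for "at risk"


def risk_score(reading: dict) -> float:
    """Hypothetical stand-in for a trained model's score."""
    return 0.6 * reading["vibration"] + 0.4 * reading["age"]


def at_risk_at_rest(table: List[dict]) -> List[str]:
    """Batch scoring: scan stored records and list at-risk pumps."""
    return [r["pump_id"] for r in table if risk_score(r) >= RISK_THRESHOLD]


def alerts_in_motion(stream: Iterable[dict], near_hospital: Set[str]) -> Iterator[dict]:
    """Stream scoring: emit an alert as soon as a risky reading arrives."""
    for reading in stream:
        score = risk_score(reading)
        if score >= RISK_THRESHOLD:
            yield {
                "pump_id": reading["pump_id"],
                "score": score,
                # Add context so field crews can prioritize.
                "priority": "high" if reading["pump_id"] in near_hospital else "normal",
            }


# Data at rest: hypothetical stored records.
stored = [
    {"pump_id": "P1", "vibration": 0.2, "age": 0.3},
    {"pump_id": "P2", "vibration": 0.9, "age": 0.8},
]
batch_result = at_risk_at_rest(stored)

# Data in motion: hypothetical live readings, consumed one at a time.
live = iter([
    {"pump_id": "P3", "vibration": 0.8, "age": 0.9},
    {"pump_id": "P4", "vibration": 0.1, "age": 0.2},
])
stream_result = list(alerts_in_motion(live, near_hospital={"P3"}))
print(batch_result, stream_result)
```

The point of the design is that one scoring function feeds both paths, so a pump is judged the same way whether its data is sitting in a store or arriving off the wire.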
A powerful platform for big data and analytics
Solving this kind of water problem requires not only a powerful platform for the data itself, but also powerful tools for analytics. Bringing together the user-driven data mining capabilities of IBM SPSS Modeler, the enterprise-grade big data power of IBM BigInsights and the context-aware stream computing of IBM InfoSphere Streams can finally remove the barriers preventing organizations from realizing the full benefit of their sea of data.
With the help of IBM solutions, organizations can do any of the following:
- Create powerful models without hiring a team of programmers and developers.
- Use every bit of historical data to build, rebuild and validate models without sacrificing performance.
- Score streaming data and deliver results in real time to the applications, people and systems that make decisions.
To learn more, join us on June 17, 2015, at 1 p.m. ET for “Advanced analytics at scale: Because business outcomes matter.” In this 30-minute webinar, our experts will discuss and demonstrate how IBM advanced analytics at scale can help you navigate the sea of data and guide you quickly and efficiently toward more in-depth analytic discoveries and better business outcomes. Register today.