Apply Big Data Analytics to Streaming Internet of Things Data
Nature of analytics: Gathering codfish sounds and other sensor data for deep insight
Care to take a wild guess as to who is involved in a love affair with Atlantic codfish? It’s IBM® InfoSphere® Streams streaming data analytics software. What brings this couple together? As IBM big data expert Stewart Hanna explains in the recent video “I Cod Help Falling in Love,” the attraction is what the cod have to say.
When fish reproduce, they emit grunts, groans, and clicks at different frequencies. For example, haddock produce knocks, while cod and tadpole fish produce grunts. Fish also make distinct noises during courtship, territorial battles, and social aggregation. Biologists use this sound data to study the fish, tracking the reproductive habits of fish and other marine life to understand the health of the water, the wildlife, and the surrounding ecosystem.
So what does marine biology have to do with analytics, the Internet of Things, or InfoSphere Streams? Fish sounds can be faint and short-lived, so to capture them scientists use highly sophisticated hydrophones that generate large volumes of data at high rates in a specialized format. This hydrophone data makes a strong use case for InfoSphere Streams: the software collects the sensitive audio data and analyzes it in real time while correlating it with other data sources, such as wind and temperature readings, to help scientists protect the environment and natural habitats.
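To make the idea of catching short-lived sounds in a continuous stream more concrete, here is a minimal Python sketch of one common approach: an energy detector that compares each short frame's RMS energy with the recent background level. It is not InfoSphere Streams code and not the method used in the cod project; the frame size, the 4x threshold factor, the history length, and the helper names are all illustrative assumptions.

```python
import math
import statistics
from collections import deque
from typing import Iterable, Iterator, List, Tuple


def frames(samples: Iterable[float], frame_size: int) -> Iterator[List[float]]:
    """Group a (possibly endless) sample stream into fixed-size, non-overlapping frames."""
    buf: List[float] = []
    for s in samples:
        buf.append(s)
        if len(buf) == frame_size:
            yield buf
            buf = []  # a trailing partial frame is never emitted


def rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def detect_transients(
    samples: Iterable[float],
    frame_size: int,
    factor: float = 4.0,  # hypothetical: how far above background counts as a "sound"
    history: int = 32,    # hypothetical: number of recent frames used as the background
    floor: float = 1e-6,  # avoids a zero threshold on perfectly silent input
) -> Iterator[Tuple[int, float]]:
    """Yield (start_sample_index, energy) for frames much louder than the recent background."""
    recent: deque = deque(maxlen=history)
    for idx, frame in enumerate(frames(samples, frame_size)):
        energy = rms(frame)
        if len(recent) >= 3:  # wait for a few frames before estimating the background
            background = max(statistics.median(recent), floor)
            if energy > factor * background:
                yield idx * frame_size, energy
        recent.append(energy)


# Usage: low-level noise with one short, loud burst starting at sample 40.
quiet = [0.01, -0.01] * 20
burst = [0.5, -0.5] * 4
print(list(detect_transients(quiet + burst + quiet, frame_size=8)))
# → [(40, 0.5)]
```

Because the detector is a generator that keeps only a bounded deque of recent energies, it can run over an unbounded sample stream with constant memory, which is the property a real-time pipeline needs.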
Ongoing streams of machine data
The hydrophone is one example of an Internet-connected machine that continuously streams data. Today, only one percent of physical systems are connected to the Internet.1 But some industry analyses estimate that by 2020 a significant share of all data will be machine generated, so now is the time to prepare. The connected car is another projected rich source of machine data: Ford Motor Company researchers have begun experimenting with vehicles that can produce 250 GB of data in an hour.2 And the utilities industry, with its bevy of smart meters, usage trackers, geographic sensors, and other monitoring technologies, is also a growing source of machine-generated data.
Handling machine data involves several specific challenges. Formats vary and are complex, and few standards exist. A mix of streaming and at-rest data complicates correlating and visualizing data sets. Machine data is also likely to be time sensitive, and different sources refresh at different rates. In addition, the data may arrive with or without context. Solutions such as InfoSphere Streams are designed to address this complexity. Performing analytics on machine data helps organizations answer questions such as the following:
- Is there real-time visibility into business operations, such as customer experience and behavior?
- Can all the machine data be analyzed and combined with existing security data to predict and act on security threats?
- Can end-to-end infrastructure such as wireless networks, smart grids, or manufacturing supply chains be monitored proactively to optimize costly resources and help deliver services when and where they are most needed?
In addition, many organizations are deploying analytics for machine data to enhance business processes. For example, IBM Burlington deploys InfoSphere Streams to implement real-time process control for semiconductor manufacturing.3
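One practical consequence of mixed refresh rates is that a fast stream (for example, hydrophone detections every few seconds) has to be matched with slower data (for example, a temperature reading once a minute). A common technique is an "as-of" join: each fast event is paired with the most recent slow reading taken at or before it, provided that reading is not too stale. The Python sketch below is an illustration only; the `Reading` type, the `as_of_join` helper, and the `max_age` staleness limit are hypothetical names, not an InfoSphere Streams API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reading:
    ts: float     # timestamp in seconds
    value: float  # measured value (units depend on the sensor)


def as_of_join(
    events: List[Reading],
    slow: List[Reading],
    max_age: float,
) -> List[Tuple[Reading, Optional[Reading]]]:
    """Pair each fast event with the latest slow reading at or before it.

    `slow` must be sorted by timestamp. If no slow reading exists yet, or the
    latest one is older than `max_age` seconds, the event is paired with None
    so downstream logic can treat it as missing context.
    """
    slow_ts = [r.ts for r in slow]
    joined: List[Tuple[Reading, Optional[Reading]]] = []
    for ev in events:
        i = bisect_right(slow_ts, ev.ts) - 1  # last slow reading with ts <= ev.ts
        match = slow[i] if i >= 0 and ev.ts - slow[i].ts <= max_age else None
        joined.append((ev, match))
    return joined


# Usage: detections at arbitrary times, temperature logged once a minute.
detections = [Reading(5, 0.8), Reading(65, 0.6), Reading(130, 0.9), Reading(700, 0.7)]
temperature = [Reading(0, 7.5), Reading(60, 7.6), Reading(120, 7.4)]
for ev, temp in as_of_join(detections, temperature, max_age=300):
    print(ev.ts, temp.value if temp is not None else None)
# prints:
# 5 7.5
# 65 7.6
# 130 7.4
# 700 None
```

In a live pipeline the slow side would typically be reduced to "latest value per sensor" and updated as readings arrive, rather than kept as a sorted list; the list form simply makes the matching rule easy to see.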
Internet of Things–derived insight
An array of organizations across different industries uses InfoSphere Streams to build in core capabilities that enable them to derive deep insight from machine-generated data (see figure).
Core streaming analytics capabilities
|Capability|Benefits|
|---|---|
|Cleansing, filtering, aggregation, and analytics for sensor and log data|Verify the veracity of data as it streams into the organization. Save and persist only pertinent data.|
|Ultra-low-latency runtime to support fast data|Keep pace with the ongoing growth of sensor capabilities. React in real time to real-world-aware devices.|
|Support and analysis of unstructured data|Use toolkits and samples for rapid results. Take advantage of new insights from social media to enhance understanding of customers.|
|Powerful analytics to model real-world events from sensors and social media|Reuse existing tools and models built with IBM SPSS® predictive analytics, SAS, R, MATLAB, Java, and C++. Predict future events from current sensor data with advanced cognitive and predictive tools. Gain insight into customers through social media to target prospects.|
|Industry solutions|Deploy IBM Research–developed analytics accelerators. Apply InfoSphere Streams in a wide variety of applications, such as operational detection, intelligent transportation, and predictive insights.|
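The first row of the table (cleansing, filtering, and aggregation of sensor data) can be illustrated with a small Python sketch: drop missing or physically implausible readings, summarize each sensor over fixed, non-overlapping (tumbling) time windows, and keep only the windows worth persisting. The valid range, window length, alert level, and function names are hypothetical choices for the example; this is not InfoSphere Streams code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Reading = Tuple[int, str, Optional[float]]  # (timestamp_seconds, sensor_id, value)
Summary = Tuple[int, float, float]          # (count, mean, max)


def summarize(
    readings: Iterable[Reading],
    window: int,
    lo: float,
    hi: float,
) -> Dict[Tuple[int, str], Summary]:
    """Cleanse readings, then aggregate them per sensor over tumbling windows."""
    buckets: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for ts, sensor, value in readings:
        # Cleansing: discard missing values and values outside the plausible range.
        if value is None or not (lo <= value <= hi):
            continue
        window_start = ts - ts % window  # tumbling windows: [0, w), [w, 2w), ...
        buckets[(window_start, sensor)].append(value)
    return {
        key: (len(vals), sum(vals) / len(vals), max(vals))
        for key, vals in sorted(buckets.items())
    }


def pertinent(summary: Dict[Tuple[int, str], Summary], alert: float) -> Dict[Tuple[int, str], Summary]:
    """Filtering: keep only windows whose maximum reaches the alert level."""
    return {key: s for key, s in summary.items() if s[2] >= alert}


# Usage with a hypothetical plausible range of -40..60 degrees and 10-second windows.
readings = [
    (0, "t1", 10.0), (5, "t1", 12.0), (7, "t1", None),
    (12, "t1", 500.0), (15, "t1", 30.0), (18, "t2", 11.0),
]
summary = summarize(readings, window=10, lo=-40.0, hi=60.0)
print(summary)
# {(0, 't1'): (2, 11.0, 12.0), (10, 't1'): (1, 30.0, 30.0), (10, 't2'): (1, 11.0, 11.0)}
print(pertinent(summary, alert=25.0))
# {(10, 't1'): (1, 30.0, 30.0)}
```

Here the missing value and the 500-degree outlier are cleansed away, six raw readings shrink to three window summaries, and only one summary crosses the alert level and would be persisted.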
Beyond studying codfish, the analytics capabilities in InfoSphere Streams can address many other Internet of Things use cases, including geospatial analytics, text analysis, sentiment analysis, and more:
- High-performance processing of geospatial data: Analyzing geospatial data requires complex mathematics such as set theory and geometry. Geospatial data is used for location intelligence and location-based services in security and surveillance, geographic information systems, traffic pattern analysis, and more. For example, Dublin City Council in Ireland uses InfoSphere Streams to analyze 50 bus locations per second across its fleet of roughly 1,000 buses.4
- Unstructured text parsing to detect meaning and understand context: In the Internet of Things, text analytics can derive meaning from vast amounts of text, including social data, to determine sentiment and identify illegal or suspicious activities. InfoSphere Streams is designed to perform unstructured text analysis faster and with greater precision than traditional methods. It can analyze millions of events per second in real time and help ensure that data is trusted and secure.5
- Pattern and anomaly detection and prediction in real time: Knowing the order in which events occur can profoundly affect outcomes. For example, predicting the path of a storm, forest fire, or other natural disaster can save lives, and picking the next-best stock trade can benefit a client financially. InfoSphere Streams helps insurance companies plan for natural disasters and enables real-time public alerts. It also performs real-time analysis of sensor data; for example, it can be deployed to collect data from the Hudson River, one of the most heavily instrumented bodies of water in the world.6
- Real-time processing of phone call data: Telecommunications service providers continue to see huge growth in smartphone and mobile device use, and growing text and data usage creates a deluge of context- and time-sensitive data. InfoSphere Streams enables telecommunications providers to analyze billions of call data records per day to detect fraud, ensure high asset utilization, and create accurate customer profiles that support better service and retention. For example, using InfoSphere Streams, Sprint reduced storage costs by 90 percent.7
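Real-time anomaly detection on a sensor feed, such as the river measurements mentioned above, can be sketched with a simple rolling z-score: each new value is compared with the mean and standard deviation of the preceding N values. This is a deliberately basic statistical technique chosen for illustration, not the analytics that InfoSphere Streams provides; the `zscore_anomalies` name, the 20-value window, and the 3-sigma threshold are assumptions.

```python
import math
from collections import deque
from typing import Iterable, Iterator, Tuple


def zscore_anomalies(
    values: Iterable[float],
    window: int = 20,        # hypothetical: number of preceding values used as the baseline
    threshold: float = 3.0,  # hypothetical: flag values more than 3 standard deviations away
) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, z) for values that deviate sharply from the recent baseline."""
    recent: deque = deque(maxlen=window)
    for i, x in enumerate(values):
        if len(recent) == window:
            mean = sum(recent) / window
            std = math.sqrt(sum((v - mean) ** 2 for v in recent) / window)
            if std > 0:  # a perfectly flat baseline gives no scale to compare against
                z = (x - mean) / std
                if abs(z) > threshold:
                    yield i, x, z
        recent.append(x)  # the new value joins the baseline only after it is scored


# Usage: a steady reading that briefly spikes.
readings = [10.0, 10.5] * 10 + [18.0] + [10.0, 10.5] * 3
print(list(zscore_anomalies(readings)))
# → [(20, 18.0, 31.0)]
```

Scoring each value before adding it to the baseline keeps a spike from masking itself; the trade-off is that after a spike the baseline's standard deviation stays inflated until the spike slides out of the window.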
Analysis and correlation of data variety
The massive amount of machine data available today, originating from IT machines, sensors, meters, and more, requires complex analysis and correlation across different types of data sets. Companies that can perform this analysis on a variety of data, with speed and accuracy, can improve business efficiency and customer satisfaction and achieve strategic success.
Using InfoSphere Streams to analyze streaming machine data can empower a range of analysts, from those in business deriving insights and obtaining real-time visibility into the customer experience to scientists listening to fish to understand and enhance aquatic ecosystems. And what does a 1,500-pound white rhinoceros have in common with an organization’s customers? Find out in another upcoming installment featuring InfoSphere Streams.
Please share any thoughts or questions in the comments.
1 “The Role of Big Data and Data Security in M2M,” interview with Michael Curry, vice president, WebSphere Foundation, IBM Software Group, m2m Journal, July 2014.
2 “Ford Embracing Analytics and Big Data to Inform Eco-Conscious Decisions, Stay Green,” Ford Motor Company press release, October 2013.
3 IBM Burlington case study, June 2013.
4 “Dublin City Council,” bus system case study, August 2013.
5 InfoSphere Streams complex event processing applications for analyzing log messages, downloadable Streams application ZIP file, developerWorks, February 2012.
6 “Analyze This: Beacon Institute for Rivers and Estuaries,” Hudson River instrumentation case study, October 2011.
7 “Sprint leverages IBM big data and analytics to transform operations,” IBM client voices video, March 2013.
- The Rise of the Machine Data: Are You Prepared?, IBM Software ebook, March 2014.
- Big Data at the Speed of Business, big data use cases, IBM.com.
- Big Data at the Speed of Business, big data in action, IBM.com.
- InfoSphere Streams Quick Start Edition, IBM.com.