Big data on wheels

Manager of Portfolio Strategy, IBM

Pop quiz! How much data do car sensors produce every hour?

Answer: About 1.3 gigabytes. Using a little math, we can extrapolate the amount of sensor data created by cars. Industry experts report 60 million cars are manufactured each year. If we assume those cars are driven four hours per day, that’s 312 million gigabytes per day, or roughly 114 exabytes per year (counting 1 exabyte as 1 billion gigabytes).
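The estimate above can be reproduced in a few lines of Python. This is only a sketch: it assumes decimal units (1 exabyte = 1 billion gigabytes) and a 365-day year, neither of which is stated in the text. Under those assumptions the 312 million gigabytes is a daily total and the yearly total comes to roughly 114 exabytes; binary units would give a somewhat lower figure.

```python
# Back-of-the-envelope estimate using the figures quoted in the text.
GB_PER_CAR_HOUR = 1.3            # sensor data per car per hour
CARS_PER_YEAR = 60_000_000       # cars manufactured each year
HOURS_PER_DAY = 4                # assumed daily driving time
DAYS_PER_YEAR = 365              # assumption: driven every day of the year
GB_PER_EXABYTE = 1_000_000_000   # assumption: decimal units, 1 EB = 10^9 GB


def daily_gigabytes() -> float:
    """Sensor data produced per day by one year's worth of new cars."""
    return GB_PER_CAR_HOUR * HOURS_PER_DAY * CARS_PER_YEAR


def yearly_exabytes() -> float:
    """The same total, scaled to a year and converted to exabytes."""
    return daily_gigabytes() * DAYS_PER_YEAR / GB_PER_EXABYTE


print(f"{daily_gigabytes() / 1e6:.0f} million GB per day")
print(f"{yearly_exabytes():.1f} EB per year")
```

Note that the estimate only counts cars built in a single year, so the real total across all cars on the road would be larger.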

The bottom line: data automatically generated by cars provides a fantastic “fuel source” for big data and analytics. In fact, the McKinsey Global Institute estimates that the automotive industry will be the second-largest generator of data by 2015. This estimate is not surprising, since some plug-in hybrid vehicles generate as much as 25 GB of data in just one hour. Fun fact: the McKinsey Global Institute estimates that the leader in machine data is the utilities industry, with its bevy of smart meters, usage trackers, geographic sensors and other monitoring technologies.

The impetus to harness this data is very strong. On January 10, 2014, Bloomberg News reported that “The connected car is becoming the hottest model on dealer lots. In-vehicle technology is the top selling point today for 39 percent of car buyers, more than twice the 14 percent who say their first consideration is traditional performance measures such as power and speed.” In short, new-car buyers are asking how they can interact with smartphones and iPads before they inquire about horsepower.

Gartner reports that, by 2018, one in five cars on the road will be “self-aware”: able to discern and share information on their mechanical health, their global position and the status of their surroundings. A system of sensors, vehicle-to-vehicle communications and computing power will lead to intelligent cars that interact with their owners.

Existing technologies such as HTTP have serious limitations around quality of service and speed. HTTP wasn’t designed for wireless communications in the Internet of Things, and its latency is too high to make it viable for real-time connected car communications and analytics.

These limitations gave rise to MQTT (Message Queue Telemetry Transport). MQTT is an extremely simple, lightweight publish/subscribe messaging protocol designed for constrained devices and low-bandwidth, high-latency or unreliable networks. Its design principles are to minimize network bandwidth and device resource requirements while still attempting to ensure reliability and some degree of assurance of delivery.
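To give a feel for how little bandwidth an MQTT message needs, here is a minimal sketch that hand-encodes a PUBLISH packet following the MQTT 3.1.1 wire format at QoS 0 (no delivery acknowledgment). The topic name and sensor value are hypothetical, and a real application would use an MQTT client library rather than build packets by hand.

```python
def encode_remaining_length(n: int) -> bytes:
    """Encode a length with MQTT's variable-length scheme (7 bits per byte)."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        n, digit = divmod(n, 128)
        if n > 0:
            digit |= 0x80  # continuation bit: more length bytes follow
        out.append(digit)
        if n == 0:
            return bytes(out)


def encode_publish(topic: str, payload: bytes) -> bytes:
    """Build a QoS 0 PUBLISH packet (DUP and RETAIN flags cleared)."""
    topic_bytes = topic.encode("utf-8")
    # Variable header: topic name prefixed with its 2-byte big-endian length.
    variable_header = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    body = variable_header + payload
    # Fixed header: packet type 3 (PUBLISH) in the high nibble, flags 0.
    fixed_header = bytes([0x30]) + encode_remaining_length(len(body))
    return fixed_header + body


topic = "car/42/speed"   # hypothetical topic name
reading = b"88"          # hypothetical sensor value
packet = encode_publish(topic, reading)
overhead = len(packet) - len(topic) - len(reading)
print(f"packet: {len(packet)} bytes, protocol overhead: {overhead} bytes")
```

For a short message like this, the protocol adds only a handful of bytes on top of the topic and payload, which is the kind of economy the design principles above are aiming for.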

To harness data generated by connected cars, analytic applications must leverage MQTT as the messaging protocol. Getting this right can mean the difference between safe driving and accidents. Connected cars need to:

  • Enable the right interactions—avoid alert fatigue
  • Ensure low latency responses—delays usually cause distractions
  • Get the controls right—voice, gesture, haptic

Analytic results must deliver insight in milliseconds and incorporate sophisticated analysis such as geospatial, correlation, filtering and time series. For example, trucking companies use connected vehicles to assess the health of drivers and understand weather and road conditions. This data can inform their response to minimize insurance charges; they may choose not to deploy a truck under harsh road conditions since insurance premiums are higher during inclement weather. They also use machine data to improve driver safety and vehicle-to-vehicle communication. Wouldn’t it be useful if the windshield wipers automatically turned on when raindrops fell, or if an emergency vehicle was automatically routed to the scene of an accident and advised of its severity level?
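As a rough illustration of combining geospatial filtering with a time-series window, the sketch below keeps a moving average of rain-sensor readings reported near a point and decides whether the wipers should be on. The class name, radius, window size, threshold and coordinates are all hypothetical choices made for this example; a production streaming engine would express the same idea with its own operators.

```python
import math
from collections import deque


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


class RainMonitor:
    """Moving average of rain-sensor readings reported near a given point."""

    def __init__(self, lat, lon, radius_km=5.0, window=3, threshold=0.5):
        self.center = (lat, lon)
        self.radius_km = radius_km
        self.threshold = threshold
        self.readings = deque(maxlen=window)  # keeps only the newest readings

    def update(self, lat, lon, rain_level):
        """Add one reading; return True when the wipers should be on."""
        # Geospatial filter: only accept reports from inside the radius.
        if haversine_km(*self.center, lat, lon) <= self.radius_km:
            self.readings.append(rain_level)
        return self.wipers_on()

    def wipers_on(self):
        """True when the windowed average reaches the threshold."""
        if not self.readings:
            return False
        return sum(self.readings) / len(self.readings) >= self.threshold


monitor = RainMonitor(48.137, 11.575)  # hypothetical vehicle position
states = [monitor.update(48.14, 11.58, level) for level in (0.1, 0.4, 0.9, 1.0)]
far_state = monitor.update(52.52, 13.405, 0.0)  # distant report, ignored
print(states, far_state)
```

Averaging over a short window instead of reacting to a single reading is one simple way to avoid flapping between states when a sensor is noisy.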

One of the flagship IBM big data offerings, IBM InfoSphere Streams, supports MQTT and is used across a broad ecosystem, including car manufacturers, retailers, insurance companies, trucking companies and consumers, to make driving safer and more productive. Real-time analytics are used both during and after the manufacturing process to achieve exceptional outcomes, including:

  • Profitable aftermarket services and products
  • Improved interactive driving experience and safety by real-time analysis of weather-based data or road-congestion alerts
  • Integrated vehicle data available to third parties such as insurance companies, retailers and emergency medical services
  • Improved quality and functionality of future products
  • Optimization of the global value chain to improve the environment

The massive amount of car-generated data available today, originating from machines, sensors and more, requires complex analysis and correlation across different types of data sets. Organizations that can perform this analysis across a variety of data (and do so with speed and accuracy) stand to reap rewards in business efficiency, customer satisfaction and strategic success. The IBM approach to big data takes all of this into account to help you gain business insights and real-time visibility into the customer experience.

Listen to my podcast on "Fueling the Connected Car."

Links for more info