Spark delivering value in the real world here and now

Big Data Evangelist, IBM

The market for Spark is young, and commercial adoption of Spark remains limited. But Spark open source technology has been available for several years, and IBM has been working with customers throughout the world on projects incorporating Spark.

Here are a few examples of customers who derive tangible business value today from the results of Spark-based initiatives:

  • Improvements in public transportation: Real-time transportation planning software from Optibus is changing how public transport is organized. “Spark, together with IBM, provides a highly scalable platform for Optibus, making it easy for us to expand our software as a service offering into new markets, and helps us simplify deployment, maintenance and application development for transportation companies worldwide,” said Amos Haggiag, Cofounder and CTO of Optibus.
  • Optimized analytics for the Internet of Things: Findability Sciences, a global consulting and contextual data technology solutions company, is using IBM Analytics and Spark to help clients tap into the power of big data. “Apache Spark with IBM BigInsights has given us tremendous capacity for our implementations for small and medium businesses, where MapReduce was not efficient. With Spark, the performance has improved multifold. We’re now able to process streaming data from IoT devices and offer analytics for data in motion for things like traffic, commuters and parking,” said Anand Mahurkar, CEO of Findability Sciences.
  • Accelerated claims processing: Independence Blue Cross (IBC) is the largest health insurer in the Philadelphia area, serving more than 2 million people in the region and 7 million nationwide. IBC is using Spark to help drive product innovation and develop new services. “Apache Spark is quickly maturing into a power tool for development of machine-learning analytic applications. It allows our IBC researchers and academic partners to work together more seamlessly, which means we can get new claims and benefits apps up and out to customers much faster,” said Darwin Leung, Director of Informatics at IBC.
  • Exploring the universe: IBM, NASA and the SETI Institute are collaborating to analyze terabytes of complex deep space radio signals, using Spark’s machine learning capabilities in a hunt for patterns that might betray the presence of intelligent extraterrestrial life. “With Spark as a Service on Bluemix, we’ll be able to work with IBM on a global scale to explore new ways to analyze signal data and build on each other’s innovations,” said Dr. Seth Shostak, Senior Astronomer and Director of the Center for SETI Research.
  • Exploring the human genome: A team used Spark to build a powerful SQL/R/Scala development environment for data scientists to use in analyzing genomic data from the web and other sources. They provided a machine learning wizard for scientists to quickly dig into chromosome data, classifying genomes by population. This autoscalable cloud system has increased processing speed and reduced the time required for analysis of massive genome data, putting the power in the hands of the people who know the data best.
  • More efficient real-time traffic planning: A team built an Internet of Things (IoT) application for urban traffic planning, providing real-time analytics using spatial and cellular data. Traditional messaging queues could not handle the massive and continuous data inputs, and traditional data lakes could not handle the large volume of cellular signaling data in real time. But Spark could. The team exploited Spark as the engine of the computing pool, using Oozie to build the controller module and relying on Kafka as the messaging module. The result? An application that processes massive cellular signal data and visualizes the resulting analytics in real time.
  • Better light rail forecasting: Light rail is favored for public transit by a growing number of passengers in Chongqing, China, and has become one of the main methods of transport for Chongqing’s population of 2 million. The main rail network covers the main district of Chongqing, and daily traffic can reach 2.4 million passenger rides. A Spark-based solution developed by Chongqing JieShang Metro Tech, a subsidiary of Chongqing Rail Transit Group, in collaboration with IBM, will use historical data from the past 10 years, legally collected by the automated fare collection system, to forecast passenger volumes, raising predictive accuracy to nearly 90 percent using the generalized regression neural network (GRNN) model. With accurate predictions of passenger volume, the company can optimize train timetables by identifying optimal departure intervals for trains throughout the day. Moreover, by analyzing estimated passenger volume in light of train timetables, the solution can help passengers plan their trips, spreading out departures to prevent overconcentration of flow while helping passengers reach their destinations as soon as possible.
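
For readers curious about the GRNN model named in the light rail example, here is a minimal sketch of the general technique in plain Python. It is not the Chongqing team's implementation and it does not use Spark. A GRNN predicts a value as a Gaussian-weighted average of known outcomes, so historical samples that look most like the query count the most. The function name, the features (hour of day, weekday flag), the ridership figures and the sigma values are all hypothetical, chosen only for illustration.

```python
import math


def grnn_predict(train_x, train_y, query, sigma=1.0):
    """Predict a target for `query` as a Gaussian-weighted average of train_y.

    train_x: list of feature tuples, train_y: list of matching targets,
    sigma: smoothing width (smaller values follow nearby samples more closely).
    """
    weights = []
    for x in train_x:
        # Squared Euclidean distance between a stored sample and the query.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))

    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed to zero: the query is far from all samples
        # for this sigma, so fall back to the plain mean of the targets.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical history: (hour of day, weekday flag) -> rides in thousands.
train_x = [(7.0, 1.0), (8.0, 1.0), (9.0, 1.0), (13.0, 1.0), (18.0, 1.0)]
train_y = [90.0, 140.0, 110.0, 60.0, 150.0]

# Estimate ridership for 8:30 on a weekday.
estimate = grnn_predict(train_x, train_y, (8.5, 1.0), sigma=0.5)
print(f"Estimated rides (thousands): {estimate:.1f}")
```

In a forecast like the one described above, the stored samples would come from years of fare collection records rather than five hand-written points, and sigma would be tuned against held-out data.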

And these are part of what is only the first wave of Spark-fueled creativity. A month ago, IBM established its Hack Spark Challenge. A groundswell of innovation gathered force in only three weeks when IBM invited 10,000 Spark developers, all of them IBMers, to participate—and saw 28,000 show up. They weren’t all developers or data scientists or data engineers, but many were. And many others were specialists in other fields—people who had begun to use Spark and who were interested in seeing what creative, analytic-driven outcomes might come out of IBM’s challenge.

No field or subject of development was prescribed for participants in the Hack Spark Challenge. Instead, teams formed spontaneously around shared interest areas. After 10 days, IBM received more than 100 submissions, most of them not only creative but sophisticated. Among them were Spark-based projects achieving outcomes not previously imagined.

The increasing range of practical applications for Spark the world over shows the power of even a dash of Spark. No matter where it’s used, Spark ignites a fusillade of data-driven ingenuity.

You can learn more about IBM’s deep commitment to Spark, including sparking a new world of data science wizardry, by visiting the following online resources: