Behind-the-scenes with IBM InfoSphere Streams v4.0

InfoSphere Streams Product Manager

The road to IBM InfoSphere Streams v4.0 began over a year and a half ago, when it became clear that some key functions couldn’t be delivered in a short development cycle. The development team therefore split its resources, with some people working on IBM InfoSphere Streams v3.2 and others working toward IBM InfoSphere Streams v4.0. The key hills for v4.0 included:

  • Business agility: A plugin for Microsoft Excel, a popular BI tool, that enables business users to glean business intelligence from streaming data
  • Application resiliency: The ability to designate portions of an IBM InfoSphere Streams program in which all records are guaranteed to be processed at least once, while retaining the very high throughput for which IBM InfoSphere Streams is renowned
  • Simplicity for IT: A new management console and automated system recovery from machine failures to deliver even higher system availability

These concepts were tested throughout the design and development cycle: key customers were brought in for paper design reviews, and alpha testing took place in the spring of 2014. Based on customer feedback, several items were added to the plan to ensure greater usability. Then, last fall, closed and open beta programs provided expanded feedback on the nearly final code base.

During this period, IBM InfoSphere Streams emerged as a market leader in the stream computing category, as reported by Forrester last summer, and more recently by Evans Data Corporation (EDC). In its worldwide survey of big data and analytics developers, EDC asked: “Which stream processing runtimes are you using?” The answer? “IBM InfoSphere Streams came out well ahead of most every other major player, with 41 percent.” IBM came out well ahead of both open source streaming offerings and heritage complex event processing products.

IBM InfoSphere Streams also leads in the blended world of open source and commercial software offerings. Last spring IBM introduced IBMStreams on GitHub, where parts of IBM InfoSphere Streams have been donated to the open source community. New adapters such as Kafka and HBase were added to IBMStreams and are now part of IBM InfoSphere Streams v4.0. The new Distributed Process Store (dps) toolkit, added to IBMStreams a few weeks ago, delivers support for many open source NoSQL and key-value stores such as Cloudant, Cassandra, Redis and MongoDB, among others.

IBM InfoSphere Streams v4.0 also added more open source software to enable running in a Hadoop environment. With YARN support on IBMStreams and ZooKeeper shipped with IBM InfoSphere Streams, a combined Hadoop and stream processing environment becomes simpler. By leveraging key open source capabilities, the focus remains on delivering the premier stream computing platform with the broadest range of capabilities.

These capabilities allow IBM InfoSphere Streams to acquire all manner of data, both at rest and in motion, rapidly apply a wide variety of analytics and act in real time. Extensions to the Geospatial Toolkit and the Time Series Toolkit help provide context-aware stream processing, enabling greater insights to make superior real-time decisions. Natural language processing capabilities provide further context awareness, that is, the ability to better understand something by taking into account the things around it. A research asset known as G2 delivers additional context computing capabilities.

IBM InfoSphere Streams is a leader in the stream computing marketplace, and the new capabilities discussed here continue to set it apart from other offerings. I invite you to try it out on March 13, via Passport Advantage for existing customers and via Streams Quick Start Edition for new users.

Try it to find out why everyone is trying to emulate IBM InfoSphere Streams, and read more about the announcement in the updated solution brief, industry whitepaper and webpage.