The quick reference guide to technologies and applications for stream computing

Manager of Portfolio Strategy, IBM

The next frontier in data exploration is taking center stage as open-source and vendor-backed solutions span a wide range of architectures, designs and target problems. This variety of approaches can help make data streams available to data science professionals. Data streams are ubiquitous and incessant, and modern technology is becoming increasingly adept at capturing and making use of them.

Data streams are of interest to data science professionals, who aim to build flexible data products that tap into all data—including data streams. Using data streams, data scientists can iterate quickly, scoring and updating algorithms based on data in motion and then infusing the insights they gain across the enterprise in various applications.

Applications for data streams are diverse, and the products being built around them are even more so:

  • Connected cars can use geospatial analytics to evaluate road conditions, accidents and disaster zones in microseconds as they navigate a city, with alert events delivered to each car’s dashboard.
  • Next-generation intensive care units can monitor devices’ data streams for life-threatening conditions. Healthcare providers, who are continually bombarded with data, can then make sense of that data in time to respond with treatment before they miss an opportunity to enhance care.
  • The RedRock app, powered by IBM Analytics running on Spark, finds patterns in tweets to identify influential individuals and related topics of interest, as well as the geographic regions in which surrounding conversations take place. In the hands of a marketer, RedRock could become a powerful tool for connecting with a target demographic or identifying emerging markets. In the hands of someone at the increasingly overwhelming SXSW, it could help filter weather, private and unannounced corporate events, surprise artists, popup studios and even food.

Navigating stream computing can be confusing to the uninitiated, but here are some important concepts to keep in mind:

  • Certain technologies, including HTTP, MQTT and RSS, are used to move data around the enterprise. Rather than applying business rules or analytics, these technologies capture data and move it to a repository (generally HDFS- or DB2-based) for downstream processing.
  • Complex event processing (CEP) got its start in the finance industry in algorithmic trading, where machines must make trading decisions based on established rules. In stream computing, Tibco, Oracle Complex Event Processing, IBM Decision Manager and other such solutions parse data streams and take action based on specific Boolean rules or if-then-else logic.
  • To enhance complex event processing, new solutions such as Hortonworks DataFlow and IBM InfoSphere DataStage curate data flows for processing in Hadoop or Spark, acting as data integration solutions for data streams. Such solutions trace, parse, filter, join, transform, fork or clone data streams to boost levels of confidence in streamed data. As a result, data streams can be combined with broader context and persistent data to help drive quick and informed decisions.
  • Streaming analytics solutions apply programmatic rigor to data streams, bringing sophisticated analytics to bear rather than mere business rules. Solutions such as Akka, IBM Streams and Twitter Heron exemplify this class, aiming to apply machine learning algorithms to data streams. IBM Streams, for example, offers toolkits for cutting-edge applications such as acoustic analysis for call center optimization or spatiotemporal analysis to monitor emergency vehicles in transit.
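
To make the difference between the last three classes concrete, here is a minimal Python sketch. It does not use the API of any product named above. The Reading type, the "sensor_id,value" record format and the function names are all hypothetical, chosen only for illustration. The sketch curates a raw stream, applies a fixed if-then rule in the CEP style, and then scores each event against a running mean and variance as a simple stand-in for streaming analytics.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Reading:
    """Hypothetical event type: one sensor reading taken from a stream."""
    sensor_id: str
    value: float


def curate(lines: Iterable[str]) -> Iterator[Reading]:
    """Data-flow-style curation: parse raw records and drop malformed ones.
    Assumes a made-up 'sensor_id,value' text format."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue
        try:
            yield Reading(parts[0], float(parts[1]))
        except ValueError:
            continue


def cep_rule(stream: Iterable[Reading], threshold: float) -> Iterator[str]:
    """CEP-style processing: a fixed Boolean rule applied to each event."""
    for r in stream:
        if r.value > threshold:
            yield f"ALERT {r.sensor_id}: {r.value} exceeds {threshold}"


def streaming_zscore(stream: Iterable[Reading], limit: float = 3.0) -> Iterator[str]:
    """Analytics-style processing: each event is scored against a running
    mean and sample variance (Welford's algorithm), and the statistics are
    then updated with that event."""
    n, mean, m2 = 0, 0.0, 0.0
    for r in stream:
        # Score first, using only what the stream has shown so far.
        if n >= 2:
            std = math.sqrt(m2 / (n - 1))
            if std > 0 and abs(r.value - mean) / std > limit:
                yield f"ANOMALY {r.sensor_id}: {r.value}"
        # Then update the running statistics with the new event.
        n += 1
        delta = r.value - mean
        mean += delta / n
        m2 += delta * (r.value - mean)


if __name__ == "__main__":
    raw = ["icu-7,10", "icu-7,11", "garbled", "icu-7,9", "icu-7,10", "icu-7,50"]
    readings = list(curate(raw))  # materialized so both consumers can read it
    for message in cep_rule(readings, threshold=40.0):
        print(message)
    for message in streaming_zscore(readings):
        print(message)
```

The rule in cep_rule never changes unless someone edits the threshold, whereas streaming_zscore adjusts its notion of "normal" with every event. A production system would replace the running statistics with a trained model and the in-memory list with a real stream source.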

For an overview, take a look at how these different classes work together:

Though data streams are complex, using them doesn’t have to be. To get started with stream computing and infuse data streams into your applications, try the IBM Quick Start Edition.