Embracing real-time, streaming analytics in the insight economy
Part 1 of 2
Record-breaking temperatures in the summer of 2015 baked people to a crisp in many countries. But what’s even hotter than the broiling summer is streaming analytics. Open source streaming projects such as Apache Flink and Apache Spark Streaming are growing rapidly. Vendors including Amazon, Google and IBM are rapidly innovating in this space, and organizations are asking how to incorporate data streams into existing architectures for business intelligence, data management, visualization, warehousing and much more.
Not only are business environments sizzling with streaming analytics, but the technology is helping humanity to better understand and adjust its impact on the physical environment of this warming planet. For example, the University of Alberta is using IBM InfoSphere Streams to enhance understanding of the rain forests in South America, so the planet can be better equipped to support life through heat waves and other weather events.
This first part of a two-part series looks at the business drivers and technological innovations that are bringing streaming analytics into the mainstream in many industries and for many use cases. The concluding installment takes a deeper dive by dissecting the architectural issues involved when converging streaming data and batch analytics.
Why is streaming analytics burning up the popularity charts? After all, it’s not a new technology. Event processing and data streaming have been around for many years. Decades ago, the technology was called complex event processing (CEP), and it emerged from the financial industry, where a microsecond can be an eternity. Algorithmic trading required a technology that could apply logic to transactions happening very rapidly, and make stakeholders lots of money in the process. As a result, CEP systems were born, based on Boolean logic, if-then-else scenarios and specific states or conditions.
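To make the idea concrete, here is a minimal sketch of the kind of if-then-else, Boolean-logic rule an early CEP engine would apply to each event as it arrived. The event fields, thresholds and actions are purely illustrative, not taken from any particular CEP product:

```python
# Minimal sketch of a CEP-style rule: stateless Boolean logic applied
# to each trade event as it arrives. Field names and thresholds are
# illustrative only.
def cep_rule(event):
    """Flag a trade if price moves sharply, optionally on high volume."""
    if event["price_change_pct"] > 2.0 and event["volume"] > 10_000:
        return "ALERT"
    elif event["price_change_pct"] < -2.0:
        return "HEDGE"
    else:
        return "PASS"

trades = [
    {"price_change_pct": 2.5, "volume": 50_000},
    {"price_change_pct": -3.1, "volume": 8_000},
    {"price_change_pct": 0.4, "volume": 1_200},
]
actions = [cep_rule(t) for t in trades]
# actions == ["ALERT", "HEDGE", "PASS"]
```

The defining trait is that each event is evaluated the moment it arrives, rather than being stored and queried later.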
Streaming analytics solutions introduced terms such as real time and instantaneous into the technology’s lexicon. In practice, real time means the right time: whenever customers or other key stakeholders expect to connect with an organization. Streaming analytics performance requirements are also driven by the project. For example, processing HTTP traffic to spot a cybercriminal (something every industry is concerned about) demands ultralow latency and ultrahigh throughput, on the order of millions of events per second. In other cases, such as engaging with clients on social media, a response within a few seconds is the goal. Regardless of how we measure real time, the key to streaming analytics is that it is separate and distinct from batch processing. Speeding up batch processes can help businesses, but it won’t help them capitalize on data streams: batch techniques cannot simply be applied to data in motion, and a fundamentally different processing model is needed.
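The distinction between speeding up batch and processing in-stream can be sketched in a few lines. A batch job recomputes over the whole accumulated dataset, so its cost grows with history; a streaming computation folds each event into running state as it arrives, so per-event latency stays constant. The running-mean example below is a simplified illustration of that difference, not a depiction of any specific product:

```python
# Batch style: recompute over the entire dataset each time.
# Cost grows with the amount of accumulated history.
def batch_mean(all_events):
    return sum(all_events) / len(all_events)

# Streaming style: fold each event into running state in O(1) per event,
# so the latest answer is always available at constant latency.
class RunningMean:
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count

stream = [10.0, 20.0, 30.0]
rm = RunningMean()
latest = [rm.update(v) for v in stream]
# latest == [10.0, 15.0, 20.0]; batch_mean(stream) == 20.0
```

Both approaches reach the same final answer, but only the streaming version produces an up-to-date answer after every event.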
The difference now is that data streams are no longer limited to high-end, extreme use cases such as stock market trading. Data streams are mainstream, for several reasons:
- Internet of Things sensors are everywhere. Today, we live on a planet with over 7.2 billion active SIM cards—more mobile devices than there are human beings. Gartner predicts that by 2020, 25 billion connected things will be in use.
- Social engagement is expanding. About 500 million tweets are generated per day, or roughly 200 billion per year. Given that the average life of a tweet is 18 minutes, time is of the essence for capitalizing on sentiment and consumer behavior.
- A shift to a digital world is underway. By 2019, global IP traffic will reach 2.0 zettabytes per year, and at least 80 percent of that data will be in unstructured form.
Building streaming data analytics architectures
Innovators in the streaming analytics space recognize the need for both new architectures and capabilities to support streaming analytics, while incorporating open standards and community innovation. From an architectural perspective, streaming technologies need to accomplish several aims:
- Handling parallel streams of events continuously
- Moving from batch mode to in-stream mode
- Understanding and preserving content as a basis for action
- Expanding beyond rule- or condition-based logic to include sophisticated analytics such as machine learning, natural-language processing (NLP) and voice or image recognition
- Connecting to any data source on the planet through lightweight, fault-tolerant protocols such as Message Queuing Telemetry Transport (MQTT)
- Performing analytics in memory for high-speed applications
- Scaling up or down linearly on the fly based on memory requirements
- Bringing data processing closer to data sources
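Several of the aims above, handling continuous event streams and moving from batch to in-stream mode in particular, commonly take the form of windowed aggregation: events are grouped into fixed time windows and aggregated as they arrive instead of being stored for a later batch job. The sketch below shows a tumbling-window count over sensor events; the window length, field layout and data are illustrative assumptions, not tied to any particular streaming product:

```python
from collections import defaultdict

# Illustrative tumbling-window count over a continuous event stream:
# each event falls into exactly one fixed 60-second window, and counts
# are updated as events arrive rather than in a later batch pass.
WINDOW_SECONDS = 60

def window_counts(events):
    """events: iterable of (timestamp_seconds, sensor_id) tuples."""
    counts = defaultdict(int)
    for ts, sensor in events:
        # Align the timestamp to the start of its window.
        window_start = (ts // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[(window_start, sensor)] += 1
    return dict(counts)

events = [(5, "a"), (42, "a"), (61, "a"), (70, "b")]
result = window_counts(events)
# result == {(0, "a"): 2, (60, "a"): 1, (60, "b"): 1}
```

Production engines layer fault tolerance, parallelism and out-of-order handling on top of this basic pattern, but the core idea, continuous aggregation over bounded windows, is the same.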
In short, architectures need to be able to facilitate multiple processes from a single message or trigger, and do so simultaneously. The second and concluding installment of this series examines in detail the requirements that are driving the evolution of streaming analytics architectures in today’s insight economy.
Learn more about IBM streaming analytics in an analyst report. Also, give the Quick Start edition a trial run. If performance is a top concern, take a look at this InfoSphere Streams benchmark study. And then experience the full power of the IBM advanced analytics portfolio, including InfoSphere Streams. Also, register today for Insight 2015, the premier forum for the insight economy.