New hyper-fast data ingestion enables smarter decisions
Human beings tend to filter out events they deem unimportant, because they can only process so much at any given time.
Computer systems, however, must be able to handle a massive number of events in real time or near-real time to support a wide range of applications. Financial applications must monitor events to help counter fraud. Retail apps use online shopping events to capitalize on growth opportunities. Manufacturing production lines and energy systems use event data from Internet of Things (IoT) devices to automate faster, smarter responses to changes that demand attention. These are just a few of the areas where event processing can be applied to great effect.
A different kind of data store
With the high velocity, volume and variety of data that events can generate, an event data store must be able to deliver:
- Fast data ingest rates (inserts)
- In-memory indexing for fast and efficient lookups
- Near real-time analytics on all ingested data with online analytical processing (OLAP)
- Integrated machine learning capabilities to “learn” from previous events
- High availability and replication to provide continuous value to the business
- Linear scalability by simply adding nodes
- Support for open storage such as Apache Parquet to minimize vendor lock-in
- Support for hybrid cloud configurations to match with appropriate workloads and service-level agreements
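The linear-scalability requirement rests on spreading events evenly across nodes so that each added node takes a proportional share of the load. The following is a minimal, generic sketch of hash partitioning, not Db2 Event Store's actual distribution scheme; the function names `node_for_key` and `distribute` are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable


def node_for_key(key: str, num_nodes: int) -> int:
    """Map an event key (e.g. a device ID) to a node index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(keys: Iterable[str], num_nodes: int) -> Counter:
    """Count how many events each node would receive."""
    return Counter(node_for_key(k, num_nodes) for k in keys)


if __name__ == "__main__":
    keys = [f"device-{i}" for i in range(30_000)]
    for n in (3, 6):
        # More nodes -> fewer events per node, so capacity grows with the cluster.
        print(n, "nodes:", sorted(distribute(keys, n).values()))
```

Plain modulo hashing remaps most keys whenever the node count changes; schemes such as consistent hashing reduce that reshuffling, at the cost of a little more bookkeeping.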
Three key characteristics are necessary to deliver an effective process and data store:
An event store architecture.
Car sensors, home appliances, credit card purchases, mobile transactions and aircraft flight systems generate large volumes of events, many at high velocity. This requires efficient ingestion of vast amounts of data, on the order of a million inserts per second,¹ from numerous messaging and streaming systems such as Apache Kafka, Apache Spark, IBM Streams and other vendors’ streaming solutions. Support for industry standards and open APIs in languages such as Scala and Python is necessary to make use of existing skill sets and to democratize event streaming and processing.
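Ingest rates of this order are typically reached by batching inserts rather than paying one round trip per event. The sketch below is a hypothetical micro-batching buffer, assuming a client that exposes some bulk-insert call; `BatchingWriter` and its `flush_fn` callback are illustrative names, not part of any Db2 Event Store API.

```python
import time
from typing import Any, Callable, List, Optional


class BatchingWriter:
    """Hypothetical micro-batching buffer that groups events before a bulk insert.

    `flush_fn` stands in for a real client's batch-insert call, which is not shown here.
    """

    def __init__(self, flush_fn: Callable[[List[Any]], None], max_batch: int = 1000,
                 max_age_s: float = 0.5, clock: Callable[[], float] = time.monotonic):
        self.flush_fn = flush_fn
        self.max_batch = max_batch   # flush once this many events are buffered
        self.max_age_s = max_age_s   # ...or once the oldest buffered event is this old
        self.clock = clock
        self._buf: List[Any] = []
        self._first_ts: Optional[float] = None

    def write(self, event: Any) -> None:
        if not self._buf:
            self._first_ts = self.clock()
        self._buf.append(event)
        # The age check runs only on write; a production writer would also flush on a timer.
        if len(self._buf) >= self.max_batch or self.clock() - self._first_ts >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        if self._buf:
            batch, self._buf = self._buf, []
            self._first_ts = None
            self.flush_fn(batch)


if __name__ == "__main__":
    batches = []
    writer = BatchingWriter(batches.append, max_batch=3, max_age_s=60.0)
    for i in range(7):
        writer.write({"event_id": i})
    writer.flush()  # drain the final partial batch
    print([len(b) for b in batches])  # [3, 3, 1]
```

The two thresholds trade latency against throughput: larger batches amortize per-call overhead, while the age limit bounds how long any single event waits before it is written.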
To reach the latest ungroomed data (recently ingested data that has not yet been groomed into the storage layer), queries must be able to access the optimized event nodes and their associated cached data in the cluster directly. If minutes-old data is acceptable, users should be able to query the hardened data already in storage using standard (“vanilla”) Apache Spark nodes and any compatible analytics tools of their choice.
Ultimately, queries should be able to retrieve the most recent data and combine it with groomed data in cache or in the storage layer.
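The difference between the two query paths can be modeled with a toy two-tier store. This is a simplified, hypothetical sketch (`TwoTierStore` and its methods are illustrative names), assuming that grooming moves each event from the recent tier to the storage tier so that no event is counted twice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, object]
Predicate = Callable[[Event], bool]


@dataclass
class TwoTierStore:
    """Toy model of the two places a query may need to look.

    ungroomed: recently ingested events still held on the event nodes.
    groomed:   events already groomed into the storage layer (e.g. Parquet files).
    """
    ungroomed: List[Event] = field(default_factory=list)
    groomed: List[Event] = field(default_factory=list)

    def insert(self, event: Event) -> None:
        self.ungroomed.append(event)

    def groom(self) -> None:
        # Move recent events to the storage layer; each event lives in exactly one tier,
        # so combining the tiers never double-counts.
        self.groomed.extend(self.ungroomed)
        self.ungroomed.clear()

    def query_storage_only(self, pred: Predicate) -> List[Event]:
        # Roughly what a plain Spark job reading the stored files sees: possibly minutes old.
        return [e for e in self.groomed if pred(e)]

    def query_latest(self, pred: Predicate) -> List[Event]:
        # Combine groomed and not-yet-groomed events for an up-to-the-moment answer.
        return [e for e in self.groomed + self.ungroomed if pred(e)]


if __name__ == "__main__":
    store = TwoTierStore()
    store.insert({"txn": 1, "amount": 50})
    store.insert({"txn": 2, "amount": 500})
    store.groom()
    store.insert({"txn": 3, "amount": 900})  # arrived after the last groom
    large = lambda e: e["amount"] > 100
    print([e["txn"] for e in store.query_storage_only(large)])  # [2]
    print([e["txn"] for e in store.query_latest(large)])        # [2, 3]
```

A real system must also keep grooming and querying consistent under concurrency, which this single-threaded model ignores.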
High availability for all data, event store processes and stores is vital, because technology often fails when users can least afford it. Data, along with all associated log data, must be replicated across nodes for redundancy. Should a node fail, queries must continue to be processed, so the configured number of query replicas must always be reachable, as depicted below.
High availability in an IBM Db2 Event Store cluster.
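The replication requirement described above can be illustrated with a generic replica-placement sketch. It assumes a simple round-robin placement with a fixed replication factor; this is not Db2 Event Store's actual mechanism, and `place_replicas` and `serving_node` are hypothetical names.

```python
from typing import Dict, List, Set

Placement = Dict[int, List[str]]


def place_replicas(num_shards: int, nodes: List[str], replication_factor: int) -> Placement:
    """Assign each shard to `replication_factor` distinct nodes, round-robin."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the number of nodes")
    return {
        shard: [nodes[(shard + i) % len(nodes)] for i in range(replication_factor)]
        for shard in range(num_shards)
    }


def serving_node(shard: int, placement: Placement, down: Set[str]) -> str:
    """Route a query for `shard` to its first live replica."""
    for node in placement[shard]:
        if node not in down:
            return node
    raise RuntimeError(f"shard {shard} unavailable: all replicas are down")


if __name__ == "__main__":
    placement = place_replicas(3, ["node1", "node2", "node3"], replication_factor=2)
    print(placement)  # {0: ['node1', 'node2'], 1: ['node2', 'node3'], 2: ['node3', 'node1']}
    # With node2 down, every shard still has a reachable replica.
    print({s: serving_node(s, placement, down={"node2"}) for s in placement})
    # {0: 'node1', 1: 'node3', 2: 'node3'}
```

With a replication factor of 2, any single node failure leaves every shard reachable; tolerating k simultaneous failures requires a replication factor of at least k + 1.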
Any event streaming and processing solution should also provide sophisticated management and monitoring capabilities that give insight into the health of the system.
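One basic health signal such monitoring might track is the current ingest rate. The following is a minimal, hypothetical sketch of a sliding-window rate meter; `IngestRateMonitor` and the `min_rate` threshold are illustrative, not features of any specific product.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class IngestRateMonitor:
    """Hypothetical health metric: average inserts per second over a sliding time window."""

    def __init__(self, window_s: float = 10.0, clock: Callable[[], float] = time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self._samples: Deque[Tuple[float, int]] = deque()  # (timestamp, inserts) pairs
        self._total = 0  # inserts currently inside the window

    def _evict(self, now: float) -> None:
        # Drop samples that have aged out of the window.
        while self._samples and now - self._samples[0][0] > self.window_s:
            _, old = self._samples.popleft()
            self._total -= old

    def record(self, inserts: int) -> None:
        now = self.clock()
        self._samples.append((now, inserts))
        self._total += inserts
        self._evict(now)

    def rate(self) -> float:
        self._evict(self.clock())
        return self._total / self.window_s

    def healthy(self, min_rate: float) -> bool:
        """Report unhealthy if ingestion falls below an expected floor."""
        return self.rate() >= min_rate


if __name__ == "__main__":
    fake_now = [0.0]
    mon = IngestRateMonitor(window_s=10.0, clock=lambda: fake_now[0])
    mon.record(1000)        # t = 0 s
    fake_now[0] = 5.0
    mon.record(2000)        # t = 5 s
    print(mon.rate())       # 300.0 (3000 inserts / 10 s)
    fake_now[0] = 12.0      # the t = 0 sample has aged out
    print(mon.rate(), mon.healthy(min_rate=250.0))  # 200.0 False
```

The injectable clock makes the meter testable; in practice such a metric would usually be exported to an existing monitoring system rather than checked in-process.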
Event-driven AI with machine learning
Consider the impact of AI, machine learning in particular, on event processing.
Machine learning works best when large quantities of data are available. A high-speed event store can capture, analyze and store more than 250 billion events per day, which makes it possible to apply machine learning to the most recent data along with historical data.
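As a rough consistency check, assume the peak rate from the footnoted test (over 3 million inserts per second on three nodes) could be sustained around the clock, which that five-hour test did not itself demonstrate. The daily volume then lands just above the 250 billion figure, and the per-node rate matches the "million inserts per second" order of magnitude cited earlier:

```latex
\[
3 \times 10^{6}\,\tfrac{\text{inserts}}{\text{s}} \times 86{,}400\,\tfrac{\text{s}}{\text{day}}
= 2.592 \times 10^{11}\,\tfrac{\text{inserts}}{\text{day}} \approx 259 \text{ billion per day},
\qquad
\frac{3 \times 10^{6}\,\text{inserts/s}}{3\ \text{nodes}} = 10^{6}\,\text{inserts/s per node}.
\]
```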
Each time an event occurs, a system could “learn” from it and react rapidly as events happen, subject to its processing capabilities. Event processing coupled with machine learning helps applications become “aware” of what is happening, to the point that they could help predict when similar events might recur. By correlating previous events and outcomes, such applications can help protect against impending fraud or disaster.
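Learning from each event as it arrives can be illustrated with a deliberately simple statistical stand-in for machine learning: an online anomaly detector that scores each value against what it has seen so far, then updates its running mean and variance with Welford's algorithm. This is a hypothetical sketch (`OnlineAnomalyDetector`, `threshold` and `warmup` are illustrative names), not the machine learning that ships with Db2 Event Store.

```python
import math


class OnlineAnomalyDetector:
    """Learns the running mean/variance of a numeric event field one event at a time
    (Welford's algorithm) and flags values far from what it has seen so far."""

    def __init__(self, threshold: float = 3.0, warmup: int = 10):
        self.threshold = threshold  # z-score above which an event is flagged
        self.warmup = warmup        # events to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0              # sum of squared deviations from the mean

    def _std(self) -> float:
        # Sample standard deviation of the events seen so far.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def observe(self, x: float) -> bool:
        """Score x against past events, then learn from it. Returns True if x looks anomalous."""
        std = self._std()
        is_anomaly = (self.n >= self.warmup and std > 0
                      and abs(x - self.mean) / std > self.threshold)
        # Welford update: fold x into the running statistics in O(1) time and memory.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        return is_anomaly


if __name__ == "__main__":
    detector = OnlineAnomalyDetector(threshold=3.0, warmup=5)
    purchases = [20, 22, 19, 21, 20, 23, 18, 21, 500]  # hypothetical card amounts
    print([detector.observe(p) for p in purchases])
    # [False, False, False, False, False, False, False, False, True]
```

Scoring before updating keeps an unusual event from diluting the baseline it is judged against; because the update is constant-time, it can keep pace with a high-rate stream.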
There are many scenarios where this linkage between event processing and machine learning can be applied, reacting faster and more often than human beings ever could. Having intelligent, “aware” event-driven systems augment one’s own capabilities is like having a trusted personal adviser or assistant.
How to get started
These combined capabilities add up to a tall order for an event streaming and processing solution. Fortunately, IBM has infused its event processing capabilities with advanced machine learning technologies to deliver the IBM Db2 Event Store, which is available in developer and enterprise editions.
1. Based on an internal five-hour test in February 2018 that produced over 3 million inserts per second across three nodes.
The test was performed on three servers, each with:
- CPU: Quad Intel Xeon E7-4850 v2 (48 physical cores, 2.30 GHz clock speed)
- Memory: 256 GB RAM
- Network adapter: 10 Gbps
- Compute: five 1.7 TB SSD drives (3 DWPD)
- Storage: five 8.00 TB SATA HDD drives
- OS: RHEL 7.4