Leveraging event-driven systems for IoT with high-speed data ingestion
The Internet of Things (IoT) appears to be moving from the domain of enthusiastic early adopters to the more challenging, more profitable territory of mainstream enterprise technology.
In particular, companies in industries that depend on the attention and loyalty of large numbers of consumers – retail and air travel, for example – have already woken up to the possibilities. If you can recognize an individual customer’s mobile device as they move through an airport or walk past your store, you can potentially take action to improve that customer’s experience – whether that means making a personalized offer to attract their attention, or simply offering a warmer welcome in the departure lounge when you know they’ve spent a long time waiting in line at security.
Event-driven architectures are playing a key role in these types of applications. To learn more about how the event-driven approach works, and why it has special benefits for these use-cases, I spoke to Anson Kokkat, one of IBM’s leading experts on data modeling and data architecture design.
Anson Kokkat is a Product Manager at IBM, who works on the development of database modeling and tools. He started his career at IBM as a software engineer, leading a number of major database integration projects and supporting the development of the DB2 database platform. Today, he focuses on project management, sales enablement and customer advocacy, and presents exciting new IBM solutions to hundreds of clients worldwide.
Cindy Russell: Anson, in your experience, where are clients in their adoption cycle for these types of applications?
Anson Kokkat: We are already seeing companies move these kinds of solutions from neat data science projects and experimental prototypes into full-scale production.
That’s impressive because delivering these kinds of solutions is still a major technical achievement. Surprisingly, building an extensive network of IoT-connected sensors isn’t the main problem – the real challenge is in building a centralized architecture that is capable of ingesting and analyzing the vast quantities of data that these sensors produce. That matters because if you can’t process the data fast enough, the insights may come too late to be useful.
IoT solutions aren’t the only type of system that faces this problem – it applies to any use-case where you have a huge amount of data streaming in, and you need to react to it in real time.
Another example might be a railroad that has acoustic sensors to detect the state of rails and bearings and help to prevent derailments: success depends on capturing and analyzing potentially millions of transactions per second to detect problems before they occur.
Or an online retailer might want its website to deliver a responsive customer experience by comparing an individual user’s previous shopping history with their current behavior as they browse the site – requiring near-instant analysis of every mouse-click.
Event-driven architectures have proven to be one of the best ways to solve the challenges of simultaneous high-volume data ingestion and high-speed analytics.
“The real challenge is in building a centralized architecture that is capable of ingesting and analyzing the vast quantities of data that IoT-connected sensors produce. If you can’t process the data fast enough, the insights may come too late to be useful.” -Anson Kokkat
Cindy Russell: Can you describe what differentiates an event-driven architecture from a more traditional monolithic application platform?
Anson Kokkat: With a traditional approach, you would generally have an application platform connected to a database. The application would request create, read, update and delete (CRUD) operations, and the database would handle those requests and be responsible for maintaining the state of the application.
When you have a very large amount of data coming in, the problem is that there’s no separation of concerns. The same database engine is handling both reads and writes, which puts a brake on both types of operations. Access to the dataset must be synchronous – you can’t read and write the same data simultaneously, because the application expects the database to provide transactional integrity and consistency. As a result, when you need to scale up to ingest millions of transactions per second, it’s very difficult for the database to keep up with the demand.
In a traditional transaction database, you do not separate read and write activities because you need a single, unimpeachable version of the truth. Event-driven architectures change this paradigm by dividing write and read operations into separate microservices, which work independently. The database in event-driven use cases acts more like a log, capturing events as they come in from the “write” services in a staging area, and transforming them into an appropriate format for ongoing storage and analysis.
The “read” services in an event-driven paradigm are used to query and analyze the data. Wherever possible, they draw data directly from storage, so they don’t interfere with the “write” services or slow down the ingestion. Only queries that genuinely need access to the hottest new incoming data are routed to the staging area, so there’s a clear separation of concerns between real-time analytics on incoming data, and less time-sensitive retrospective analysis of historical data.
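The split between “write” and “read” services described here can be sketched in a few lines of Python. This is only an illustrative toy, not Project EventStore code or any real product API: the names `Event`, `EventLog`, `ReadService`, `flush` and `include_staging` are all hypothetical, and plain in-memory lists and dictionaries stand in for the staging area and the storage layer.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """A single immutable sensor reading (hypothetical schema)."""
    sensor_id: str
    timestamp: float
    value: float


@dataclass
class EventLog:
    """Append-only staging area plus a simple in-memory 'storage' layer."""
    staging: list = field(default_factory=list)
    storage: dict = field(default_factory=lambda: defaultdict(list))

    def append(self, event: Event) -> None:
        # Write path: events are only ever appended, never updated in place.
        self.staging.append(event)

    def flush(self) -> int:
        # Move staged events into storage, grouped by sensor for later analysis.
        moved = len(self.staging)
        for event in self.staging:
            self.storage[event.sensor_id].append(event)
        self.staging.clear()
        return moved


class ReadService:
    """Read path: queries storage by default and touches staging only on request."""

    def __init__(self, log: EventLog) -> None:
        self._log = log

    def history(self, sensor_id: str, include_staging: bool = False) -> list:
        events = list(self._log.storage.get(sensor_id, []))
        if include_staging:
            # Only queries that need the newest data look at the staging area.
            events += [e for e in self._log.staging if e.sensor_id == sensor_id]
        return events
```

In this sketch a retrospective query calls `history("rail-1")` and never touches the staging list, while a real-time query passes `include_staging=True`. A production system would run these as separate services over durable, distributed storage. The toy only shows where the boundary between the two paths sits.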
“Project EventStore gives you everything you need to build a back-end for an event-driven architecture in a single, well-defined package. Because it has been built, tested and optimized by experts, there’s much less to worry about in terms of maintenance and performance, versus a solution you’ve put together yourself.” -Anson Kokkat
Cindy Russell: What technologies are organizations currently using to build these architectures?
Anson Kokkat: Typically, you have a stream processing engine on the front end, which is sending in the data from your sensors or websites, or wherever the data is coming from. To provide the “write” services, you need a highly scalable database platform, typically running in a clustered configuration to provide fault-tolerance and linear scalability. Apache Cassandra would be a good example of the kind of technology that we’re seeing in use at the moment.
However, solutions like Cassandra can often require a fair amount of manual work to get up and running – many companies don’t currently have the expertise in-house to install, configure and maintain these types of open-source database clusters.
That’s why here at IBM we’ve been working on a new event data management system that provides an out-of-the-box solution for building event-driven architectures. It’s called IBM Project EventStore, and we’ll be releasing it as a Technology Preview – which is IBM’s equivalent of an open beta – in the next few weeks.
“In testing, we have been able to ingest a million ‘data events’ per second on a single Project EventStore node, which compares very favorably with the results we’ve seen from other solutions on the market.” -Anson Kokkat
Cindy Russell: What differentiates IBM Project EventStore from the current generation of event-driven platforms?
Anson Kokkat: The biggest difference is probably the ease of use. Project EventStore gives you everything you need to build an event data management system in a single, well-defined package. Today, you generally have to create your own microservices and set up your own clusters for staging and storage – it’s like assembling an engine from a set of valves, cylinders, spark plugs and so on. Project EventStore is more like a pre-built engine that you can just drop in and get on your way. And because it has been built, tested and optimized by experts, there’s much less to worry about in terms of maintenance and performance, versus a solution you’ve put together yourself.
The results seem to confirm the theory: we have seen impressive results from Project EventStore so far, particularly in terms of the speed and volume of data ingestion. In testing, we have been able to ingest a million “data events” per second on a single Project EventStore node, which compares very favorably with the results we’ve seen from other solutions on the market.
“Project EventStore can be deployed anywhere, either on cloud or on premises. So it’s easy to get started, and it’s easy to move, adapt or expand your environment as your event-driven architecture evolves.” -Anson Kokkat
Cindy Russell: Pre-built solutions can often be black boxes that reduce the options for clients, or force them to invest in expensive proprietary technology. How have you made sure that Project EventStore won’t lock customers in?
Anson Kokkat: We’ve been careful to keep the architecture as open and flexible as possible, especially in terms of open data access. For example, the main ingestion and analytics services are built on Apache Spark and can work with Apache Kafka, and data in the storage layer can be written in the Apache Parquet format – all open-source technologies. The ingestion APIs are written in Scala, which is becoming a de facto standard as a user-friendly, high-level programming language for data engineering and data science. And on the read side, we provide robust SQL APIs that make it easy for any trained analyst or database administrator to query data either in Project EventStore itself, or in the Parquet storage layer.
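To give a feel for the kind of read-side SQL an analyst might run over event data, here is a minimal sketch that uses Python’s built-in `sqlite3` module purely as a stand-in. It does not use Project EventStore’s SQL API, Spark or Parquet. The `events` table, its columns and the function `peak_reading_per_sensor` are assumptions made up for illustration.

```python
import sqlite3


def peak_reading_per_sensor(conn: sqlite3.Connection) -> list:
    """Return (sensor_id, event_count, peak_reading) rows, one per sensor."""
    query = """
        SELECT sensor_id, COUNT(*) AS event_count, MAX(reading) AS peak_reading
        FROM events
        GROUP BY sensor_id
        ORDER BY sensor_id
    """
    return conn.execute(query).fetchall()


# In-memory database standing in for the stored event data (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (sensor_id TEXT, ts REAL, reading REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("rail-1", 1.0, 0.2), ("rail-1", 2.0, 0.9), ("rail-2", 1.5, 0.4)],
)

rows = peak_reading_per_sensor(conn)
```

The point is only that, once events are exposed through a standard SQL interface, this kind of aggregate query needs no knowledge of how the data was ingested.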
There is also a lot of flexibility from an infrastructure perspective: Project EventStore can be deployed anywhere, either on cloud or on premises, via container services such as Docker and Kubernetes. It’s easy to get started, and it’s easy to move, adapt or expand your environment as your event-driven architecture evolves. You also have near-linear scalability: when you need to ingest more data or run more queries, you can simply add more Project EventStore nodes to the cluster.
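As a rough, back-of-the-envelope illustration of what near-linear scalability implies for capacity planning, the sketch below estimates how many nodes a target ingest rate would need. The default of one million events per second per node is the single-node test figure Anson quotes in this interview. The function name `nodes_needed` and the 0.9 scaling-efficiency factor are assumptions chosen for illustration, not measured values.

```python
import math


def nodes_needed(
    target_events_per_sec: float,
    per_node_events_per_sec: float = 1_000_000,  # single-node test figure quoted in the interview
    scaling_efficiency: float = 0.9,             # assumed: each node delivers 90% of its solo rate
) -> int:
    """Estimate the cluster size for a target ingest rate under a simple scaling model."""
    if target_events_per_sec <= 0 or per_node_events_per_sec <= 0:
        raise ValueError("rates must be positive")
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    effective_rate = per_node_events_per_sec * scaling_efficiency
    return max(1, math.ceil(target_events_per_sec / effective_rate))
```

With perfectly linear scaling (`scaling_efficiency=1.0`), five million events per second would need five nodes. With the assumed 0.9 factor the estimate rises to six. Real clusters would need benchmarking with representative data before any such figure could be relied on.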
Talking with Anson helped me understand Project EventStore’s potential to change the game by giving a much wider range of organizations the ability to build, deploy and maintain robust event-driven applications. With Project EventStore, you don’t need to take on the risk of building a solution yourself, or hiring a team of technical specialists just to keep things up and running.
As it becomes much easier to create innovative event-driven solutions, we can expect to see more and more organizations adopting the paradigm. This could open up a whole range of new use-cases, from embracing IoT solutions, to revolutionizing customer experience management with real-time insight into individual consumers’ behavior.
If you would like to make the leap to becoming a more event-driven organization, take a look at the IBM Project EventStore Technology Preview. You can sign up for access here, and take a deeper dive into the documentation here.