
Web analytics with IBM EventStore: Ingest streaming web events data

Software Architect - IBM Analytics, IBM

Universal connectivity is fueling streams of event data from a variety of event sources. Increasingly, organizations are developing and deploying event-driven applications to harness the growing volumes of event data. IBM EventStore offers a scalable, integrated system for enterprises to ingest, persist and analyze event data of any type. For an introduction, refer to part 1 of this 3-part blog series, Ingest and Analyze Streaming Event Data at Scale with IBM EventStore.

Use cases for event-driven applications span a spectrum of scenarios, including IoT, web analytics, gaming and fraud detection. Part 2 of this 3-part blog series describes a popular web analytics use case and outlines the implementation details of ingesting web events into IBM EventStore.

Web Analytics with Click Stream Analysis: A Use Case of Event Driven Application

Over the years, web applications have become ubiquitous in driving online commerce. Online sales account for a growing share of revenue for many businesses. As web applications become the primary sales channel, understanding customers’ online behavior is critical for businesses to drive sales. Click Stream analysis tracks and persists the sequence of web clicks from all users so that the click data can be analyzed for a better understanding of customer interests.

Click Stream analysis requires building out the following components.

  • Trackers – Trackers are client-side code components embedded into web pages to generate web events based on user clicks. A typical web application has multiple trackers across its web pages to generate different web events. The trackers submit web events to server-side ‘Collectors’.
  • Collectors – Collectors are server-side applications that collect and stream web events to a data store that can enable further analysis.
  • Event Store – A scalable data store for persisting all web events, with additional capabilities to drive comprehensive analytics.

In most implementations, the tracker is JavaScript code embedded into web pages. The sections below outline how IBM EventStore is used to implement Click Stream analysis for a fictitious retail business. The implementation limits its scope to the Collector and EventStore components. The intent is to cover the steps associated with ingesting and analyzing web events using IBM EventStore.

CYBERSHOP: The Retail Business

CYBERSHOP is a retail business that sells merchandise across multiple product lines, ranging from smartphones and computers to appliances and electronics. Using click stream analysis, the business intends to understand customers’ browsing behavior. It seeks insight into which products are of interest to which customers and how much time a customer spends exploring different products. Leveraging these insights, the business intends to target customers with personalized offers in real time and drive sales.

The web application tracks multiple web events for every user:

  • PageView event indicates the user viewed a product page or catalog page
  • AddToCart event indicates the user added a product to the shopping cart
  • Order event indicates the user placed an order

Each web event includes the following details:

  • EventId, EventType
  • Timestamp, IPAddress
  • SessionId, UserId
  • PageURL, Browser
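As a rough illustration, the event types and fields listed above could be modeled in Scala along the following lines. The field names follow the lists above; the field types (for example, a numeric EventId and an epoch-millisecond Timestamp) are assumptions made for this sketch, not something taken from the CYBERSHOP implementation.

```scala
// Event types tracked by the CYBERSHOP web application.
sealed trait EventType
object EventType {
  case object PageView  extends EventType
  case object AddToCart extends EventType
  case object Order     extends EventType

  // Parse the event type name as it might appear in a raw event payload.
  def fromString(s: String): Option[EventType] = s match {
    case "PageView"  => Some(PageView)
    case "AddToCart" => Some(AddToCart)
    case "Order"     => Some(Order)
    case _           => None
  }
}

// One web event. Field names follow the list above; the types are assumptions.
final case class WebEvent(
  eventId: Long,
  eventType: EventType,
  timestamp: Long,   // assumed: milliseconds since the epoch
  ipAddress: String,
  sessionId: String,
  userId: String,
  pageUrl: String,
  browser: String
)
```

Returning an Option from fromString lets a collector drop or log unknown event types instead of failing on them.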

The web application uses embedded trackers to submit web events to a server-side collector. The server-side collector ingests event data into IBM EventStore.
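To make the collector's role concrete, here is a minimal sketch of a collector that buffers incoming events and forwards them in batches. The class name BufferingCollector, the batchSize parameter and the sink function are all hypothetical; sink simply stands in for whatever code writes a batch to the data store.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical collector: buffers incoming events and hands them to `sink`
// in batches of `batchSize`. `sink` stands in for the code that writes to the store.
final class BufferingCollector[A](batchSize: Int, sink: Seq[A] => Unit) {
  require(batchSize > 0, "batchSize must be positive")

  private val buffer = ArrayBuffer.empty[A]

  // Called once per event submitted by a tracker.
  def collect(event: A): Unit = synchronized {
    buffer += event
    if (buffer.size >= batchSize) flushLocked()
  }

  // Push out a partially filled batch, for example on shutdown.
  def flush(): Unit = synchronized { flushLocked() }

  // Must only be called while holding the lock.
  private def flushLocked(): Unit =
    if (buffer.nonEmpty) {
      val batch = buffer.toList // immutable copy, so the sink never sees later changes
      buffer.clear()
      sink(batch)
    }
}
```

A real collector would also need a time-based flush so that events do not sit in the buffer during quiet periods; that is left out here to keep the sketch short.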

Click Stream Analysis with IBM EventStore

IBM EventStore offers multiple interfaces to ingest event data. The current build of the developer preview supports IBM Streams and a Scala API.

  • IBM Streams offers an EventStore Sink Operator to support data ingestion for EventStore. Event-driven applications can enrich streaming event data with powerful transformations by interfacing EventStore with IBM Streams.
  • The Scala API supports data ingestion with OLTP operations and data analysis with support for OLAP-type queries.

Ingesting Web Events into IBM EventStore using Scala API

Here are the steps for ingesting event data. 

  1. Connect to EventStore and create a database for persisting event data

  2. Define the schema for the clickstream data

  3. Create a table with the schema of the clickstream data

To persist events with different schemas, EventStore requires different tables with matching schema definitions. The web events in the clickstream analysis for CYBERSHOP use a single schema.
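The idea of a table whose schema must match the events it stores can be sketched in plain Scala. Note that the types below (ColumnType, Column, TableDef) are simplified, hypothetical stand-ins and not the EventStore Scala API; the table name ClickStreamTable and the column types are likewise assumptions.

```scala
// Hypothetical, simplified stand-ins for a table schema; not the EventStore API.
sealed trait ColumnType
case object LongCol   extends ColumnType
case object StringCol extends ColumnType

final case class Column(name: String, tpe: ColumnType)

final case class TableDef(name: String, columns: Seq[Column]) {
  // A row conforms when it has one value per column and each value has the column's type.
  def conforms(row: Seq[Any]): Boolean =
    row.size == columns.size && columns.zip(row).forall {
      case (Column(_, LongCol), _: Long)     => true
      case (Column(_, StringCol), _: String) => true
      case _                                 => false
    }
}

object ClickStreamSchema {
  // Column names follow the event fields listed earlier; the types are assumptions.
  val table: TableDef = TableDef("ClickStreamTable", Seq(
    Column("eventId",   LongCol),
    Column("eventType", StringCol),
    Column("timestamp", LongCol),
    Column("ipAddress", StringCol),
    Column("sessionId", StringCol),
    Column("userId",    StringCol),
    Column("pageUrl",   StringCol),
    Column("browser",   StringCol)
  ))
}
```

Because all three CYBERSHOP event types share these columns, a single table definition is enough; an event with a different shape would fail the conforms check and would need its own table.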

  4. Insert event data into EventStore

The code uses a Spark DataFrame to hold a collection of event records. The records are inserted into EventStore in batch mode.

The Scala API for EventStore supports multiple modes of ingestion. Applications ingesting event data can choose between real-time and batch modes. Both modes support synchronous and asynchronous invocations. Typically, asynchronous invocation in batch mode achieves the highest ingestion rates; in lab tests, performance workloads achieved 1 million records per second per node.
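The difference between synchronous and asynchronous batch invocation can be illustrated with standard Scala futures. This is only a sketch of the general pattern: EventSink, insertBatch and the BatchIngest helpers are hypothetical names, and a real sink would wrap the data store's own client calls.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical sink; a real one would wrap the data store's client.
trait EventSink[A] {
  def insertBatch(rows: Seq[A]): Int // returns the number of rows written
}

object BatchIngest {
  // Synchronous: each batch is written before the next one starts.
  // batchSize must be positive.
  def ingestSync[A](rows: Seq[A], batchSize: Int, sink: EventSink[A]): Int =
    rows.grouped(batchSize).map(sink.insertBatch).sum

  // Asynchronous: all batches are submitted up front and may complete in any order.
  // The resulting future yields the total number of rows written.
  def ingestAsync[A](rows: Seq[A], batchSize: Int, sink: EventSink[A])
                    (implicit ec: ExecutionContext): Future[Int] =
    Future
      .sequence(rows.grouped(batchSize).map(b => Future(sink.insertBatch(b))).toList)
      .map(_.sum)
}
```

With the asynchronous variant the caller is not blocked while batches are in flight, which is why this style tends to give higher ingestion throughput; the caller can wait on the returned future, or attach a callback, when it needs the outcome.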

Conclusion

This is part 2 of a 3-part blog series (click here for part 1). This post describes a web analytics use case with details on ingesting web events into IBM EventStore. The next part in the series will cover details on analyzing the web events data to track and collect insights on customers’ browsing interests.

Resources

The notebook with the ingestion code and a sample dataset is available for download. For more information and a free technology preview download, refer to IBM Project EventStore.