Real Time Versus Customer Time
For big data, how fast is fast enough?
While not quite up there with the marketing buzz around big data, the term real time has to be running a close second at this point. Unfortunately, the market has grown so sloppy in defining it that a point of clarification is needed. Much of what is referred to as real time is really customer time, and yes, the distinction matters both in how organizations architect their systems and in how they stay within their budgets. As the saying goes, speed costs money; how fast do you want, or in this case really need, to go?
So what is customer time, and how is it different from real time? I came up with the term because it is a useful way of understanding how fast is fast enough for a given use case. Real time means real time, or at least it should, which is generally on the order of milliseconds. Customer time, on the other hand, is based on the perceived latency from the end user’s point of view. If you beat end users to the next action, decision, or screen paint, then from their point of view it is happening in real time even if there is significant latency in the overall process. That next action may be nearly instantaneous, but more often it is minutes, hours, or even days away. That difference can loosen processing requirements dramatically and create more architectural and system flexibility in how the problem is solved.
Most people are familiar with the idea of real-time systems, especially embedded ones such as flight-control systems and automotive engine management. But what about examples for consumers? In the healthcare space, a pacemaker must be real time, but building models that predict diabetes risk can probably exist in customer time, so long as they are ready for the next visit to the doctor’s office. That way, medical staff can have a timely and personalized conversation with the patient.
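
The customer-time test described above, beating the end user to the next action, can be written down as a simple comparison. The following is a minimal sketch rather than a sizing method: the `UseCase` class, the `fits_customer_time` function, and every number in it are hypothetical and chosen only to mirror the healthcare and navigation examples.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    # Hypothetical estimate: how long until the user needs the result.
    seconds_until_next_action: float
    # Hypothetical estimate: end-to-end time to produce the result.
    pipeline_latency_seconds: float


def fits_customer_time(use_case: UseCase) -> bool:
    """Return True when the result is ready before the user's next action."""
    return use_case.pipeline_latency_seconds <= use_case.seconds_until_next_action


DAY = 24 * 3600

# Illustrative numbers only; real values depend on the workload.
diabetes_model = UseCase("diabetes risk model", 30 * DAY, 6 * 3600)
traffic_reroute = UseCase("traffic rerouting", 2, 6 * 3600)

for case in (diabetes_model, traffic_reroute):
    verdict = "customer time is enough" if fits_customer_time(case) else "needs real time"
    print(f"{case.name}: {verdict}")
```

With these assumed figures, a six-hour batch run is comfortably fast enough for a model that is needed at next month's doctor visit, while the same pipeline is useless for a driver who needs a new route within seconds.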
The right time for the right needs
In the automotive space, interactive navigation systems that help drivers avoid traffic jams had better be real time, especially if you live near Los Angeles like I do. But personalized maintenance recommendations can be in customer time and delivered just prior to the next visit to the dealer. Merchant offers when driving through a city at lunchtime should be in real time; personalized financial plans and risk models for retirement planning should be in customer time because they don’t change minute by minute.
If you have real-time needs, then, surprise, real-time systems are necessary to handle them. Make sure you are looking at real-time systems that can handle any data type you need, be redirected while the data is in flight, and work well with a wide range of analytics and models, as the IBM® InfoSphere® Streams platform does.
If your use case is really based on customer time rather than strictly real time, a much broader array of technologies opens up, including Hadoop-based architectures. Hadoop might be a good place to try to do it all, if you can get the latency to work within customer time. Storing multiple data types, doing extract-transform-load (ETL)-type preparation of the information sets, and then doing the analytics and machine learning (also known as math; more on math in a future column) have become common Hadoop use cases for good reason.
Using Hadoop in your architectures does introduce latency compared with a true streaming engine. That said, there is an approach I call fast batch: chew through the information quickly in batch and then push the results to Apache HBase for external application retrieval, or into another system for further processing or application interaction. This approach can cover a pretty wide range of needs, provided you design your jobs and build your clusters the right way.
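
To make the shape of fast batch concrete, here is a minimal sketch in plain Python. It is not Hadoop or HBase code: the `serving_store` dictionary merely stands in for a key-value table such as HBase, and the function names, event layout, and values are assumptions made for illustration.

```python
from collections import defaultdict


def run_fast_batch(events):
    """Boil raw (customer_id, amount) events down to one row per customer."""
    totals = defaultdict(lambda: {"visits": 0, "spend": 0.0})
    for customer_id, amount in events:
        row = totals[customer_id]
        row["visits"] += 1
        row["spend"] += amount
    return dict(totals)


def publish(store, rows):
    """Push the latest batch output into the serving store."""
    store.update(rows)


# Stand-in for a key-value table; a real deployment would use a store client.
serving_store = {}

# Hypothetical input data.
events = [("c1", 20.0), ("c2", 5.0), ("c1", 12.5)]

publish(serving_store, run_fast_batch(events))

# The application only does a keyed lookup, so the batch latency is hidden
# from the user as long as the batch finishes before the next lookup.
print(serving_store["c1"])
```

The design point is the split between the two halves: the batch step can take as long as customer time allows, while the application-facing read stays a cheap keyed lookup against precomputed results.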
The right tools for the job
One underappreciated consideration in real time versus customer time is taking a hard look at the existing tools you and your teams use and determining whether they are interoperable with what you are proposing. For example, you certainly can use Hadoop to ingest and boil down data sets and then push them into a more traditional relational environment for IBM SPSS® predictive analytics software, R, and SAS to build models against. But you need to ask whether you have the Hadoop skills, or tooling, to make that job easier than simply loading directly into something like Netezza. By the same token, if you want to push into real-time systems, do you have the knowledge and skills to make use of the real-time analytics you will be capable of generating?
As always, keep in mind a few last points of pragmatic guidance: keep things as simple as possible and be driven by fit-for-purpose principles. Do not try to jump into multiple new technologies at the same time, make sure to fully leverage the technologies you already own, and minimize data movement by bringing the compute tasks to where the data is.
I’d be keen to hear how you are thinking about real-time versus customer-time use cases and where you draw the line between the two. Let me know what you are thinking in the comments.