Is all in-memory the best for Analytics?

March 1, 2012

Emilie Werr, VP & Head of Enterprise Architecture for NYSE Euronext, discusses the benefits seen from their new IBM Netezza data warehouse appliance.

My first encounter with in-memory database technology was back in 1998, when I was working at e-commerce pioneer Open Market. At the time, we had acquired an in-memory database solution that powered an online product catalog. Because everything was in memory, response times were excellent as different parameters were applied to product searches -- and even more than a decade ago, sufficient memory was available at reasonable cost to enable this kind of operational workload.

The Rise of In-Memory Analytics

Jumping forward to the present day, there is growing interest in the use of in-memory technology for advanced analytics… some might say we are at a strategic inflection point in the industry as we start to shift from traditional spinning-disk solutions to in-memory ones.

Many vendors claim performance gains of 10-1000x from going “in-memory,” thus enabling “real-time analytics.”

The Rise of Big Data

In parallel, we are also seeing the rise of Big Data, with Hadoop and streaming analytics solutions providing the means to analyze the mountains of non-traditional, non-relational data that we, as a species, seem to generate by the exabyte on a daily basis. A recent IBM study found that consumers are creating 2.5 quintillion bytes (or 2.5 exabytes) of data daily, with 90 percent of the world's data having been created in the last two years alone.

How do we reconcile the two?

We are generating data at a ferocious rate, and the analytic consumers of that information (retailers, telco providers, security agencies, scientists, etc.) require high-performance, cost-effective solutions that can efficiently mine that data for the golden insights that will power the discoveries and competitive advantages of the next decades.

Consider the example of NYSE Euronext, a long-time Netezza customer. Every trade that goes through the exchange lands in a Netezza system, where NYSE runs market surveillance and regulatory applications against that data -- approximately 1 petabyte in total.

These kinds of applications look for anomalies and outliers -- precisely the kinds of things we lose if we have to summarize, aggregate, or otherwise dumb down our data. Without the ability to do deep analytics across all of our data, these applications are useless.
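
To make that point concrete, here is a toy sketch in Python -- not NYSE's actual surveillance logic; the prices, bucket size, and threshold are made up for illustration -- showing how an anomalous trade that stands out in the raw, trade-level data all but disappears once the data is aggregated:

    import statistics

    # Synthetic trade prices: a slowly drifting market with one mispriced trade.
    trades = [100.0 + 0.01 * i for i in range(1000)]
    trades[500] = 150.0  # the anomaly a surveillance application must catch

    # Trade-level view: a simple z-score flags the outlier immediately.
    mean = statistics.mean(trades)
    stdev = statistics.stdev(trades)
    print("outliers:", [p for p in trades if abs(p - mean) / stdev > 3])
    # -> outliers: [150.0]

    # Aggregated view: average price per 100-trade bucket.
    buckets = [statistics.mean(trades[i:i + 100]) for i in range(0, len(trades), 100)]
    print("bucket averages:", [round(b, 2) for b in buckets])
    # The 150.0 trade is diluted into a bump of well under 1% in one bucket --
    # effectively invisible to anything that only sees the aggregates.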

In short, we very often need to be able to do deep analytics across all of our data, and that is where in-memory solutions often fall short -- either because they do not yet scale efficiently, or because they are prohibitively expensive.

Consider the following:

  • In-memory systems require traditional disk behind them for data persistence
  • The amount of disk is typically a multiple of the amount of RAM, say 5x

If we take the NYSE Euronext example from above and apply these in-memory metrics, we find an untenable proposition:

  • 1 PB of data, assuming 5x compression, means 200 TB of RAM
  • Plus 1 PB+ of traditional disk as a persistent store

…whereas that same 1 PB of data is comfortably handled by Netezza data warehouse appliances with in-database analytics at a very reasonable price/performance point.
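
To make the arithmetic above concrete, here is a back-of-envelope sketch in Python. The 5x compression and 5x disk-to-RAM figures are the illustrative assumptions from this post, not vendor specifications:

    # Back-of-envelope sizing for a hypothetical all-in-memory deployment.
    TB_PER_PB = 1000                  # decimal units

    raw_data_tb = 1 * TB_PER_PB       # ~1 PB of trade data
    compression_ratio = 5             # assumed in-memory compression
    disk_to_ram_ratio = 5             # assumed persistent-disk multiple of RAM

    ram_needed_tb = raw_data_tb / compression_ratio      # 200 TB of RAM
    disk_needed_tb = disk_to_ram_ratio * ram_needed_tb   # 1,000 TB, i.e. 1 PB+ of disk

    print(f"RAM required:  {ram_needed_tb:,.0f} TB")
    print(f"Disk required: {disk_needed_tb:,.0f} TB and up")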

In-Memory Databases have a place

In-memory databases have a place in your analytic environment, but they are not the holy grail of analytics. For applications where the dataset can fit comfortably in memory, and where near real-time performance is required, in-memory solutions are definitely worth a look; indeed, IBM has some excellent technology in that area.
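
As a rough illustration of that trade-off, here is a hypothetical rule-of-thumb helper in Python; the function name, inputs, and default values are placeholders for discussion, not IBM sizing guidance:

    def suggest_platform(data_tb: float, available_ram_tb: float,
                         compression_ratio: float = 5.0,
                         needs_realtime: bool = False) -> str:
        """Rough guess at where an analytic workload might be best served."""
        fits_in_memory = (data_tb / compression_ratio) <= available_ram_tb
        if fits_in_memory and needs_realtime:
            return "in-memory database"
        if fits_in_memory:
            return "either -- compare price/performance"
        return "disk-based warehouse appliance with in-database analytics"

    # The 1 PB surveillance workload against a (generous) 10 TB of RAM:
    print(suggest_platform(data_tb=1000, available_ram_tb=10, needs_realtime=True))
    # -> disk-based warehouse appliance with in-database analytics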

But think of in-memory platforms as one component of a larger Logical Data Warehouse (LDW) environment that includes Big Data tools like IBM’s BigInsights and InfoSphere Streams, relational data warehouse analytic environments like Netezza and ISAS, and the tools to analyze the right data in the right place!