Hadoop & Netezza: Synergy in Data Analytics - PART 2

VP Product Management & Marketing

I mentioned in my previous post that Netezza is excited about our partnership with Cloudera and Hadoop because we’ve already seen some of our customers benefit from the synergy of Hadoop and Netezza TwinFin™ technologies working together.

As I noted, these strategies play to the strengths of both technologies and roughly break down into two categories: 1) using a Hadoop Cluster for data ingestion, and 2) using a Hadoop Cluster for long-term data retention. Today I’m addressing the second.

Netezza TwinFin with a Hadoop Cluster Used for Queryable Archive Analytics

The second pattern we have seen customers deploy is one in which the Hadoop Cluster is used for long-term data retention, or as a “queryable archive”. Here one could think of Hadoop as a complementary analytic extension of the Netezza TwinFin for workloads where far less of a premium is placed on low latency or high performance. In addition to the weblog and unstructured data analysis discussed in Pattern 1, the queryable archive could also retain long-term copies of structured data that had previously been loaded into the high-performance TwinFin appliance.


Hadoop Cluster Used for Queryable Archive

With a mix of structured, semi-structured and unstructured data loaded across the two complementary systems, customers can vary the level of granularity and the data retention period on each. Typically they use TwinFin for processing “hot” data and the Hadoop Cluster for processing “cool” or “cold” data, perhaps with specialized analytics. A deployment of this pattern could look like the following diagram:


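To make the hot/cold split a little more concrete, here is a minimal sketch of how a query's date range might be divided between the two systems. Everything in it is an assumption for illustration: the 90-day window, the tier names and both function names are hypothetical and are not part of any Netezza or Hadoop API.

```python
from datetime import date, timedelta

# Hypothetical retention window: how long data stays "hot" in the appliance.
HOT_RETENTION = timedelta(days=90)


def choose_tier(event_date, today, hot_retention=HOT_RETENTION):
    """Return the (hypothetical) tier name that would serve data of this age."""
    age = today - event_date
    return "twinfin" if age <= hot_retention else "hadoop_archive"


def split_date_range(start, end, today, hot_retention=HOT_RETENTION):
    """Split an inclusive date range into per-tier sub-ranges."""
    cutoff = today - hot_retention  # oldest date still treated as hot
    ranges = {}
    if start < cutoff:
        # Everything strictly older than the cutoff goes to the archive.
        ranges["hadoop_archive"] = (start, min(end, cutoff - timedelta(days=1)))
    if end >= cutoff:
        # Everything from the cutoff onward stays on the appliance.
        ranges["twinfin"] = (max(start, cutoff), end)
    return ranges
```

A query spanning both recent and older dates would yield two sub-ranges, one per system. In practice the window would depend on each customer's granularity and retention choices.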
Readers should view this pair of posts as a “point-in-time” look at the market. Our customers continue to innovate and make use of the complementary strengths of TwinFin and Hadoop. And Netezza will continue to innovate in three ways: inside the appliance, by adding performance, scale and workload management capabilities, and especially the advanced analytics of i-Class; through partnerships like the one announced with Cloudera a week ago; and through expansion of our platform, software and virtualization capabilities beyond the TwinFin and Skimmer™ appliances. Those innovations should help alter or enhance some of the deployment directions discussed here.

Now, as I said at the outset of these two posts, I’d like to hear your ideas on Netezza & Hadoop co-existence deployments and your compatibility wish list. What would you like to see?