Diving Deep into Analytics from the Netezza Platform
"The milk of disruptive innovation doesn't flow from cash cows."
-- David Isenberg, Blogger, Musings About Loci of Intelligence and Stupidity
Dare I say it: "orders of magnitude performance" for data warehouse applications is old news as far as Netezza customers are concerned! That became fairly obvious to me at the Netezza European User Conference a few months ago. In presentation after presentation, customers talked about the performance and simplicity benefits they got from "the Netezza". They described how the proof-of-concept (against their favorite legacy data warehouse vendor) seemed unbelievable at first but held up in production. They confirmed that they did indeed get orders of magnitude better performance, and they explained how all this changed the way they did business. Brian Ganly of The Carphone Warehouse used this chart to highlight Netezza performance during his talk about the "Netezza Experience." I think it captures the sentiment really well ...
It's not that data warehouse performance is no longer important, or that the 100X performance Netezza delivers is somehow "enough". What Netezza customers were getting at is best summed up in one customer's own words: "Netezza does what it says on the tin!" We promise blisteringly fast performance, without tuning or aggregations, at half the cost of other systems, and we deliver. Once customers see for themselves what "the Netezza" can do for their data warehouse, they become curious about what else it could do for their business. That quickly leads them to look beyond raw data warehouse performance and apply "the Netezza" to new and interesting big-data analytic problems.
As the data warehouse market continues to evolve, more and more companies are looking to use information as a competitive lever across their organizations. The most successful will be those that make use of information to exploit arbitrage windows in the marketplace and predict future outcomes more accurately. These companies will differentiate themselves by making high performance analytics pervasive, providing employees, partners and vendors access to the kinds of analytics that are only available to a select few in the enterprise today.
What's needed to deliver on the promise of advanced analytics is a platform that can overcome the three challenges of doing deep analytics on large data volumes: performance, complexity and cost. Let's look at how advanced analytics are done on traditional systems. In most cases, these poor data warehouses are so overtaxed that adding any more processing is a sure way to bring them to their knees. So the usual approach is to extract huge data sets onto an outsized SMP server or compute grid, run the analytic computation there, and load the result sets back into the data warehouse for querying. The problems with this approach are clear. It's expensive, especially when you're talking about a large SMP or grid. It's complex, since you have more systems to maintain. Most importantly, performance stays poor even if you spend tons of time and money on the infrastructure, because moving the data back and forth introduces the same latency and bottlenecks that still plague traditional data warehouse architectures.
What we've done with "the Netezza" is create just such a platform. The idea is actually quite simple. Algorithms for analytic tasks such as scoring, text and spatial processing, image and video analysis, and financial simulations can run directly on the intelligent nodes inside the Netezza. These algorithms therefore act on the data where it resides, rather than sending it off-board for processing. You get fully parallelized execution across hundreds of processors, and with it orders of magnitude better performance for analytics. You also get the simplicity and economy of an appliance. The Netezza can handle all this extra processing because each of its intelligent nodes has spare processing capacity built in. For a blow-by-blow account of how OnStream analytics works in practice, see Phil Francisco's blog.
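To make the "move the computation, not the data" idea concrete, here is a toy Python sketch. It is purely illustrative and is not Netezza's API or implementation. The data slices are plain in-memory lists standing in for data stored on separate nodes, and the function names (`local_partial`, `pushdown_mean`, `extract_mean`) are hypothetical. The sketch computes a simple mean in two ways. In the first, each slice reduces itself to a small partial result. In the second, every row is copied to one place first. Both functions also report how many values had to "travel".

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for the rows held on one node's local disk.
Partition = List[int]


def local_partial(rows: Sequence[int]) -> Tuple[int, int]:
    """Runs 'on the node': reduces an entire slice to a (sum, count) pair."""
    return sum(rows), len(rows)


def pushdown_mean(partitions: Sequence[Partition]) -> Tuple[float, int]:
    """Ship the function to each slice and combine the small partial results.

    On a real MPP system each local_partial call would run concurrently on its
    own node; here they simply run one after another in a single process.
    Returns (mean, number of values moved to the coordinator).
    """
    partials = [local_partial(p) for p in partitions]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    moved = 2 * len(partials)  # one (sum, count) pair per slice
    return total / count, moved


def extract_mean(partitions: Sequence[Partition]) -> Tuple[float, int]:
    """Traditional approach: copy every row to one server, then compute there.

    Returns (mean, number of values moved to the server).
    """
    pulled = [x for p in partitions for x in p]
    return sum(pulled) / len(pulled), len(pulled)
```

With four slices of 1,000 rows each, both approaches give the same mean. The pushed-down version moves 8 values instead of 4,000. The gap grows with data volume, which is the effect the paragraph above describes.

```python
parts = [list(range(i * 1000, (i + 1) * 1000)) for i in range(4)]
print(pushdown_mean(parts))  # (1999.5, 8)
print(extract_mean(parts))   # (1999.5, 4000)
```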
This is all great so far. Any platform that provides these kinds of advantages has to be quite extraordinary! But the true value of a platform is determined by the applications that run on it, and by how innovative and differentiated they are. That's where there is a lot of interest and excitement in the enzee community. More on that very soon ...