Four Fundamental Differences Between TwinFin and Exadata
Today Netezza is launching a new eBook entitled, “Oracle Exadata and Netezza TwinFin™ Compared”. As the name implies, this eBook provides a comparison of the Netezza TwinFin data warehouse appliance and Oracle’s “appliance-like” database machine offering.
Certainly Netezza is not the first company to compare and contrast its flagship system with Oracle’s most recent entry. Richard Burns, a consultant at Teradata, did a laudable job exposing the technical shortcomings of the Exadata v2 machine as they pertain to data warehousing in a May 2010 whitepaper. And there have been several recent pieces written on Oracle’s apparent success, although the publicly named customer list has struck some as a bit underwhelming.
Netezza continues to compete (and win) against Oracle regularly in the marketplace, including against the Exadata v2 product, so we felt it was high time to put our own comparison story together with today’s eBook and this little blog posting. Let me know what you think.
So where to begin? Let’s start with the fact that the Netezza TwinFin is built to excel at a specific purpose: being the best price/performance platform for Data Warehousing and Analytics in the market. Conversely, Oracle has tried to “kill two birds with one stone” in the Exadata v2, aiming it primarily at the On-Line Transaction Processing (OLTP) applications space while also making bold claims about its performance as a Data Warehouse with its Sun-based Oracle Database Machine (DBM) and Exadata Storage Server, version 2 (Exadata).
So why does it matter that Oracle is aiming to do both OLTP and Data Warehousing (DW) in the same system – apart, that is, from at least two decades of people trying-and-failing to do exactly that with previous software and hardware instantiations of Oracle? Let’s start with the workload requirements of the two application areas:
- OLTP systems execute many short transactions, typically of extremely small scope (touching only a handful of records) and in extremely predictable, well-understood access and query patterns. They need to excel at handling these small transactions in very high volume, combined with equally small writes to the database in the form of updates, insertions and deletions. This limited scope, high throughput and “regularity” of the access patterns make OLTP systems great candidates for intelligent caching and (multiple) secondary data structures, such as indices to speed their processing.
- Conversely, DW systems are typically asked to perform “read-heavy” queries and operations against current and deep historical data sets. Rather than analyzing just a few records, a DW query might look at millions, even billions, of rows from a single table, combined with join logic across multiple other tables. Company analysts and managers use data warehouse systems to find the “needle in the haystack” that guides enterprise decision-making, in a more comprehensive and often ad-hoc manner – one that frequently negates the value of “tricks of the trade” such as result caching and indices.
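The contrast between the two access patterns can be sketched in a few lines of Python. This is my own illustration, not Netezza or Oracle code: an index pays off for an OLTP-style point lookup, while a DW-style aggregation has to touch every row regardless, so raw scan speed dominates.

```python
# A toy "table" of one million order rows.
rows = [{"id": i, "amount": i % 100} for i in range(1_000_000)]

# OLTP pattern: fetch one record by primary key. A hash index makes
# this a single-row touch, no matter how big the table is.
index = {row["id"]: row for row in rows}
order = index[42_137]          # touches exactly one row

# DW pattern: aggregate across the entire table. The index is of no
# help here; every row must be read, so scan throughput is what counts.
total_amount = sum(row["amount"] for row in rows)
```

The same asymmetry explains why caching and secondary indices shine for OLTP but buy little for ad-hoc warehouse queries.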
So the two applications tend to lead to very different system/platform implications. No special “news” there – as I said earlier, people have been trying-and-failing to use a single system for both applications for years.
Without stealing any more of the thunder of our electronic publication today, let me just lay out what I believe are the fundamental differences between Netezza’s TwinFin and the Oracle Database Machine/Exadata as simply and plainly as I can:
| Netezza TwinFin | Oracle Database Machine / Exadata v2 |
| --- | --- |
| True MPP | Hybrid "SMP-plus" Approach |
| Data Streaming with a Hardware Assist | CPU-intensive Processing for Basic DB Operations |
| Deep Analytics Processing | Central Cluster-based Approach |
| No-Tuning-Required Simplicity | Complex Array of Knobs and Levers |
In my view, these are "big deal" differences. They're not the result of a simple feature gap to be closed in an upcoming point-release, but rather go directly to limitations at the heart of the Oracle DBM/Exadata system architecture and/or business culture. To address them would require a major rearchitecting, or at least refactoring, of Oracle's decades-old DBMS code base. They also happen to be highly visible to customers and prospects, which makes for some interesting comparisons in head-to-head on-site Proofs of Concept (POCs).
1) True MPP vs. a Hybrid "SMP-plus" Approach
Netezza’s TwinFin uses a full MPP approach to data warehousing, pushing all of the processing down as close as possible to where the data is stored and maximizing the processing horsepower of MPP for scalability, throughput and performance – for even the most complex workloads. Using the MPP method of dividing the workload and attacking query problems in parallel, Netezza has been able to demonstrate market-leading data warehouse price-performance across four generations of data warehouse appliances.
Oracle’s DBM/Exadata takes a hybrid approach, adding Exadata storage nodes largely to handle data decompression and predicate-filtering tasks while still relying primarily on the SMP cluster of Oracle RAC to handle most data warehouse tasks, including complex joins. In addition, the SMP cluster must act as the central distribution point for any data that needs to be redistributed between and across Exadata nodes. To try to minimize this, Oracle and Sun chose to “throw hardware at the problem” (quoting Teradata’s Mr. Burns), over-engineering the interconnect, processor speeds and other elements to cope with all of this data movement, rather than refactoring and solving a fundamental software architecture issue.
The difference between the two is akin to an 8-lane, continuously streaming superhighway in TwinFin’s case versus multiple freeways converging on, and necking down to, a two-lane country road via a “traffic roundabout”. I live in Massachusetts and can attest to the negative impact of funneling multiple highways down to a single road – it happens every weekend at the gateway to and from Route 6 on Cape Cod.
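A toy model (again my own illustration, not vendor code) makes the data-movement argument concrete: in an MPP-style layout each node aggregates its own partition locally and ships only one partial result, while a centralized layout ships every row to the cluster before any work is done.

```python
NODES = 8
values = list(range(1_000_000))
# Round-robin "hash distribution" of the values across 8 nodes.
partitions = [values[n::NODES] for n in range(NODES)]

# MPP style: each node sums its own slice; only 8 partial results
# ever cross the network.
partials = [sum(part) for part in partitions]
mpp_values_moved = len(partials)
mpp_total = sum(partials)

# Centralized style: all 1,000,000 values travel to the cluster,
# which then performs the aggregation itself.
central_values_moved = sum(len(part) for part in partitions)
central_total = sum(v for part in partitions for v in part)
```

Both approaches compute the same answer; the difference is eight values on the wire versus a million, which is the "roundabout" tax the analogy above describes.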
2) Data Streaming with a Hardware Assist vs. CPU-intensive Work for Basic DB Operations
In addition to the advantages of the MPP architecture for data warehousing, the TwinFin system uses hardware acceleration for increased query and analytics performance. This acceleration comes in the form of the "DB Accelerator" on each S-Blade in the TwinFin system architecture, which provides four dual-core Field-Programmable Gate Arrays (FPGAs). The FPGAs take care of fundamental processing steps such as decompression, predicate filtering and ACID-compliant data visibility at the full scan rate of the data from disk. Because the device sits so close to the disks it serves, the TwinFin system gains far more performance leverage: data is filtered, processed and value-added before consuming any unnecessary CPU cycles or crossing an expensive network.
And because it is a field-programmable device, Netezza can use it to introduce additional features and performance through a simple upgrade to our NPS software/firmware. Netezza has done exactly that with two phases of hybrid column/row-level compression technology (first introduced in 2005 and, with Release 6.0, scaling as high as 32:1 depending on data patterns), and with our high-performance implementation of row-level security. Because compression is handled in the FPGA in TwinFin, "Compression = Performance": if a customer's data is compressed by a 4:1 factor, the effective data streaming rate for processing queries increases four-fold.
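The "Compression = Performance" claim is simple back-of-envelope arithmetic, assuming (as the post states) that the FPGA decompresses at the full disk scan rate. The 100 MB/s figure below is purely illustrative, not a measured Netezza number.

```python
raw_scan_mb_per_s = 100     # assumed raw per-disk streaming rate
compression_ratio = 4       # data stored at 4:1 compression

# If decompression keeps pace with the disk, every physical megabyte
# read yields `compression_ratio` logical megabytes of query data.
effective_scan_mb_per_s = raw_scan_mb_per_s * compression_ratio
```

At a 4:1 ratio the effective streaming rate quadruples; at the 32:1 extreme cited for Release 6.0, the same arithmetic would multiply it 32-fold for data that compresses that well.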
Conversely, the DBM/Exadata system relies entirely on CPU processing. In fact, the great majority of the functionality provided by the Exadata nodes in the DBM/Exadata system replicates what is included in each FPGA core of TwinFin: data decompression and predicate filtering. Because decompressing data is so CPU-intensive in the DBM/Exadata system, Oracle "strongly suggests" lighter compression for data needed in high-performance data warehousing, reserving heavier compression for "cooler" queryable-archive purposes. Again, the heavy lifting for query processing and analytics is left to the central SMP cluster nodes rather than to parallel Exadata nodes, forcing Oracle to "throw hardware at the problem".
3) Deep Analytics Processing vs. Central Cluster Analytics
Netezza brings analytics to the data, performing the work as close as possible to where the data is stored – not just decompression and predicate filtering, but as much of the complex analytics as possible, in parallel. That’s as true of the “traditional” OLAP analytics of SQL-based data warehousing as it is of the advanced and predictive analytics enabled by the new i-Class capabilities in the “Second Wave of TwinFin”.
With i-Class, Netezza introduces a comprehensive, scalable and high-performance approach to advanced analytics for both our customers and partners. It spans linear algebra and matrix manipulation, engines for R and Hadoop, and several programming languages including C, C++, Java, Python and even Fortran. The i-Class functionality also offers plug-ins and packages for the Eclipse IDE and the R GUI, plus pre-built analytic functions engineered to deliver performance at scale – covering data preparation, mining, predictive analytics and spatial functions – together with access to analytics functions from the GNU Scientific Library and the R CRAN repository. Extended by the i-Class embedded analytics capabilities, TwinFin allows our partners and customers to push down applications, functions and algorithms that go well beyond standard set-based SQL, at scale and with high performance, freeing them from the latency and sampling compromises demanded by off-board advanced-analytics platforms.
The Oracle DBM/Exadata performs the majority of the OLAP analytics in the central cluster (RAC) nodes, after traversing the "traffic roundabout". And apart from basic scoring functionality, virtually ALL of the advanced analytics are performed in the cluster nodes as well. Placing the predominance of processing in the central SMP cluster means that both the functionality and scale of the analytics are limited by the capacity and performance that the SMP cluster can provide - typically limited to the elements included in Oracle's own "Data Mining" package.
The DBM/Exadata’s requirement to ship data from the storage arrays to the central cluster for analytics is akin to backhauling massive truckloads of material from a mining site to pick out the gold at a central headquarters; TwinFin, by contrast, sifts out the most valuable nuggets in parallel at the site and sends back only those.
4) No-Tuning-Required Simplicity vs. a Complex Array of Knobs and Levers
For a long time, the simplicity of the Netezza data warehouse appliance has shone through most strongly in the extremely limited tuning requirements it imposes on administrators of the system, particularly as compared to Oracle-based systems. Simplifying system management is core to Netezza’s “appliantization” of the data warehouse and analytics platform. Rather than managing a “coordinated collection” of technology assets, the system and database administrators of TwinFin interact with a single appliance and use the redundant Linux-based SMP host nodes as the interaction point for all activities. Everything from database configuration, data distribution, data mirroring, monitoring and software upgrades to day-to-day management is simplified (in the words of one TwinFin customer, “It’s Netezza-easy – it just works.”).
No indexing is necessary (or even supported) in TwinFin to achieve high performance. Just about the only requisite “tuning” of the system is the definition of the distribution key that spreads data across all the S-Blades – typically the primary keys of the tables. Even internally, TwinFin’s system management is configured to get maximum performance from its commodity subsystems (blades, chassis, disk arrays and network) by connecting them in novel ways and then managing them at the system level, rather than at the subsystem or rack level.
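A distribution key works by hashing each row's key to pick the blade that stores it. The sketch below is a hypothetical illustration of the idea, not the actual NPS algorithm: a well-chosen key (such as a primary key) spreads rows evenly, so every S-Blade scans its share in parallel.

```python
import hashlib

N_BLADES = 8

def blade_for(key):
    """Map a distribution-key value to one of the blades (illustrative)."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % N_BLADES

# Distribute 10,000 rows keyed by a synthetic primary key.
blades = {b: [] for b in range(N_BLADES)}
for order_id in range(10_000):
    blades[blade_for(order_id)].append(order_id)

# With a high-cardinality key, each blade ends up with roughly an
# equal share of the rows, and no blade is left idle.
blade_sizes = [len(rows) for rows in blades.values()]
```

A skewed key (say, a low-cardinality status column) would pile rows onto a few blades and serialize the scan, which is why picking the distribution key is essentially the one tuning decision TwinFin asks for.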
While it is true that Oracle has simplified some of the tuning knobs and levers in the DBM/Exadata, prospective customers should ask whether Oracle really has reached the point of requiring only a small handful of tuning knobs and settings, or whether it still requires – or, more colloquially, “strongly suggests” – the use of dozens or even hundreds of settings (depending on the number of objects being maintained and optimized). How many dozens of IP addresses are needed to configure and manage the DBM/Exadata? (TwinFin requires only two.) Oracle even has a special service to help DBM/Exadata customers migrate and tune their systems and databases for performance, and some of its leading Performance Architects talk about the use of tools like the Oracle SQL Tuning Advisor as an inevitable fait accompli.
By Oracle’s own admission, the time savings that customers can expect in managing and tuning the DBM/Exadata system under Oracle 11g R2 is only 26% less than under Oracle 11g. Contrast that with installation after installation of Netezza appliances, where hundreds of terabytes of data under management in a data warehouse are maintained by two, or even fewer than one, full-time equivalents rather than a team of Oracle specialists. It all depends on one’s perspective and philosophy in building a real appliance for the data warehouse market. Where others may see the need to tune, partition, index and sub-index data sets for performance as an inevitability, Netezza sees that same need as a reason to enhance TwinFin’s capabilities in order to obviate it.
All of this adds up quickly to a significant price-performance advantage for customers of TwinFin – and, with our limited tuning and simplified operations, it translates into much more rapid time-to-value, too. So that’s it – four simple, fundamental differences that really set the TwinFin appliance apart from the DBM/Exadata. Agree? Disagree? Let me know what you’re thinking. And now, go have a look at today’s eBook release for the rest of the story.