The cost of data warehouse appliance complexity: Comparing IAS and IntelliFlex

Product Marketing Manager for Data Lake & Cloudera Partnership, IBM

In a previous blog, I explained how data science capabilities, massively parallel processing (MPP) and usability improvements in data warehouse appliances can help the bottom line, and why old-fashioned architectures might not cut it. But what does that look like in practice?

Research firm Quark + Lepton recently published a report comparing two data warehouse appliances: the IBM Integrated Analytics System (IAS) and Teradata IntelliFlex. It found that five-year TCO for IAS is 45 percent lower on average.


The two are separated by a fundamental difference in design philosophy: according to Quark + Lepton, Teradata is complex, consisting of compute nodes paired with storage that are linked together by manual connectors. Database admins might need to spend considerable time and effort to keep everything working properly, and data scientists need to transfer data between different environments to build and run their analytic models. Because frequent manual intervention can be required, Quark + Lepton also notes that IntelliFlex can be costly to purchase, maintain and configure while performance is limited by the latency involved in data migrations.

By contrast, the IAS, a next-generation data warehouse appliance, implements distributed MPP computing in a purpose-built appliance and is part of a wide-ranging hybrid data management ecosystem from IBM. This provides several major advantages when integrating a data warehouse into an environment with multiple data sources, formats and destinations.

Comparing connectivity

A common SQL engine (CSE) underlies IAS, meaning that SQL queries can be run across the entire Db2 database family while avoiding rewrites and code changes. This is true for transactional or unstructured data, whether it is on-premises or in the cloud. This is vital as it helps you gain insights based on a more complete set of data without additional delays. Native cloud compatibility also means you can expand your hybrid environment seamlessly without the fear that you’re siloing any of your data.

This connectivity extends to external data repositories and previous generations of appliance technology as well. The data virtualization capabilities of the CSE help access data in external data sources including Oracle, Teradata and Microsoft SQL Server; cloud sources like Amazon Redshift; and open-source solutions like Hive. Because IAS is based on Netezza technology, it is also highly compatible with previous Netezza products like IBM PureData System for Analytics. In both cases, the data you need is easier to reach.

By contrast, connecting multiple data sources to an IntelliFlex data warehouse can mean implementing multiple custom connectors, all of which require hands-on attention to build and maintain. Further complicating the architecture, Quark + Lepton indicates that Teradata must rely on partnerships with third-party cloud vendors like Amazon and Microsoft because it lacks native cloud offerings of its own.

In addition, Quark + Lepton says IntelliFlex often encourages migrating data before analyzing it. This can lengthen time-to-insight, either through added latency as the connectors federate data or by delaying analysis until the data has been migrated. The additional steps between data acquisition and data use can also result in significant opportunity costs. If your business’s IT specialists and data scientists are spending time maintaining the architecture and preparing the data, they can lose the chance to engage in higher-value activities like exploring new insight opportunities.

Keeping it simple and smart

IAS is designed with simplicity in mind. It is a pre-tuned appliance with Watson Studio intelligent data management capabilities included. Data science technology is integrated right into the IAS, so Spark or Hadoop workloads can be developed and run without migrating data between processing and storage nodes. With the advanced processing capabilities of Spark, you can reach better insights sooner, based on a wide range of data that includes unstructured data. Built-in machine learning capabilities can also help data scientists go from ideation to gathering insight faster. For example, machine learning libraries can help data scientists avoid starting models from scratch by providing building blocks they can reuse and add onto.

Increasing query performance while reducing storage requirements is also simpler with IAS. Specifically, the BLU Acceleration technology in IAS combines multiple techniques, including in-memory columnar processing, single instruction multiple data (SIMD) processing, actionable compression and data skipping, to maximize performance and efficiency. Actionable compression saves storage automatically, without intervention from a database administrator, and many operations can be applied to the compressed data without decompressing it. Data skipping speeds analytics by determining which data is not required to process a query and ignoring it. Together, these capabilities can result in substantial storage cost savings and faster answers.
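To make two of these ideas concrete, here is a minimal Python sketch of operating on compressed data and of data skipping. It is an illustration of the general techniques only, not IBM's implementation: the function names, the sorted-dictionary encoding, the 100-row blocks and the sample data are all assumptions made for this example.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with a small integer code from a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def count_equal(dictionary: List[str], codes: List[int], target: str) -> int:
    """Count matching rows by comparing codes; no row is ever decoded."""
    position = bisect_left(dictionary, target)
    if position == len(dictionary) or dictionary[position] != target:
        return 0  # the value never occurs in the column
    return codes.count(position)


@dataclass
class Block:
    values: List[int]
    low: int   # smallest value stored in this block
    high: int  # largest value stored in this block


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_greater_than(blocks: List[Block], threshold: int) -> Tuple[List[int], int]:
    """Return matching values and how many blocks actually had to be read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        if block.high <= threshold:
            continue  # skipped: the summary proves no row here can match
        blocks_read += 1
        matches.extend(v for v in block.values if v > threshold)
    return matches, blocks_read


# Hypothetical sample data, invented for this sketch.
regions = ["east", "west", "east", "north", "east"]
dictionary, codes = dictionary_encode(regions)
east_rows = count_equal(dictionary, codes, "east")

order_totals = list(range(1, 1001))
blocks = build_blocks(order_totals, block_size=100)
matches, blocks_read = scan_greater_than(blocks, threshold=950)
print(east_rows, len(matches), blocks_read, len(blocks))  # prints: 3 50 1 10
```

In this toy run, the equality check touches only integer codes, and the range query reads one block out of ten because the stored minimum and maximum rule out the other nine. A production engine applies the same ideas at far larger scale and with more sophisticated encodings.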

In contrast, IntelliFlex has no native support for machine learning, and data science workloads require large-scale data migrations according to Quark + Lepton. Again, this means increased latency and additional hassle for DBAs and data scientists who should be spending their time on more important, value-add activities.

The cost benefits of a new data philosophy 

The difference in underlying design philosophy between IAS and IntelliFlex has massive significance for how the two solutions work in practice. The manual integration and tuning that IntelliFlex can require may lead to added personnel costs and extra complexity. Quark + Lepton explains that the extensive requirements to migrate data for processing can mean extra latency and slower results. And the lack of native cloud connectivity means you may need to rely on third-party cloud solutions.

All this extra work comes at a cost. The Quark + Lepton report concludes: “Due to smaller data center footprints, ease of management, and system elasticity, five year costs of ownership for the IAS average 45 percent less than Teradata IntelliFlex configurations.” In fact, as the chart above shows, for the same money you’d spend to run an IntelliFlex system for one year, you can run IAS for over four.
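To see what a 45 percent average reduction means in absolute terms, here is a small worked calculation in Python. The baseline figure is a hypothetical placeholder chosen for illustration, not a number from the Quark + Lepton report, and the function name is likewise made up for this sketch.

```python
def cost_after_reduction(baseline_cost: int, percent_lower: int) -> int:
    """Return the cost that is percent_lower percent below baseline_cost."""
    # Integer arithmetic keeps the result exact for whole-currency inputs.
    return baseline_cost * (100 - percent_lower) // 100


# Assumption: a made-up five-year IntelliFlex cost, used only as a baseline.
HYPOTHETICAL_INTELLIFLEX_FIVE_YEAR_COST = 10_000_000
# From the report's conclusion: IAS averages 45 percent less over five years.
REPORTED_AVERAGE_REDUCTION_PERCENT = 45

ias_five_year_cost = cost_after_reduction(
    HYPOTHETICAL_INTELLIFLEX_FIVE_YEAR_COST,
    REPORTED_AVERAGE_REDUCTION_PERCENT,
)
five_year_savings = HYPOTHETICAL_INTELLIFLEX_FIVE_YEAR_COST - ias_five_year_cost
print(ias_five_year_cost, five_year_savings)  # prints: 5500000 4500000
```

Because the 45 percent figure is an average across configurations, the result for any single deployment may differ; substituting your own baseline shows the rough scale of the difference.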

New data challenges demand new ways of thinking about data warehousing. The IBM Integrated Analytics System represents a fundamentally rethought approach to ingesting, managing and analyzing huge amounts of diverse data. And, as Quark + Lepton confirms, there’s real value to be found in this new approach. Read the full report to learn how your business can meet data science challenges head-on while maintaining a healthy TCO. Or speak to a data warehouse appliance expert during a no-cost, one-on-one consultation.