Recently, the IBM Netezza Data Warehouse Appliance was rebranded as IBM PureData System for Analytics, along with a new software release. Here is some historical context to the new release.
Over the last decade, our customers have deployed appliances in predominantly two roles: as data warehouses, and to run deep analytic queries across large data sets. Having migrated successfully from older technologies – often after years of disappointment marked by poor query performance and high database administration costs – they identified further applications that they would prefer to run on their appliances. Real examples of their requests include:
- a financial services firm wanting to know the sum total of wire transfers made by a given business customer in the last 30 days
- a telecom provider searching for fraudulent use of a specific cell phone number
- a retailer seeking the recent purchase history of a particular customer (perhaps based on loyalty card data)
- a health clinic polling medical records for changes in medication for a given patient
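Each of the requests above boils down to a simple, highly selective lookup. As a minimal sketch, the first example (the sum of a customer's wire transfers over the last 30 days) might look like the following, using an in-memory SQLite database and an entirely hypothetical table and column layout – the real schemas, and the Netezza SQL dialect, would of course differ:

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE wire_transfers (
    customer_id   INTEGER,
    amount        REAL,
    transfer_date TEXT)""")

today = datetime(2013, 1, 15)
rows = [
    (42, 1000.0, (today - timedelta(days=5)).strftime("%Y-%m-%d")),
    (42, 2500.0, (today - timedelta(days=20)).strftime("%Y-%m-%d")),
    (42,  500.0, (today - timedelta(days=90)).strftime("%Y-%m-%d")),  # outside the 30-day window
    (99,  750.0, (today - timedelta(days=2)).strftime("%Y-%m-%d")),   # a different customer
]
conn.executemany("INSERT INTO wire_transfers VALUES (?, ?, ?)", rows)

# The micro-analytic query: recent activity for one individual customer.
cutoff = (today - timedelta(days=30)).strftime("%Y-%m-%d")
(total,) = conn.execute(
    "SELECT SUM(amount) FROM wire_transfers "
    "WHERE customer_id = ? AND transfer_date >= ?",
    (42, cutoff),
).fetchone()
print(total)  # 3500.0
```

The query touches a single table and returns a single row – a very different shape of work from a multi-table analytic join.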
Such queries – we characterize them as recent individual customer events – are traditionally run against an operational data store (or ODS), a system straddling the online transaction processing tier and the data warehouse. Compared to the heavy lifting typically demanded of our appliances, these micro-analytics generate a very different workload. To borrow an analogy from athletics, a micro-analytic query is a sprint across a single table, whereas a deep analytic query must first complete a marathon of joining data from multiple, often very large, tables before tackling a hill finish, in which the retrieved data are processed by one or more algorithms to create a result.
In response to our customers’ requests, we formed a special project team, drawing individuals from different engineering teams to consider whether our appliances could be optimized for micro-analytic workloads. The traditional approach to achieving high performance for queries in an operational data store is to predefine access paths to the data. But such predefined paths tend to penalize traversing the data by alternate routes, and so degrade the performance of other analytic queries. We set ourselves two constraints: proposed changes would be rejected if they degraded the performance of other queries or created significant administrative overhead.
Returning to the athletics analogy, taking a few seconds to leave the racing line and collect refreshment, while prudent in a marathon, is foolhardy in a sprint. Our appliances delight customers by reducing the time required to process complex queries from days and hours to minutes and seconds – in this context, the fact that an initiation task takes 100 milliseconds to complete is unimportant. However, for micro-analytic applications where a hundred short queries may run concurrently, it becomes important to reduce that task’s duration.
The new software release delivers the results of our special project. Our internal benchmarking against a suite of micro-analytic queries of the type requested by our customers indicates that appliances updated with the new software release will run these workloads between 10 and 20 times faster than they did under the previous software version.
Thanks to the customers who took time to tell us how we could bring further improvements to their data warehouse environments, and congratulations to the project team for extending appliance performance and simplicity to micro-analytic queries.