Taking another look at in-database analytics
Data transformation is a core process in the operation of most business analytics hubs. However, even experienced data professionals can suffer from certain misconceptions about this practice, believing that data must be moved from source platforms to other platforms, often through several intermediary staging and storage systems, to enable transformation for downstream consumption in analytics applications.
But execution of high-performance data transformations on transactional source systems—such as in DB2 for z/OS on System z—is possible using a capability known as in-database analytics. Indeed, traditional practice—known as extract–transform–load (ETL)—might not be the ideal deployment approach for executing data transformation in many real-world scenarios, for a few reasons in particular:
- ETL can increase data latency to longer than a day.
- ETL windows lengthen with increases in quantity of data.
- ETL processes can take months to implement for new data elements.
However, before recognizing in-database analytics on transactional source systems as an alternative to ETL, data professionals must first abandon some common myths:
- Myth 1: Transactional workloads cannot be executed on the same platform as analytics workloads.
- Myth 2: Running analytics on transactional data increases your operational cost.
- Myth 3: Transactional systems are not designed to execute analytics.
Indeed, and contrary to popular belief, accelerating efficient transformations on transactional source systems without affecting transactional workloads or moving the data for staging or final delivery is eminently possible. Here are some options for doing just that:
- Offload ETL functions to an accelerator on the source system, such as DB2 Analytics Accelerator—an extension of DB2 for z/OS with Netezza technology—that provides high-performance storage saver, query acceleration and accelerator-only table (AoT) capabilities (called temporary tables in DB2 Analytics Accelerator).
- Do transformations within SQL using only the transactional source system by means of real-time data transformation for data consumability in SQL via views. This involves separating data into groups of semistatic (categorical) and dynamic data. Generate semistatic content daily, weekly or monthly, as necessary, to represent metadata associated with customers, products or business units that would typically be contained within a join of dimension tables and fact tables. This also includes the keys necessary to join to operational tables, but not the fact data. Dynamic data, or facts, from operational systems should be joined by leveraging the semistatic data and joining it to the operational system data using the keys involved in generating the semistatic table content.
- Views can hide SQL complexity from users while offering the intelligence necessary to retrofit data and simplify access.
- Views can reflect existing data warehousing/data mart schemas while keeping existing transactional workloads running.
- Views can leverage existing database objects (dimensional structures) to transform and standardize data.
- Repetitive transformations from operational data to information data can be standardized by leveraging smart data modeling techniques and objects and by staging prepared data objects before they are joined to fact data.
- Removing complex processes and prestaging data can significantly boost performance while enhancing access to operational data.
What’s more, during execution of multiple transformations on data that is essentially categorical (data that only changes periodically, or semistatic data), accelerators such as DB2 Analytics Accelerator can further enhance performance—as can database performance objects, materialized query tables and AoTs.
To learn more, find out how you can execute high-performance transformation in DB2 Analytics Accelerator.