Using Materialized Query Tables
Boost database response time for complex queries in IBM DB2 for Linux, UNIX, and Windows
For database administrators, how well database applications perform is of the utmost importance. One way to significantly speed up the response time of decision support queries is to use materialized query tables (MQTs).
In this column, I’ll explain what MQTs are and I’ll show you how to create and populate a system-maintained and a user-maintained MQT. I’ll also show you some situations where using MQTs can be beneficial.
What are MQTs?
The definition of an MQT is based upon the results of a query. Think of an MQT as a kind of materialized view, because the data for an MQT comes from one or more base tables. The difference lies in how MQT data is generated and where it is stored. Usually, data for a view is generated by executing the query upon which the view is based each time the view is referenced; the data resides in the underlying base tables that are referenced by the view. MQT data is generated by executing the query upon which the MQT is based at regular intervals or at a specific point in time (which you control); the data resides in the MQT itself. Like any other table, an MQT can have indexes, and the RUNSTATS utility can be used to generate and store statistics about MQTs.
MQTs provide a powerful way to improve response time for complex queries, especially queries that perform one or more of the following operations:
- Aggregate data over one or more dimensions
- Join and aggregate data over a group of tables
- Perform repeated calculations
- Perform resource-intensive scans
- Access a common subset of data—that is, retrieve data from a “hot” horizontal or vertical database partition
- Retrieve data from a table, or part of a table, in a partitioned database environment
Knowledge of MQTs is tightly integrated into the IBM DB2 SQL and XQuery compilers. During the query rewrite phase, the DB2 optimizer matches queries with existing MQTs and determines whether to substitute an MQT for a query that accesses base tables. (If an MQT is used, the Explain facility can provide information about which MQT was selected.) The larger the base tables, the greater the potential response time improvements, because an MQT grows more slowly than its underlying base tables.
Creating an MQT
MQTs are created by executing a special form of the CREATE TABLE SQLstatement:
DATA INITIALLY DEFERRED
REFRESH [DEFERRED | IMMEDIATE]
[ENABLE | DISABLE] QUERY OPTIMIZATION
MAINTAINED BY [SYSTEM | USER]
- TableName identifies the name to be assigned to the MQT you’re creating
- SELECTStatement identifies the SQL query that is to be used to populate the MQT you’re creating
Two different types of MQTs can be created: system-maintained MQTs and user-maintained MQTs. Insert, update, and delete operations cannot be performed against system-maintained MQTs. However, a REFRESH IMMEDIATE system-maintained MQT is updated automatically, as changes are made to all underlying tables upon which the MQT is based. The REFRESH keyword lets you control how the data in the MQT is to be maintained: REFRESH IMMEDIATE indicates that changes made to underlying tables are cascaded to the MQT as they happen, and REFRESH DEFERRED means that the data in the MQT will be refreshed only when the REFRESH TABLE statement is executed.
User-maintained MQTs allow insert, update, or delete operations to be executed against them and can be populated with import and load operations. However, they cannot be populated by executing the REFRESH TABLE statement, nor can they be created with the REFRESH IMMEDIATE option specified. Essentially, a user-defined MQT is a summary table that the DBA is responsible for populating, but one that the DB2 optimizer can utilize to improve query performance. (If you don’t want the DB2 optimizer to utilize an MQT, simply specify the DISABLE QUERY OPTIMIZATION option with the CREATE TABLE statement used to construct the MQT.)
When to create MQTs
How do you decide if having an MQT would be beneficial, or determine which MQTs should exist? IBM DB2 Design Advisor can help. Using current database statistics, the DB2 optimizer, snapshot monitor information, and/or a specific query or set of SQL statements (known as a workload) that you provide, the Design Advisor recommends indexes, MQTs, or multidimensional clustering (MDC) tables that would improve performance. The indexes, MQTs, or MDC tables; the statistics derived for them; and the Data Definition Language (DDL) statements required to create them can be written to a user-created table named ADVISE_INDEX.
The Design Advisor is invoked by executing the db2advis command. A GUI version of the Design Advisor, known as the Design Advisor Wizard, is also available; activate it by selecting the appropriate action from the Databases menu in the DB2 Control Center.
MQTs and subdomains
While it is possible to define MQTs for each and every query predicate used (which, by the way, is not a good idea), a decision support and data warehouse environment often contains a very small set of common query subpredicates and qualifiers that are executed over and over. A powerful, yet regularly overlooked use of MQTs is to optimize access to frequently used subdomains of data that resolve such subpredicates.
In this scenario, MQTs do not contain summarized data, but instead are used to help the DB2 optimizer quickly identify and isolate qualifying rows; the MQTs act as a quick prequalification of rows that are involved in several more complex queries. For example, a reporting system might have a dozen or more reports that use the subdomain of data representing YESTERDAY. One report might look at yesterday’s overall sales, another looks at yesterday’s sales by region, while a third looks at yesterday’s sales by product. Each of these three queries likely shares the common subpredicate WHERE DATE = YESTERDAY, which immediately limits the record set.
If you created three separate MQTs to answer each of these queries, they would require a significant amount of disk space to maintain. If, however, you constructed an MQT that simply delimits the domain, i.e., YESTERDAY; included attributes that are not likely to change, such as “Region” and “Product_ID”; and then created an appropriate index over this single MQT, all queries about YESTERDAY could be satisfied by this single MQT. Back JOINs and row fetches would most likely be required, but these operations would use a far smaller subset of the data.
The idea here is simple: make it as fast and easy as possible for DB2 to cut the data involved in multiple queries down to size without having to read several indexes, while also avoiding the creation and subsequent management of many similar MQTs. Find the common patterns in your queries, think about them in terms of the domains they express, and determine which attributes are used most often and which attributes will most quickly reduce the size of the data. Then, create an appropriate MQT for the queries being executed, create appropriate indexes on the base table(s) and the MQT, and keep the statistics current. Finally, let the DB2 optimizer choose whether to use the base table or the MQT; do not explicitly reference the MQT in the SQL.
REFRESH IMMEDIATE versus REFRESH DEFERRED
REFRESH IMMEDIATE MQTs affect query performance in the same manner as indexes. Specifically, they:
- Speed up the performance of relevant SELECT statements while returning current data
- Are automatically chosen by the optimizer whenever it makes sense
- Can degrade the performance of insert, update, and delete operations
- Cannot be updated directly
- May occupy considerable disk space
- May have exclusive locks held during updates of their base tables
On the other hand, REFRESH DEFERRED MQTs have no effect on the performance of insert, update, and delete operations. As a rule of thumb, use REFRESH IMMEDIATE MQTs in moderation to optimize frequently run queries in which current data is important.Special thanks to IBM Consulting Learning Specialist–Data Management Melanie Stopfer, IBM Senior DB2 Technical Evangelist Dwaine Snow, and IBM Senior Competitive Specialist Reed Meseck for providing information used to develop this article.