DB2 Indexes and Query Performance: Part 2
No new indexes? No problem.
In the first installment of this two-part series, I focused on the pros and cons of reducing query times by adding indexes to a DB2 database. We discovered that there is a point where aggregate index costs—more-expensive INSERT and DELETE operations, for example—outweigh a new index’s query speed-up benefit. But if you don’t want to increase the number of indexes defined on the table, are you at a performance tuning dead end? Of course not. You have plenty of options for accelerating queries in a DB2 environment while holding the line on indexes.
Better indexes versus more indexes
Start with an index effectiveness upgrade. Can you replace useless indexes with indexes that enhance query performance? This could be a challenge with earlier versions of DB2 if a lot of dynamic SQL statements were executed, because the SYSCAT.PACKAGEDEP catalog view (DB2 for Linux, UNIX, and Windows [LUW]) and the SYSIBM.SYSPACKDEP catalog table (DB2 for z/OS) show index dependencies only for static SQL statements.
However, DB2 9.7 for LUW and DB2 9 for z/OS introduced a handy catalog column called LASTUSED. In the SYSCAT.INDEXES catalog view (LUW) and the SYSIBM.SYSINDEXSPACESTATS catalog table (z/OS), LASTUSED shows the date on which an index was last used either to speed up a DML statement (static or dynamic) or to enforce a referential integrity constraint. If the LASTUSED column contains its default value ('0001-01-01' on LUW or NULL on the mainframe), the associated index is a good candidate for being dropped. Of course you’ll want to do some additional checking, just to make sure that the index truly isn’t needed.
Another way to boost index effectiveness is to add columns to an existing index, so as to increase the number of columns that can be matched with query predicates or to get index-only access for more queries. But be careful with this tactic; you do not want to replicate a table in an index by including all of the table’s columns in the index key.
Consider adding a column to an existing index if the additional column is relatively short and if its inclusion in the index key would provide a major performance boost for one or more high-cost queries. With DB2 for LUW, you’d drop the existing index and re-create it with the additional column; on a DB2 for z/OS system, you can add a column to an index via an ALTER INDEX ADD COLUMN statement. DB2 10 for z/OS introduces a nifty new way to add a column to an existing index: the ADD INCLUDE option of ALTER INDEX.
No-new-index tuning options
Rewriting long-running queries is another way to boost performance without adding indexes. One technique often used is rewriting to eliminate predicates linked by OR, which are not indexable (aside from the possibility of DB2 using a multi-index access mechanism, called index ORing, that involves the union of row ID lists from two different indexes).
Using UNION to replace an OR can be especially effective if the query involves a join and the two ORed predicates reference columns in two different tables. In such a case, the OR prevents the two predicates from being applied until after the respective tables have been joined. Note also that in the IN-list example, both of the predicates in the rewritten query are stage 1 and indexable, leading to better index key column matching and potentially much-reduced query elapsed time.
Also in the query rewrite vein, modifying a SELECT to get rid of a range-type predicate can increase performance by improving index key column matching. As we saw in the last issue, the matching of predicates to index key columns stops after a range predicate—such as >, <, or BETWEEN—is encountered. In some cases, you can use an IN-list to replace a range predicate. For example, if COL1 is a character column of length one, a predicate like COL1 BETWEEN '2' AND '4' could be rewritten as COL1 IN ('2', '3', '4').
Here’s one more query rewrite idea: if possible, remove column expressions in predicates to make the predicates indexable. Let’s say that :V1 and :V2 are host variables, with :V2 containing a date value. Now, consider this predicate:
(START_DATE - :V1 DAYS) < :V2
The arithmetic expression involving the START_DATE column makes the predicate non-indexable. If you rewrite the predicate to get the arithmetic operation on the right side of the operator (“operator” being =, >, <, etc.), it becomes indexable and you could see a big reduction in query run time. The rewritten example predicate would look like this:
START_DATE < (DATE(:V2) + :V1 DAYS)
Eliminate OR with UNION
SELECT * FROM TABLE_A A, TABLE_B B WHERE A.COL3 = B.COL3 AND A.COL1 = 2 OR B.COL2 = 'CAT'
SELECT * FROM TABLE_A A, TABLE_B B WHERE A.COL3 = B.COL3 AND B.COL2 = 'CAT'
Eliminate OR with an IN-list
WHERE (COL1 = 2 AND COL2 = 'CAT') OR (COL1 = 2 AND COL2 = 'DOG')
WHERE COL1 = 2 AND COL2 IN ('CAT', 'DOG')
Figure 1: Rewriting to eliminate predicates linked by OR
No query mods needed
There are also no-new-index-needed tuning actions that don’t require query modification. Do you have a high-cost query with a predicate that references a column that has many duplicate values and for which the distribution of those values is very non-uniform (for example, a column of a 1-million-row table that has 1,000 distinct values, one of which appears in 900,000 of the 1 million rows)?
Let DB2 know about that uneven distribution of duplicate values by gathering value-frequency distribution and/or histogram statistics for the column via the RUNSTATS utility. Absent these statistics, DB2 assumes a uniform spread of duplicate column values across a table’s rows, so in the aforementioned example it would be assumed that each of the 1,000 distinct column values appears in 1,000 of the table’s 1 million rows. For DB2 for LUW, column value distribution statistics are collected via the WITH DISTRIBUTION clause of RUNSTATS. For DB2 for z/OS, value-frequency stats are generated using the FREQVAL option of RUNSTATS, and histogram statistics— available on the mainframe platform starting with DB2 9—are obtained via the HISTOGRAM option (histogram statistics have been available on DB2 for LUW systems for quite some time).
Once the catalog has been populated with value-frequency and/or histogram statistics, rebind the program (or just submit the query again, if it’s dynamic SQL), and you may see a big-time performance improvement. By the way, you can get recommendations on enriching catalog statistics to potentially improve the performance of a query by running the query through the Statistics Advisor component of IBM’s Data Studio tool.
The right tool for the job
When the task at hand is query performance tuning, of course you should keep the new-index option in mind. Just remember that it’s a high-value card that should be played at the right time. And don’t forget the other techniques available to you. Indexes aren’t the only tool in the query-tuning shed, pardner.