Four Keys to Reducing High Transaction Costs: Part 2
How a top US retailer supported 250,000 SQL queries/second with DB2 for Linux, UNIX and Windows on Black Friday 2011
The first part of this article highlights some important tenets to follow for reducing the potentially high costs of online transactions. For example, retailers must execute within their means; focus on the costs of execution inside IBM® DB2® for Linux, UNIX and Windows; work to find problems; and then avoid hiding problems behind huge bufferpools.
Part 1 also examined the first of four areas that can contribute to those costs—hot spots. This installment explores the remaining three areas: pain points, troublemakers, and double trouble. By using this methodology to identify problems, a prominent US retailer achieved exceptional—and cost-effective—performance on Black Friday, traditionally one of the busiest retail days of the year.
To find pain points, look for the highest-cost SQL statements, in the consolidated aggregate, that drive read I/O to hot spot tables. You can find those SQL statements by analyzing the SQL workload that is contributing I/O to problem tables. The ideal SQL analysis considers total SQL costs of execution for statement patterns independent of literal values, and then puts those costs into relative perspective by comparing percentages. Unless you’re using a commercially available tool that isolates and performs SQL statement consolidation and cost aggregation, this task will be tedious and time-consuming.
Without tools, try running a SQL snapshot query against SYSIBMADM.SNAPDYN_SQL that divides POOL_DATA_L_READS by NUM_EXECUTIONS and provides the average logical reads per execution. Filter the statement text with a LIKE predicate that will look for the problem table names, and then order the results in descending sequence according to the average logical reads. This isn’t perfect, but now you should be seeing the most costly and harmful SQL statements affecting your database’s performance. With this SQL information, you can EXPLAIN it and determine new or updated indexes to mitigate the I/O costs.
Troublemakers are the tables that have endured the highest write I/O. To identify the defined indexes on each table and determine whether any have low cardinalities, begin by running a query against SYSIBMADM.SNAPTAB that selects TABNAME and ROWS_WRITTEN for each table. Order the report in descending sequence by ROWS_WRITTEN, limiting the output to the first 10 or 20 rows. These are the most highly written-to tables. And as a guideline, the full key cardinality of any index on these top tables should be at least 10 percent of each table’s cardinality. If an index has a full key cardinality that is less than 10 percent of the table’s cardinality, then either drop the index or add columns to the end of the index, inflating the cardinality. Indexes with low full key cardinalities have very long record identifier (RID) lists that must be maintained—which is not only costly in terms of CPU cycles, but puts your application at greater risk of locking problems.
“Double trouble” refers to tables that need to be reorganized because they suffer from excessive overflow accesses. When a row is updated and the length of a VARCHAR column is increased, it may no longer fit on its original page. In this case, DB2 writes the row to another nearby page (if possible) and leaves a pointer on the original page.
What happens when DB2 needs to read the updated row? An index leaf page directs DB2 to the original data page, where DB2 will find the pointer to the new page and also must access an additional page. When read overflows occur, the database performs double the logical I/O and possibly double the physical I/O (if the pages are not in the bufferpool).
If you’re trying to meet and exceed revenue goals in your e-commerce store during the busiest shopping days of the year, you can’t afford the CPU and I/O cost of double trouble. When (OVERFLOW_ACCESSES * 100 / ROWS_READ) for any table exceeds 3 percent, and if the table represents more than 2 percent of all database rows being read, then you should reorganize the table to eliminate the double I/O operations. As for the REORGCHK utility, you can stop running that now because this metric, based on real-time performance measurements, will guide you to tables requiring reorganization without the risks of updating catalog statistics or utility overhead.
By following the prescribed methodology that focuses on lowering the highest execution costs, this top US retailer cut average lock times in half, reduced server CPU utilization from 70 percent to 45 percent, reduced logical reads by 11 percent, dropped average transaction response times from 300 ms to 200 ms (33 percent faster!), and achieved recognition as one of the three fastest e-commerce websites in the world on Black Friday 2011.
Have you been successful identifying problem spots that contribute to costs? How did you do it? Please share the experience in the comments.