Maintenance Makes Money
Five ways to cut database costs—just by paying attention
Few things do well without proper care. Houses, cars, pets, friends, and family all need our attention to stay happy and healthy, and database management systems (DBMSs) are no different. Without care and attention, a DBMS will break down, slow down, and eventually have to be shut down because it costs too much to operate. Keeping maintenance levels current to take advantage of changes in technology is one thing; applying the correct technology is another. This article points out five ways to reduce the cost of running DB2 by providing proper care to the system, using the right tool for the job at hand, and paying attention to detail.
1. Use the right SQL for the job—and make sure you know what it is
There are many ways to code a query that will yield the desired results; the trick is finding the method that plays to the strengths of your DBMS. That may seem obvious, but it’s not as easy as it sounds because the tools and technology available to you change over time.
A few years ago, one of my clients inherited a customer relationship management (CRM) application that had been ported to its DB2 for z/OS Version 8 subsystem. One program was bringing the company’s new IBM System z9 to its knees, consuming 25 percent of all CPU cycles on a daily basis. The program had 10 CREATED Global Temporary Tables, and most were embedded in recursive SQL statements. Six of the temporary tables could contain between 10,000 and 100,000 rows, depending on the depth of the sales chain, and were accessed repetitively.
But there is a second type of Global Temporary Table on DB2 for z/OS: DECLARED. And although CREATED temporary tables could not use any index technique until DB2 9, DECLARED tables could. So, I recommended converting six CREATED temporary tables to DECLARED temporary tables with clustering indexes. The remaining four CREATED temporary tables were appropriate for the job since they contained a small number of rows and were accessed only once. This change reduced the time required to complete each transaction from 1.78 seconds to 0.07 seconds, which doesn’t look that impressive until you know that this program is executed a minimum of 6,000 times per day. Therefore, the one small change saved the client 2.8 CPU hours every business day.
The results were so dramatic that the senior DBA on site even joked about returning the new z9! All of that occurred three years ago. Today, the client has DB2 9, which automatically generates sparse indexes on repetitively accessed work files, including temporary tables. But this sort of situation occurs all the time. SQL is a moving target: DB2 SQL had very humble beginnings as a query language when it started out in 1983—at that time it supported only inner joins, subqueries, the GROUP BY clause, the HAVING clause, the ORDER BY clause, and approximately 21 built-in functions. Now, 27 years later, the SQL available with DB2 9 for z/OS has exploded.
Look how you’ve grown
When DB2 SQL was first introduced, you could practically count the functions on one hand. These days, there’s a lot more to choose from:
Some SQL features are synonymous, but most provide unique functionality. And it is important to fully exploit new features as they become available, because many can increase performance and cut the cost of running your applications. Searching static and dynamic SQL for keywords is an effective way to gauge how advanced your production SQL portfolio is. Figures 1 and 2 are SQL report cards prepared for clients by searching based on what was running in their current production environments. Client XYZ used very little SQL technology within its 3,000 financial reporting applications. This translates to almost all relational business rules and filtering performed in the application program, instead of taking advantage of high-performance SQL functions that could carry that load.
Client ABC’s report card was much better but still lacked implementation of SELECT INTO with ORDER BY and multirow fetch. These features typically reduce CPU demand by 30 percent and 50 percent, respectively. In thousands of occurrences and millions of executions, Client ABC was using obsolete single-row fetches instead of the current technology. Current SQL coding skills need to be cultivated in today’s programming pool to transfer the technology into the production environment.
Figure 1: Client XYZ’s SQL report card as of DB2 7 for z/OS. Figure 2: Client ABC’s SQL report card as of DB2 8 for z/OS.
By moving to DB2 9, you get access to many new features, including:
- SELECT FROM UPDATE/DELETE/MERGE
- EXCEPT, INTERSECT
- RANK, DENSE_RANK, ROW_NUM
- ORDER BY / FETCH FIRST in Subselect
Not taking advantage of these features, especially in a data warehousing environment, means that you need external tools to provide the functionality desired. This requirement drives up cost in terms of licensing fees and can have a significant impact on performance, since many of these tools require data to be transferred out of DB2 for external processing. (The searches used to gauge your SQL portfolio can also be used to determine whether a soon-to-be-purchased application package uses advanced features.)
2. Create optimal index design
Base table indexes are the first line of defense for performance-related issues. The most popular or critical transactions and services should have indexes to support all of the joining, filtering, and sequencing tasks required. Their omission can result in seriously delayed filtering or joining, all of which requires extra resources.
Consider the following scenario: A company had a Web page that its customers could access, and the company used the common tactic of executing a series of services when the customer logged in and storing the results so they could readily be displayed when the customer clicked on the associated tabs. This popular “P-Search” architecture let the customer quickly access commonly used functions, but such a strategy can be costly if the services being executed aren’t indexed properly.
In this case, the P-Search service was consuming 9.9 CPU seconds of “in DB2” time on IBM System z10; during peak season, the customers waited between 10 and 40 seconds to see the Web page. The P-Search service contained a dynamic query that was responsible for the CPU consumption: a simple, three-table join on TABLEA, TABLEB, and TABLEC. All tables were partitioned and clustered by a single column SEQEN_NR; however, SEQEN_NR was not added to any other index. This forced the optimizer to choose between filtering and joining. Figure 3 shows what happened:
Step 1. TABLEC is the first table accessed. DB2 Optimizer chooses to apply the local filter LAST_NM, but because SEQEN_NR is not included in the chosen index, DB2 Optimizer incurs random I/O to retrieve all SEQEN_NRs needed for the join to table TABLEB. Because the join is not performed using an index structure including LAST_NM and SEQEN_NR, it’s delayed until data pages are retrieved. (See No. 1 in Figure 3.)
Step 2. The second table accessed, TABLEB, suffers from the same index design and applies the local filter ROLE_CD, again incurring random I/O before applying the join predicate. Because the join was not performed using an index structure, it is also delayed until the data pages are retrieved. (See No. 2 in Figure 3.)
Step 3. TABLEA, the last table accessed, also suffers but this time uses the primary/clustering index for the join, uses sequential I/O for the data pages, applies the local filter GUAR_DT, and then sorts for the ORDER BY. (See No. 3 in Figure 3.)
All three tables accessed had filtering or joining delayed until data pages were accessed. This is because DB2 Optimizer had to choose between joining or filtering, as no composite indexes satisfied both.
Solutions for this frequent, business-critical service consisted of the following:
- Altering the LAS T_NM.FIRS T_NM index to add SEQEN _NR and SSN _NR to achieve index-only access to completely eliminate all random I/O to the data pages
- Altering the ROLE _CD index to add SEQEN _NR to combine the filtering and the joining
- Adding a fifth index on TABLEA of GUAR _DT.SEQEN _NR to combine the filtering and joining and eliminate the sort for the ORDER BY
The new index solutions brought the CPU time down to 0.02 seconds by moving all filtering, joining, and sequencing to composite index structures.
Figure 3: The first table accessed, TABLEC, applied the LAST_NM filter on the non-clustering index, delaying SEQEN_NR pickup. The second table accessed, TABLEB, applied the ROLE_CD filter, delaying the join. The third table accessed, TABLEA, applied the join filter on the clustering index, delaying the GUAR_DT filter.
3. Monitor RID pool failures
An expensive access path is one that has many steps that must be completed before a single result row is returned. These access paths usually involve row identifier (RID) sorting to eliminate random I/O and are called List Prefetch, Multiple Index Access, and Hybrid Join – Type N. All of these access paths use the RID pool to pre-sequence data page numbers prior to access. This RID pool resource has limits and, when those limits are exceeded, a table scan occurs.
The more RIDs being poured into the RID pool by any number of programs, the greater the chance of a failure that forces a table scan. One symptom of RID pool failure is unexpected spikes in sequential prefetch counts. Tracking sequential prefetch counts is one way to spot failures. A better way is through an online monitor. Figure 4 shows the data from an online monitor of one program executing in a one-hour period using Hybrid Join –Type N that observed 61 RID pool failures.
One solution to the problem is to increase the size of the RID pool, but that may only temporarily suppress failures; worse, it may have no impact at all. A better solution is to identify queries that rely heavily on the RID pool by using DB2 traces or an online monitor. Each query needs to justify its RID pool use by fetching through to the end of the entire result set. Any queries not passing this test need to turn off heroic access paths that use the RID pool by adding either the OPTIMIZE FOR n ROWS or the FETCH FIRST n ROWS ONLY clause to the end of the query. DB2 10 will provide relief from these failures by utilizing external work files for the overflow, thus masking the need to analyze. However, it is best to find and resolve such issues before migrating to DB2 10, or you will have work file madness.
Figure 4: A program with Hybrid Join – Type N access path gets 61 RID pool failures between 12 p.m. and 1 p.m.
4. Neutralize frozen or poorly performing SQL
Frozen SQL is SQL embedded in a purchased application that cannot be touched. The queries inside frozen SQL sometimes do not follow any performance guidelines and you are stuck with them. You need these queries to execute using as few resources as possible. For example, WHERE DATE(col_TS) BETWEEN :date1 AND :date2 is a Stage 2/Residual condition. This predicate should be rewritten to WHERE col_TS BETWEEN TIMESTAMP(:date1, ‘00:00:00’) AND TIMESTAMP(:date2, ’59:59:99’), which is indexable Stage 1/Sargable. But you are forbidden to make that change.
Getting the work done ahead of time and storing the result on disc is one option. For DB2 8 sites, a solution is to create a materialized query table (MQT) that SELECTS DATE(col_TS) as NEW_COL FROM TABLE and to create an index on NEW_COL column. Now all dynamic queries will be redirected to use the index. Another disc-based solution for DB2 9 sites is to use the feature called Index on Expression—CREATE INDEX on DATE(col_TS). This will yield the same result with only one structure and can be used by both static and dynamic queries.
Both techniques are for queries that are ultra expensive to run, ultra complex to compute, or just plain ultra popular and may not be frozen. If your site is missing either type of structure, you are missing out on CPU demand reductions.
Another solution not yet available is memory based and can accelerate longer-running queries by 5 to 10 times without requiring changes to the application, according to IBM lab and customer beta experiences. Here’s how: Using IBM Data Studio (available at no charge), select a fact table and dimension tables to be preloaded in memory. This will copy and highly compress portions of your data warehouse into an all-memory, network-attached appliance, forming an accelerated query table (AQT). The appliance is called the IBM Smart Analytics Optimizer and is currently in beta. The DB2 9 Query Router used in redirecting queries from base tables/indexes to MQTs/indexes will be able to transparently redirect INNER/LEFT JOIN queries to use AQTs. This memory-based appliance will gain functionality over time to redirect a larger portion of queries for even more mind-bending query speed-up. Scalability requires a smart performance layer; MQTs, Index on Expression, and AQTs make up this performance layer.
5. Create the optimal schema design for your workload pattern
Scalability also requires a smart schema design that efficiently accommodates processing data in a workload pattern. For example, online transaction processing (OLTP) and intranet applications weather best with a third normal-form table design to mitigate update/delete anomalies and when heavily indexed for probing. This normalized design is well suited for static and focused business questions. Online analytical processing (OLAP) and business intelligence (BI) weather best with star schema and denormalization. The fact table and dimension table design are well suited for ad hoc and wide business questions that span vast amounts of data. It makes sense that these environments would have disparate base table designs, index designs, MQTs, Indexes on Expressions, and in the future, AQTs, to take advantage of all DB2 has to offer for the workload pattern (see Figure 5).
Figure 5: Web and OLTP applications need smart index strategies, including Index on Expression, to handle large numbers of short-answer queries for probing and scrolling. OLAP and BI applications benefit from MQTs and AQTs to handle broad business questions that require vast amounts of data.
Show your database you care
Organizations need to be intimate with their DBMS of choice—attending to its proper care and paying attention to detail. To scale well, packaged DB2 applications also need to adhere to performance rules. By following simple rules—such as using the right SQL, indexes, access paths, MQTs, table design, and soon-to-come AQTs for the job—you can reduce the cost of data management. These rules require some thought and time beyond today’s rapid application development cycle. But without this extra effort, your DBMS may start looking worse for wear.