Does REORG Matter?
How much disorganization can DB2 for z/OS databases stand?
Disorganization permeates databases, and the consequences can be unpleasant and expensive. Disorganized data takes longer to retrieve than organized data, and disorganized indexes slow searches to a crawl. To any enterprise running DB2 for z/OS, disorganization has always been a dangerous, resource-sapping enemy to be fiercely battled with good schema design and regular use of the REORG utility.
However, the arrival of solid-state drives (SSDs) kicked off a debate about the true cost of disorganization. Some argue that the performance and time costs of retrieving disorganized data are artifacts of the way that hard disk drive (HDD) technology works. With SSDs as a primary storage medium, using REORG to organize or cluster data might seem a waste of scarce system resources. Furthermore, REORG makes it harder to exploit storage tiering solutions such as IBM System Storage Easy Tier, which are designed to make optimal use of a minimal amount of SSD storage.
One thing that everyone agrees on: REORG is expensive, and nobody wants to use it more than absolutely necessary. In fact, IBM has made a number of changes to both DB2 10 and IBM System Storage DS8000 software to help you avoid using REORG. But can we just cast it away? Probably not.
Can fast storage save the day?
Clustering physically groups related rows together on the storage medium so that as much of it as possible can be read in a single I/O pass. Inserting new rows erodes the clustering organization (and degrades performance) until it can be reestablished with REORG.
Clustering can improve performance on an HDD, where data bits are stored on spinning platters and retrieved with a moving disk head. But SSDs have no moving parts, and their data retrieval latencies can be an order of magnitude smaller than those of HDDs. Although a DBMS may need more synchronous I/O passes to retrieve unclustered data scattered across an SSD, the perception is that the actual I/Os are so fast that it doesn’t matter.
In practice, there are a couple of holes in this SSD theory. First, even with SSDs minimizing I/O response time, additional synchronous I/Os add CPU costs. Second, the threshold at which I/Os are so fast that the elapsed time “doesn’t matter” can vary dramatically, even between database operations.
Beyond these issues, REORG helps boost performance by keeping indexes organized—disorganized indexes take longer to scan than organized ones, chewing up time and CPU resources. Recently, IBM made some huge strides on both software and hardware technology fronts that further improve performance on disorganized indexes.
DB2 9 uses synchronous I/Os to scan a disorganized index, with I/O throughput on FICON (the IBM Fibre Channel protocol used by System z) and HDDs typically ranging between 1 and 6 MB/sec. By contrast, DB2 10 asynchronously uses list prefetch and makes the prefetch service request blocks (SRBs) eligible for the IBM System z Integrated Information Processor (zIIP), which saves CPU cost. Because they are asynchronous, the I/Os can overlap the CPU time, making the scan faster. IBM testing has shown that DB2 10 can scan a disorganized index as much as 5.6 times faster than DB2 9.
On the hardware side, the R6.2 release of DS8000 storage also helps DB2 handle disorganized indexes. To date, list prefetch over FICON has been notoriously slow. But with R6.2, list prefetch I/Os can use High Performance FICON for System z (zHPF), which cuts channel time by more than 50 percent. List prefetch also gets a boost from a new R6.2 feature called Turbo List Prefetch (TLP), which builds on the DS8000 adaptive caching algorithms to reduce cache hits. Together, zHPF and TLP can more than triple the throughput of list prefetch I/O.
With DB2 10 using TLP, the disorganized index scans can be done at throughputs higher than 40 MB/sec. This throughput is often sufficient to avoid any I/O suspension, which means that the application isn’t waiting for the I/O to complete—the storage unit is no longer the performance bottleneck. The I/O throughput for an organized index scan with the same hardware will be nearly 300 MB/sec with HDD and 400 MB/sec with SSD, but if there is no I/O suspension to begin with, then the faster I/O won’t affect the elapsed time.
Should these fast throughputs change how we think about REORG? If you’re using it on a table space, not necessarily, because that process automatically reorganizes the indexes. But if you’ve been using REORG INDEX to deal with performance problems caused by disorganized indexes, then DB2 10, TLP, and SSD may make your life simpler. That combination can reduce the I/O time to read leaf pages below the CPU time to process those pages, which means that REORG INDEX won’t accelerate the index scan. However, you may still want to use REORG INDEX to clean up pseudo-deleted index entries (see sidebar, “Other problems of disorganized data”) and to reclaim free space.
Other problems of disorganized data
Indirect references and pseudo-deleted RIDs are other types of troublesome data disorganization. Reading a row—even in a table scan—with an indirect reference requires an extra GETPAGE and can add an extra synchronous I/O. List prefetch cannot resolve the issue of indirect references because it uses the list of RIDs from the index, which contains only the original RID locations. It does not know that the row has been relocated until the data row is accessed. SSD mitigates the effect of indirect references, but REORG eliminates them.
Pseudo-deleted RIDs appear when DB2 deletes a row. The index entry is marked as pseudo-deleted, but not physically deleted. Updates also cause pseudo-deleted RIDs because an index update is actually a delete followed by insert.
For the most part, pseudo-deleted RIDs stay in the index until a REORG removes them. In the meantime, they take up storage space and increase the CPU cost of index scans.
DB2 10 helps to address pseudo-deleted RIDs but not indirect references. You will need a different strategy to avoid using REORG to solve these problems.
Now, let’s look at how REORG affects I/O to the data itself when an index is used to determine which data rows are read. The cluster ratio of the index indicates the physical order of the rows with respect to the collating sequence of the index keys. If the cluster ratio is high, DB2 will use dynamic prefetch; if the ratio is low, DB2 will use list prefetch. In some cases, REORG will increase the cluster ratio of an index, improving performance by enabling the optimizer to use list prefetch. But unless the index in question is the cluster index (of which there is only one per table), there is no guarantee that REORG will increase the cluster ratio.
Even if the cluster ratio of an index is high, the data it refers to may still be spread across the table. This will make the GETPAGEs skip sequential and diminish or eliminate the effectiveness of dynamic prefetch. If the qualified pages are sparse, then the query will do synchronous I/O, which will be slower than if the query had used list prefetch. If the GETPAGEs are mostly within 128 KB of each other, then the query will do sequential I/O, but it will collect a lot of (useless) intermediate pages. (Note: Reading extra pages can accelerate I/O in some cases, but when using TLP it’s always better to read only the pages needed for the GETPAGE.)
If you’re using SSD, the distance between the GETPAGEs may not matter. When considering the value of REORG, however, that’s something of a moot point, because REORG doesn’t always affect the distance of the pages anyway. On the other hand, if REORG can reduce the number of GETPAGEs done by the query, then REORG has value no matter the device type.
In the past, using list prefetch was sometimes a cause for concern, as it forced DB2 to build sorted record ID (RID) lists and store them in the RID pool. Once the RID pool was exhausted, DB2 would fall back to an incredibly time-consuming table scan (many a query suffered “death by table scan”).
DB2 10 helps solve this problem by overflowing the RID pool into a work file. Work files are stored in a DB2 buffer pool, which may overflow to a direct access storage device (DASD). A RID pool may also overflow to DASD—a process known as paging—but buffer pool I/O is much faster than paging, and it is generally much faster than the concurrent list prefetch I/O to the data because each 32 K RID block contains more than 6000 RIDs.
The optimizer is unlikely to choose list prefetch if the cluster ratio is very high. If you want list prefetch to be used, wait until the data is disorganized before you run RUNSTATS.
The bottom line
In the end, there are ways to reduce the number of times that you must use REORG, but you probably won’t be able to avoid it completely. When you do need to use it, don’t forget to use DSNACCOX to determine eligibility for REORG scheduling. The default parameters are a guide that you can alter based on your implementation of SSD and/or TLP.
We can’t completely eliminate disorganization in our databases, but at least there are tools that make disorganization less disruptive and easier to manage.