Blogs

Utilities for Developers

Harness the magic of REORG and LOAD

If you’re a programmer and you want to manipulate data, you write a program. But in some cases, that may not be the best option. DB2 has several utilities that can save you time—and using them instead of program code can help you improve system availability and performance. Running utilities is usually the responsibility of a DBA, but let’s look at the many tasks that utilities can accomplish for you and the many application problems that they can solve.

Why utilities instead of maintenance programs?

Maintenance programs must be coded, tested, and then run. They acquire locks and hopefully do COMMITs to release those locks. However, if set up correctly, utilities do not need to be coded, and testing is minimal; they basically just need to be run. They acquire very few or no locks and are finely tuned to use as little CPU overhead as possible. Plus, utilities often can be run simultaneously with programs—meaning no outages for your users.

Purging data using REORG

Have you ever written a program to purge (DELETE) data from a table? Such a program will often run for hours, causing locking issues (time-outs) for other users. And at the end of the job, even though the table has fewer rows, it still takes up the same disk allocation and has just as many pages—only now the pages are littered with random chunks of free space. The program has eliminated rows but a REORG must be done to unload and reload the rows and consolidate them onto fewer pages, cleaning up the mess the program left behind.

LOAD instead of UPDATE?

Although LOAD and UPDATE are not identical, they do have a symbiotic relationship. For example, when a huge percentage of the rows in a large table must be updated, a maintenance program is often the solution. But we know that rows expand, contract, and move to different pages any time there are extensive column data changes. Further, if the clustering columns have been updated, the rows may no longer be in cluster order. Nine times out of ten, a REORG is scheduled after a huge maintenance program is run. Remember, REORG is basically an unload and a reload of data.

We can use the LOAD utility to accomplish our update goals more efficiently. We read (SELECT) all rows in cluster order into a program using lock-avoidance techniques like ISOLATION(CS) and CURRENTDATA(NO). The program then changes the column data (usually by reading a sorted transaction file in the same order as the table data) and writes both the changed and unchanged rows to a sequential file. The program is essentially unloading the rows and changing the ones that need to be changed in the process. Since the program is only reading the DB2 table, locking issues will be minimal or non-existent. After the program finishes, the sequential file can be sorted if necessary. (This sort might be needed if the column data on which the table is clustered is changed.) The output sequential file can then be used as input to the Although LOAD utility.

Now think about this: we unloaded the rows, changed them, sorted them (possibly), and then reloaded them. In effect, we did our own REORG. Although this is not always a feasible solution, it has many advantages when the conditions are right.

One of the other wonderful benefits of the LOAD utility is that it can act like INSERT but be smarter and more efficient. Both INSERT and LOAD (run with SHRLEVEL CHANGE to allow other users simultaneous access to the table) honor the desire to keep the table rows in a specified cluster order. However, the LOAD utility uses less CPU overhead than INSERT and does intelligent COMMITs with the least-possible contention with other table users.

Other benefits of utilities

Utilities provide creative solutions to many difficult problems. For example, tables can now be set up with an option called APPEND that causes INSERT to place rows at the end of the table, rather than attempt to keep them in cluster order. This is perfect for tables that have a large number of rows added followed by a “clean up the mess” REORG.

But we can also use this strategy to improve slow-running, contentious INSERT programs. You can enhance performance by altering the table to use this APPEND option. Putting the rows at the end of the table is far easier and less costly than attempting to honor your cluster order. CPU usage, GETPAGEs, and read I/Os will be reduced. At a later time, REORG can be used to put the rows back in cluster order.

REORG also helps us implement another performance feature of DB2. Variable character data is historically padded to its maximum length in an index. Today, indexes containing variable character column(s) can be altered so that those columns are no longer padded. Rather, the length of the columns in the index will match the column length in the table. To implement and reap the benefits of the option, the index must be rebuilt. Table-space REORGs rebuild the indexes associated with the table, thereby implementing the unpadding.

Utilities provide options

When examining existing maintenance programs or considering writing new maintenance programs, do stop and consider whether a utility might just take care of the problem. Treat utilities as an option whenever massive maintenance is needed. Think above and beyond the “write yet another maintenance program” solution.