
Reducing Conversations with DB2 for z/OS: Part 2

More tried-and-true basics and a few new bells and whistles

In my previous column, I wrote about unnecessary SQL. The goal was to make programmers stop and think about each statement and make sure that it was both absolutely necessary and absolutely necessary at that specific time. I gave examples of SQL that could be eliminated completely and SQL that could be deferred until a later time, until the last possible moment, and then executed only if still necessary.

Tried-and-true techniques

In this column, I will continue with examples of SQL that may be executed far too many times and some that should be replaced with newer, better-performing SQL. So, welcome to part two of a three-part series on eliminating or reducing conversations with IBM DB2.

Control break logic

One of the fundamental rules for performance that we should obey when writing programs, especially batch programs, is to use control break logic whenever possible. This is not exclusively a DB2 concept—it’s just good programming practice. Checking for a control break, or a change in a value, reduces connections to DB2 because you look up a value in a table only if the value you are worried about is different from the last value that you looked up.

Optimal control break logic requires that the input data be in a specific order; for example, ITEM within STORE within REGION. But even if the input data is not sorted in a specific order, break logic can still be used, ensuring that any SQL that is required for a REGION is performed only once for each REGION; for a STORE, only once for each STORE within that REGION; and for an ITEM, only once for each ITEM within that STORE.
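As a minimal sketch, assuming a hypothetical region_table and COBOL working storage fields WS-REGION-IN (the region just read from the input file) and WS-REGION-PRIOR (carried over from the previous record), the lookup runs only when the value actually changes:

   IF WS-REGION-IN NOT = WS-REGION-PRIOR
      *> Control break: the region changed, so one lookup is earned
      EXEC SQL
         Select region_name
           Into :WS-REGION-NAME
           From region_table
          Where region_id = :WS-REGION-IN
      END-EXEC
      MOVE WS-REGION-IN TO WS-REGION-PRIOR
   END-IF

A break at the REGION level would also force the STORE and ITEM lookups; a break at the STORE level, only the ITEM lookup.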

As a side note, DB2-enforced referential integrity does not use control break logic. Foreign-key values are checked on every INSERT, UPDATE, DELETE, and MERGE, even if the value of the column at issue is the same as the value in the previously INSERTed (and so forth) row. With program-enforced referential integrity, we have the option of doing the validation only when the value of the column at issue differs from the value in the prior row; that is, on a control break.

Preloading small reference tables into program memory

Another fundamental rule for performance is to preload small reference tables into working storage tables (when it is smart to do so) to avoid connecting to DB2 an inordinate number of times.

A less-than-amiable person once asked me, “What kind of crummy product makes it necessary to load values into working storage to avoid connecting to the product?” I thought for a minute and said, “The kind of product that recognizes that a programmer has choices and a brain and understands that avoiding unnecessary calls, connects, and get pages is a good idea.”

Just because DB2 is fast and has buffer pools to reduce actual I/O does not mean that connecting to DB2 to read a 10-row table a million times is smarter than connecting to DB2 once, reading 10 rows into a working storage table, and then addressing program memory a million times.
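Here is a minimal sketch of that preload, assuming a hypothetical 10-row code_table and a working storage array defined with an ascending key and an index (both requirements of COBOL's SEARCH ALL). One rowset fetch loads the whole table, and every later lookup is a memory search rather than a DB2 call:

   Declare CursorCodes cursor with rowset positioning for
     Select code, description from code_table
   Fetch next rowset from CursorCodes
     For 10 rows
     Into :hvarray-code, :hvarray-desc
   (code to ensure that +100 was reached)

   *> A million of these cost zero connections to DB2
   SEARCH ALL CODE-ENTRY
      WHEN CODE-KEY (CODE-IDX) = WS-CODE-IN
         MOVE CODE-DESC (CODE-IDX) TO WS-DESC-OUT
   END-SEARCH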

Reduce repetitive executions of subselects

Consider this SQL:

Select col1, col2, col3
From big_table where item = :hv-item-just-read
And big_table.deptno in (select deptno from little_dept_table
Where division = 'Eastern')

For each item on an input sequential file, the program logic requires a lookup in a big table to get the associated values for col1, col2, and col3. Hopefully, as recommended earlier, the input data set has been sorted by item number and the program is checking to see if this item number is different from the prior number before this lookup is done.

But look more closely at the SQL. The subselect is creating a list of departments that are in the Eastern division and then making sure that the department at issue is in that list. I’m sure that, with a wee bit of thought, we could create that “departments in the Eastern division” list once instead of having DB2 build it for every lookup. For example, we can use a V8 rowset-positioned cursor to connect to DB2 one time and fetch the list; then we change the subject SQL to a hardcoded in-list built from our array:

Declare CursorDept cursor with rowset positioning for
Select deptno from little_dept_table
Where division = 'Eastern'
Fetch next rowset from CursorDept
For 20 rows
Into :hvarray-deptno
(code to ensure that +100 was reached)

Select col1, col2, col3
From big_table where item = :hv-item-just-read
And big_table.deptno in (:hvdept1, :hvdept2, :hvdept3, … :hvdept20)

Remember, if there are fewer than 20 department numbers (say, 15) and you do not want to code dynamic SQL, you can always perpetuate the last value in host variables 16 through 20.
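That padding step is a single line of COBOL; a sketch, assuming the 15 fetched values landed in entries 1 through 15 of the fetch array:

   *> Repeat the 15th department number in host variables 16 through 20;
   *> duplicate values in an in-list are harmless
   MOVE HVARRAY-DEPTNO (15) TO HVDEPT16 HVDEPT17 HVDEPT18
                               HVDEPT19 HVDEPT20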

Reading a row before update/delete—fetch/update/fetch/update

More-experienced programmers must overcome some of their entrenched practices. In the “old days,” we had to read a record to update or delete it. With DB2, if you have no need to establish a before image of the row, you do not need to read the row to update or delete it. I often see batch programs that use a cursor to fetch, update, fetch, update, fetch, update…, when they could just do a reasonably sized relational SET UPDATE using the same WHERE clause that is in the DECLARE CURSOR. Instead of:

Declare CursorUpd cursor for
Select col1, col2, …
From tableA
Where jobcode = :hvj for update of salary
Fetch CursorUpd into :hvcol1, :hvcol2, …
Update tableA set salary = salary + 1000.00 where current of CursorUpd

and repeating the fetch, update, fetch, update again and again, why not just:

Update tableA
Set salary = salary + 1000.00
Where jobcode = :hvj

SET processing is one of the huge strengths of relational design. You must, of course, ensure that the SET is a size that does not create an unacceptable unit of recovery or a problem with lock escalation, or cause locking issues (such as timeouts) for other concurrently running programs.
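One common way to keep the unit of recovery in check, sketched here on the assumption that tableA has a numeric empno key that can be processed in ranges, is to update one range at a time and commit between ranges:

   PERFORM UNTIL WS-LO > WS-MAX-EMPNO
      EXEC SQL
         Update tableA
            Set salary = salary + 1000.00
          Where jobcode = :hvj
            And empno between :WS-LO and :WS-HI
      END-EXEC
      EXEC SQL COMMIT END-EXEC
      (advance WS-LO and WS-HI to the next range)
   END-PERFORM

Size the ranges so that each UPDATE stays comfortably below the installation's lock escalation threshold.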

I sometimes see the preceding scenario (fetch, update) compounded by a subsequent singleton SELECT of the updated row—read it before, update it, look at it after. With newer programming techniques, as well as the realization that we do not need to see the before image, we can DECLARE a rowset-positioned cursor to SELECT from our SET UPDATE and then FETCH a rowset of reasonable size (say, 100 rows at a time) to see the result of the maintenance. Single-row fetches are becoming rare. Why connect 100 times when you can connect once and see the same 100 rows?
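As a sketch of that pattern, assuming DB2 9's support for selecting from a data-change statement (the cursor name and host-variable arrays here are mine):

   Declare CursorSee cursor with rowset positioning for
     Select empno, salary
       From final table
            (Update tableA
                Set salary = salary + 1000.00
              Where jobcode = :hvj)
   Fetch next rowset from CursorSee
     For 100 rows
     Into :hvarray-empno, :hvarray-salary
   (repeat the fetch until +100 is reached)

One connection performs the SET UPDATE, and each rowset fetch then returns 100 of the updated rows for inspection: no before image, no singleton SELECTs.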

Stay tuned for part 3

In the next issue, I will continue with the final installment of this topic, and we will look at even more (and newer) techniques for avoiding or reducing connects to DB2.
