Database Backups for Peace of Mind: Part 2
Organizations deploying DB2 environments should have recovery processes in place when disaster strikes
Part 1 of this series offers a detailed look at the various options for copying data in IBM® DB2® database deployments to back up the data as an insurance policy against permanent loss or extended downtime. This concluding installment focuses on the details to consider when recovering data in DB2 environments, depending on how the data was backed up and where the backup data resides.
Timing variations for recovery of tables
Copying and recovering one object such as a table can be relatively easy. Merely copying the table space pages is all there is to it. The elapsed time needed to complete the task depends on the copy target and the table size:
- Copying the object onto a physical tape requires waiting for the tape drive to be available and waiting for the tape to be mounted, either by a tape robot or a human operator. Only then can DB2 proceed with the copy process through a DB2 buffer pool.
- When copying the object onto a direct-access storage device (DASD), the target file needs to be allocated first, which usually takes about a second. The copy process then proceeds through a DB2 buffer pool.
- When copying the object to a virtual tape system (VTS), there is a wait time for mounting the logical tape, which usually takes about a second. The copy process then proceeds through a DB2 buffer pool.
- When copying the object using the IBM® FlashCopy® method, data facility data set services (DFDSS) allocates the target file, which usually takes about a second. The copy process then proceeds without using a DB2 buffer pool.
Recovery using each method is nearly as easy as the copy process. Each essentially is a reverse process of these copy methods, with the same wait times per process. However, differences may occur when handling indices, which need to be rebuilt from the recovered data—and doing so takes time.
The duration of recovery depends on the size of the table, the number of indices, the amount of parallelism that is possible, and the availability of space for sorting. If there is a shortage of either sorting space or parallelism options, then index rebuilding is serialized and total recovery time can be dramatically longer than anticipated. For example, recovering 100 tables, one after the other, from tape takes time because each table must be recovered before proceeding to the next table. A similar delay can occur when using VTS for recovery when backup files are stacked on one virtual tape. The only way to expedite recovery time rapidly is to use parallelism provided by either a DASD or many virtual tapes on a VTS.
Now, consider a more complex scenario than backing up a single object, such as backing up 100 tables, each in its own table space. Copying the tables to a physical tape requires stacking them one after the other on the same tape, which economizes on tape but is costly in terms of the elapsed time. If they are copied to a VTS, they can either be stacked on a virtual tape or spread over many virtual tapes in parallel up to the parallelism limits of the VTS. If they are copied to a DASD, they can be stored either serially to different target files or in parallel up to a self-imposed limit or a DB2-imposed limit.
Using parallelism certainly helps to reduce elapsed recovery time, but there is a price to pay when using VTS because many virtual tapes are consumed. Using the PARALLEL n parameter, where n is the parallelism amount, can control the amount of parallelism used. Keep in mind that DB2 may change this number if it is short on resources at run time.
For example, consider a use case in which a DROP DATABASE command issued in the wrong environment required approximately 240 tables to be recovered from one physical tape. That process, which included rebuilding indices from the recovered tables, took about 10 hours. The recovery process could have been substantially reduced if the backup had been made to a DASD instead of physical tape. This alternative was put to a test in a similar recovery scenario from a DASD using PARALLEL 10. Recovery from a DASD took approximately 15 minutes. In another test using the PARALLEL 0 parameter, DB2 selected PARALLEL 49, but the elapsed time for recovering the 240 tables was about the same as using PARALLEL 10. The FlashCopy method was not used in this case because it was not available.
As mentioned in the first installment of this article, large objects that are sporadically updated can benefit from incremental copies if the TRACKMOD YES parameter is used for the table space. This parameter helps DB2 find the changed pages quickly by using the information within SPACEMAP pages; without it, DB2 uses a table space scan to look for changed pages.
When using incremental copy, a full copy must be available as a base. Then incremental copies can be made. When recovering, DB2 starts from the base full copy and adds the incremental copies. An incremental copy can be combined with a base copy to produce a new base copy or combine incremental copies into one incremental copy.
Capitalizing on rapid DB2 recovery
DB2 Version 10 provides an interesting new utility called RECOVER with the BACKOUT option. This feature allows a kind of rollback recovery in which DB2 reads the log backward to undo the operations. This method relies on log-only activity, so non-logged table spaces cannot be recovered using this method. Moreover, the indices must be defined with COPY YES for the RECOVER with the BACKOUT option process to handle them; otherwise, they require rebuilding.
Based on experience gained from many unexpected scenarios, having enough active log space is paramount for fast and easy data recovery. A good rule of thumb is to ensure there are 24 hours of data on active logs, and another 24 hours of archive logs that are easily available on either a DASD or VTS. When using a VTS, never stack files that can be written in parallel or they may need to be read in parallel.
Using the FlashCopy method is preferable, particularly when using the FLASHCOPY CONSISTENT option. If the FlashCopy method is not an option, then the next best course is to copy the data to a DASD in parallel. Copying to a VTS in parallel is also a good approach, but care must be taken with the number of channel paths available to the VTS and the number of virtual tapes that are consumed.
For disaster recovery plans that store backup data at a remote site, duplex the backup files at the remote site. Use incremental image copy for very large files that are sporadically changed, but try to ensure a full copy is created often enough or use the MERGECOPY utility to consolidate incremental images with the base full images.
Advancing database backup and recovery
In short, using parallelism is generally preferable over using serialized approaches. I/O is the heaviest process for the COPY and RECOVER utilities. Avoid serialization, and avoid building processes that work serially. Look at job DSNTIJIC from the SDSNSAMP library; it uses tape with stacked files and serves as a good example of how not to do a data backup. Use DASD as a backup target, and always have at least the last backup generation available for recovery on DASD, along with the appropriate DB2 active and archive logs. Using VTS is the next-best option when disaster strikes. But do not wait for disaster; have the entire recover process up and running and practice it.
Please share any thoughts or questions in the comments.