DB2 Server for VSE: System Administration


Recovery Procedures

A system failure is any failure that causes the database manager to end abnormally. Such failures could occur because of an abnormal end of the VSE system, or because of error conditions in the database partition.

As long as the current log is available, recovery from system failures is automatic. Even if you are running the database manager in single user mode (SYSMODE=S) with no logging (LOGMODE=N), it can recover any committed updates by using the current log. Restart recovery is performed the next time the application server is started.

If there is a system failure while you are restoring the database, see Restarting from Failure of a Database Restore.

Restarting Procedures

To perform restart recovery procedures, the operator starts the application server with STARTUP set to one of the following values:

W
Warm start

R
Restoring from a database manager archive

F
Restoring from a database manager archive without reformatting the database data sets

U
Restoring from a user archive

S
Adding dbspaces

E
Adding or deleting dbextents

I
Reorganizing the catalog indexes

M
Catalog migration.

For all these settings, the log is checked at startup to see whether the last run of the database manager left any LUWs in progress. If it did, restart recovery processing starts and the changes made by those LUWs are backed out. Restart recovery processing also ensures that changes made by completed LUWs are, in fact, made.

Restart recovery procedures will not be performed if STARTUP is set to either C (for database generation) or L (for log reformatting or reconfiguration, called a COLDLOG operation).

For both of these settings, the database manager does not check the log, and the LUW recovery processing does not occur.

Restoring the Database

If an unresolvable I/O error occurs on any of the devices that contain the directory or dbextents, the application server ends abnormally. It may be necessary to replace the damaged volume, redefine the VSAM data sets on the new volume, and then restore the database from the most recent archive tapes.

Selecting the Archive Copy to Use

Locate the last successful archive of the database. If the DASD failure occurred while the most recent archive was being taken, then the last successful database archive would be the previous archive copy, not the copy interrupted by the failure.

If you are restoring from the most recent archive and the log dataset (or at least one of the log datasets in the case of dual logging) is not damaged, do not perform a COLDLOG before restoring. The current log is required for recovery. After restoring the database, follow the procedures in Log Reconfiguration to recover the damaged log dataset in the case of dual logging.

If you are using a back-level database archive and LOGMODE had not been set to L when that archive was taken, or if the physical extents of the log have been changed (regardless of what LOGMODE was set to), you must run a COLDLOG with LOGMODE=Y before restoring in order to reformat the logs. Do not use LOGMODE=N.

You may have to redefine the directory and log datasets (or both logs in the case of dual logging) at the same time due to an I/O error. If you are restoring from a user archive, perform a COLDLOG to reformat the logs before continuing with the restore. If there is a problem with both the directory and the log, the database will have to be restored before doing the COLDLOG whether it is a DB2 Server for VSE archive or a user archive. The restore will fail when the database manager tries to read the log. After the restore fails, do a COLDLOG to reformat the logs.

If you are restoring the database by using a database archive and subsequent log archives (LOGMODE=L), locate all the necessary log archives. If the failure occurred during the archiving of the log, do not use that final log archive tape. The database manager will automatically take another log archive when it is started for the restore.

The steps to be followed to restore your database differ, depending on whether the database had been archived using database manager or user facilities.

Restoring from a Database Manager Archive

Start the application server, with STARTUP=R and LOGMODE=A or L to restore the database using an archive created with database manager facilities. The database manager prompts the operator to mount the database archive tape, and to specify on which unit (cuu) the tape is mounted. It then dynamically assigns and opens the tape, and restores the database directory and dbextents from it.

Figure 71 shows an example of doing a startup to restore a database that had been archived using database manager facilities supplied with this product.

Figure 71. Starting with STARTUP=R to Restore a Database

// JOB RESTORE
// EXEC PROC=DBNAME01
// EXEC PROC=ARIS71PL
// TLBL ARIARCH, ...
// TLBL ARILARC, ...
// EXEC ARISQLDS,SIZE=AUTO,PARM='STARTUP=R,LOGMODE=L'
/*
/&
Note:The ARIARCH and ARILARC TLBL job control statements are not required if you have included them in your cataloged procedure for the database (DBNAME01).

It is recommended that you do not specify a VOLID parameter on TLBL statements for log archiving. Multiple log archive files can be created on a single run of the database manager. You would want these files to have different VOLIDs.

In this example, LOGMODE is set to L because the user normally uses log archiving.

Restoring from a User Archive

Shut down the application server, and restore the database using the same user facilities that created the archive.

Do not restore the database logs. If you accidentally restore the logs, the history area and all the changes to the database recorded in the log, are lost. The database manager uses the history area to track which log archives go with which database archives. For more information, see History Area. Even if you have been using log archiving, all changes made since the last database archive are lost. Because the history area is lost, no existing log archive can be used. To recover from accidentally restoring the log, start the application server with STARTUP=L, to do a COLDLOG to reconfigure the logs before proceeding.

After restoring the database directory and dbextents, start the application server with STARTUP=U and LOGMODE=A or L. The operator is asked whether the user restore completed successfully. If the answer is yes, then if LOGMODE=A, the changes in the log are applied to the database; if LOGMODE=L, the database manager takes an archive of the current log, and then restores the log archive tape files that are associated with the user archives. If the operator responds that the user restore was not done, the application server ends, and the operator must take the necessary action to resolve the problem.

Figure 72 shows an example of the control statements needed to perform a user restore using the VSE/VSAM Restore feature.

Figure 72. Example of VSE/VSAM RESTORE Command for a User Archive

// JOB USERREST
// ASSGN SYS004,181
// DLBL IJSYSUC,'SQL301C',,VSAM
// EXEC IDCAMS,SIZE=AUTO
 RESTORE OBJECTS ((SQL301.BDISK) -
        (SQL301.DDSK1) -
        (SQL301.DDSK2) -
        (SQL301.DDSK3) -
        (SQL301.DDSK4) -
        (SQL301.DDSK5) -
        (SQL301.DDSK6) -
        (SQL301.DDSK7))
/*
/&

In this example, VSE/VSAM restores a database having a directory (BDISK) and seven dbextents (DDSK1-DDSK7). For information about the VSE/VSAM RESTORE command, see the Using VSE/VSAM Commands and Macros manual.

When to Use LOGMODE=A

For both database restores (STARTUP=R) and user restores (STARTUP=U), specify LOGMODE=A when you start the application server to have the database manager restore the database without using log archive tape files. When the database is restored, the database manager applies only the changes in the current log to the database. (This is the reason you need to do a COLDLOG if you are not using the most recent database archive, or if you accidentally restored the logs during a user restore: the log does not apply to the older archive.) After completing the restore, the database manager runs with LOGMODE=A.

The database manager still checks whether there are any log archives associated with the database archive. If there are, message ARI0247D is displayed prompting the operator either to keep LOGMODE=A and restore the database without using the log archives, or to switch to LOGMODE=L and use the log archives during the restore. If the decision is made to switch to LOGMODE=L, the database manager runs as if it had been intended to do the restore with LOGMODE=L all along.

When the restore set is complete, the archive that is restored becomes the database archive for the current restore set. A restore set consists of a database archive and the log archives associated with it in the history area -- that is, those log archives that occurred between the database archive and the next restore or COLDLOG or change of log mode.

When to Use LOGMODE=L

Specify LOGMODE=L if you want the database to be restored using log archives. The database manager first restores the database archive and then takes a log archive if information is in the log that was being used when the system failed or was shut down immediately prior to the restore. It then restores the log archives that were taken after the database archive you restored. When the restore is complete, the database manager runs with LOGMODE=L.

Before restoring the database archive and each log archive, the operator is prompted to continue, stop the application server, or end the restore. Usually, the operator responds CONTINUE.

If the operator responds STOP SYSTEM, the application server ends. The next time the application server is warm-started, it will continue restoring the database using the next log archive. If it is restarted to do a restore instead of a warm start, it ignores the first restore, which was stopped, and begins a new one. If it is restarted with STARTUP=C, the application server does the equivalent of an END RESTORE (see below) and then a COLDLOG. (All subsequent log archives are no longer usable.)

The STOP SYSTEM response is used primarily for filtered log recovery. This allows you to stop the application server in the middle of a restore, change the EXTEND input file commands used for filtered log recovery, and continue the restore. For information about filtered log recovery, see the discussion on starting the application server to recover from a DBSS error in the DB2 Server for VSE & VM Diagnosis Guide and Reference manual.

The END RESTORE response is used primarily for ending a restore before processing a log archive tape that is unusable. A secondary use is to end a restore before processing a log archive that contains a user error.

Attention: If you end a restore, you may lose the ability to use subsequent log archives on a future restore.

For example, suppose you have taken a database archive and six subsequent log archives. If you discover a user error that was recorded in the fourth log archive, restore the database archive and the first three log archives. Enter END RESTORE to avoid processing the fourth, fifth, and sixth log archives. When you end the restore, it may be impossible to restore the database again using the fourth, fifth, and sixth log archives. This would be unfortunate if you had made a mistake and, in fact, should have restored the fourth log archive as well. Thus, before you respond END RESTORE, be sure you have processed the correct number of log archives.

If a situation like the one above occurs, the only way to recover the lost log archives is to restore a back-level database archive. The log archives associated with that database archive must include the ones that were lost. That is, the old database archive must have continuous log archives to the point of the END RESTORE. If it does not, you cannot recover the lost logs. For more information, see How the History Area is Used.

After the restore set is complete, the database archive and log archives that were just restored become the current restore set, unless the restore ended before all log archives in the restore set were applied. As a final step, the current log is restored if it directly followed the restored log archives.

Restarting from Failure of a Database Restore

Three types of errors can cause a failure of a database restore operation:

  1. System failures, such as power interruptions, or operator or equipment errors that can be corrected. For example, the database manager can end because the wrong tape volume was mounted or a tape drive malfunctioned.

    In these error situations, after taking corrective action, you can restart the restore process as follows:

  2. A log archive error that can be corrected, or a failure during UNDO/REDO processing.

    To deal with a log error that can be bypassed or corrected, refer to the section on recovering from DBSS errors in the DB2 Server for VSE & VM Diagnosis Guide and Reference manual, especially the discussions on UNDO and REDO processing failures during a restore.

  3. A database or log archive input file error that cannot be corrected, such as a damaged archive tape volume. One of the following situations applies:

Restarting from a System Failure While Archiving

The procedure to recover from a system failure that occurs when the database manager is taking either a log or database archive is essentially the same as any other restart. Because it did not finish, however, the archive that was being written at the time of the failure cannot be used.

Restart the application server with STARTUP=W. If LOGMODE had been set to A or L, specify the same value; if LOGMODE had been set to Y, specify LOGMODE=A.

If the archive in-process had been an automatic archive (started by ARCHPCT), another automatic archive will be initiated immediately when the application server is started again. If it had been started by an ARCHIVE, LARCHIVE, SQLEND ARCHIVE, or SQLEND LARCHIVE command, you must reissue the command when restarting the application server. If it had been an implicit log archive created by issuing SQLEND UARCHIVE with LOGMODE set to L, reissue the SQLEND UARCHIVE command after restarting the application server with LOGMODE=L.

Restarting from Failure of a Database Generation or COLDLOG Operation

If a system failure occurs during database generation or COLDLOG processing, restart the operation after determining and correcting the cause of the failure.

In some cases, storage may need to be reclaimed before continuing processing. For example, an LUW is processing a DROP TABLE statement, a checkpoint is taken during this processing, and a COLDLOG operation immediately follows. If a media failure occurred before the COLDLOG, there is a possibility of rows from the dropped table still existing. However, the entry in the SYSTEM.SYSDROP catalog table no longer exists. To reclaim this storage, the dbspace containing this "dropped" table must be dropped before continuing processing.

Relocating the Database Manager

You can move the database manager between system DASD in two ways:

Replacing a Dbextent

You may want to replace a dbextent because:

To move, replace, or change a dbextent:

  1. Take a database manager archive or a user archive of the database. (See Archiving Procedures.) The archive is required for the steps below.
  2. Define the new data sets for the directory and dbextents on the new device type. Be careful when calculating their size. They should be slightly larger than the original ones, because of rounding that occurs in the space allocation algorithms. If you define the new datasets approximately equivalent in size to the old ones, the restore will probably fail.
  3. Restore the database from the archive you took in Step 1. (See Restoring the Database.)

Replacing a Log

If you are relocating the log data sets to another device because of disk migration or to control device utilization, and the target log data set is the identical device type and size as the source log data set and the source log data set is not damaged, you can use VSE/VSAM BACKUP and RESTORE to move the log data set. See Moving the Log for more information.

This section describes how to replace a log data set (DLBL LOGDSK1 or LOGDSK2). You would replace a log data set if:

  1. The data set is damaged by an unrecoverable DASD error.
  2. You want to change the size of your logs.
  3. You want to move your data sets to a different device type.

To replace log data sets:

  1. If you are replacing the only log (for single logging) or both logs (for dual logging), take a database archive if you are running with LOGMODE=A or L, because the contents of the log, including the history area, will be lost. If you are dual logging and you are only replacing one log, the archive is not lost.
  2. If you are replacing the only log (for single logging) or both logs (for dual logging), follow the procedures on Log Reconfiguration.
  3. If dual logging and you are only replacing one log, use the IDCAMS command to delete and redefine the VSAM data set for the log to be replaced. For a description of the job control statements, see Figure 87.

Recovering to a Secondary System

To be able to recover in cases where the original database data sets are not available (for example, in an offsite disaster recovery situation), you should make a VSE/VSAM BACKUP copy of the log data set after every log archive or database archive. You would then recover to a secondary system. The secondary system must have the same dbextent configuration and number of logs as the original system.

If you have been running with LOGMODE=A and need to recover to a secondary system, do a log reconfiguration to initialize the log (see Log Reconfiguration), then restore the most recent archive on the secondary system.

If you have been running with LOGMODE=L and need to recover to a secondary system:

  1. Do a log reconfiguration to initialize the log ( Log Reconfiguration)
  2. Use VSE/VSAM RESTORE to restore the copy of the log data set that you took after the latest database or log archive of the original system. Restore it onto the secondary system.
  3. Restore the most recent archive on the secondary system.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]