A system failure is any failure that causes the database manager to end abnormally. Such failures could occur because of an abnormal end of the VSE system, or because of error conditions in the database partition.
As long as the current log is available, recovery from system failures is automatic. Even if you are running the database manager in single user mode (SYSMODE=S) with no logging (LOGMODE=N), it can recover any committed updates by using the current log. Restart recovery is performed the next time the application server is started.
If there is a system failure while you are restoring the database, see Restarting from Failure of a Database Restore.
To perform restart recovery procedures, the operator starts the application server with STARTUP set to one of the following values:
For all these settings, the log is checked at startup to see whether the last run of the database manager left any LUWs in progress. If it did, restart recovery processing starts and the changes made by those LUWs are backed out. Restart recovery processing also ensures that changes made by completed LUWs are, in fact, made.
Restart recovery procedures will not be performed if STARTUP is set to either C (for database generation) or L (for log reformatting or reconfiguration, called a COLDLOG operation).
For both of these settings, the database manager does not check the log, and the LUW recovery processing does not occur.
If an unresolvable I/O error occurs on any of the devices that contain the directory or dbextents, the application server ends abnormally. It may be necessary to replace the damaged volume, redefine the VSAM data sets on the new volume, and then restore the database from the most recent archive tapes.
Locate the last successful archive of the database. If the DASD failure occurred while the most recent archive was being taken, then the last successful database archive would be the previous archive copy, not the copy interrupted by the failure.
If you are restoring from the most recent archive and the log dataset (or at least one of the log datasets in the case of dual logging) is not damaged, do not perform a COLDLOG before restoring. The current log is required for recovery. After restoring the database, follow the procedures in Log Reconfiguration to recover the damaged log dataset in the case of dual logging.
If you are using a back-level database archive and LOGMODE had not been set to L when that archive was taken, or if the physical extents of the log have been changed (regardless of what LOGMODE was set to), you must run a COLDLOG with LOGMODE=Y before restoring in order to reformat the logs. Do not use LOGMODE=N.
You may have to redefine the directory and log datasets (or both logs in the case of dual logging) at the same time due to an I/O error. If you are restoring from a user archive, perform a COLDLOG to reformat the logs before continuing with the restore. If there is a problem with both the directory and the log, the database will have to be restored before doing the COLDLOG whether it is a DB2 Server for VSE archive or a user archive. The restore will fail when the database manager tries to read the log. After the restore fails, do a COLDLOG to reformat the logs.
If you are restoring the database by using a database archive and subsequent log archives (LOGMODE=L), locate all the necessary log archives. If the failure occurred during the archiving of the log, do not use that final log archive tape. The database manager will automatically take another log archive when it is started for the restore.
The steps to be followed to restore your database differ, depending on whether the database had been archived using database manager or user facilities.
Start the application server, with STARTUP=R and LOGMODE=A or L to restore the database using an archive created with database manager facilities. The database manager prompts the operator to mount the database archive tape, and to specify on which unit (cuu) the tape is mounted. It then dynamically assigns and opens the tape, and restores the database directory and dbextents from it.
Figure 71 shows an example of doing a startup to restore a database that had been archived using database manager facilities supplied with this product.
Figure 71. Starting with STARTUP=R to Restore a Database
// JOB RESTORE // EXEC PROC=DBNAME01 // EXEC PROC=ARIS71PL // TLBL ARIARCH, ... // TLBL ARILARC, ... // EXEC ARISQLDS,SIZE=AUTO,PARM='STARTUP=R,LOGMODE=L' /* /& |
Note: | The ARIARCH and ARILARC TLBL job control statements are not required if you have included them in your cataloged procedure for the database (DBNAME01). |
It is recommended that you do not specify a VOLID parameter on TLBL statements for log archiving. Multiple log archive files can be created on a single run of the database manager. You would want these files to have different VOLIDs.
In this example, LOGMODE is set to L because the user normally uses log archiving.
Shut down the application server, and restore the database using the same user facilities that created the archive.
Do not restore the database logs. If you accidentally restore the logs, the history area and all the changes to the database recorded in the log, are lost. The database manager uses the history area to track which log archives go with which database archives. For more information, see History Area. Even if you have been using log archiving, all changes made since the last database archive are lost. Because the history area is lost, no existing log archive can be used. To recover from accidentally restoring the log, start the application server with STARTUP=L, to do a COLDLOG to reconfigure the logs before proceeding.
After restoring the database directory and dbextents, start the application server with STARTUP=U and LOGMODE=A or L. The operator is asked whether the user restore completed successfully. If the answer is yes, then if LOGMODE=A, the changes in the log are applied to the database; if LOGMODE=L, the database manager takes an archive of the current log, and then restores the log archive tape files that are associated with the user archives. If the operator responds that the user restore was not done, the application server ends, and the operator must take the necessary action to resolve the problem.
Figure 72 shows an example of the control statements needed to perform a user restore using the VSE/VSAM Restore feature.
Figure 72. Example of VSE/VSAM RESTORE Command for a User Archive
In this example, VSE/VSAM restores a database having a directory (BDISK) and seven dbextents (DDSK1-DDSK7). For information about the VSE/VSAM RESTORE command, see the Using VSE/VSAM Commands and Macros manual.
For both database restores (STARTUP=R) and user restores (STARTUP=U), specify LOGMODE=A when you start the application server to have the database manager restore the database without using log archive tape files. When the database is restored, the database manager applies only the changes in the current log to the database. (This is the reason you need to do a COLDLOG if you are not using the most recent database archive, or if you accidentally restored the logs during a user restore: the log does not apply to the older archive.) After completing the restore, the database manager runs with LOGMODE=A.
The database manager still checks whether there are any log archives associated with the database archive. If there are, message ARI0247D is displayed prompting the operator either to keep LOGMODE=A and restore the database without using the log archives, or to switch to LOGMODE=L and use the log archives during the restore. If the decision is made to switch to LOGMODE=L, the database manager runs as if it had been intended to do the restore with LOGMODE=L all along.
When the restore set is complete, the archive that is restored becomes the database archive for the current restore set. A restore set consists of a database archive and the log archives associated with it in the history area -- that is, those log archives that occurred between the database archive and the next restore or COLDLOG or change of log mode.
Specify LOGMODE=L if you want the database to be restored using log archives. The database manager first restores the database archive and then takes a log archive if information is in the log that was being used when the system failed or was shut down immediately prior to the restore. It then restores the log archives that were taken after the database archive you restored. When the restore is complete, the database manager runs with LOGMODE=L.
Before restoring the database archive and each log archive, the operator is prompted to continue, stop the application server, or end the restore. Usually, the operator responds CONTINUE.
If the operator responds STOP SYSTEM, the application server ends. The next time the application server is warm-started, it will continue restoring the database using the next log archive. If it is restarted to do a restore instead of a warm start, it ignores the first restore, which was stopped, and begins a new one. If it is restarted with STARTUP=C, the application server does the equivalent of an END RESTORE (see below) and then a COLDLOG. (All subsequent log archives are no longer usable.)
The STOP SYSTEM response is used primarily for filtered log recovery. This allows you to stop the application server in the middle of a restore, change the EXTEND input file commands used for filtered log recovery, and continue the restore. For information about filtered log recovery, see the discussion on starting the application server to recover from a DBSS error in the DB2 Server for VSE & VM Diagnosis Guide and Reference manual.
The END RESTORE response is used primarily for ending a restore before processing a log archive tape that is unusable. A secondary use is to end a restore before processing a log archive that contains a user error.
Attention: If you end a restore, you may lose the ability to use subsequent log archives on a future restore.
For example, suppose you have taken a database archive and six subsequent log archives. If you discover a user error that was recorded in the fourth log archive, restore the database archive and the first three log archives. Enter END RESTORE to avoid processing the fourth, fifth, and sixth log archives. When you end the restore, it may be impossible to restore the database again using the fourth, fifth, and sixth log archives. This would be unfortunate if you had made a mistake and, in fact, should have restored the fourth log archive as well. Thus, before you respond END RESTORE, be sure you have processed the correct number of log archives.
If a situation like the one above occurs, the only way to recover the lost log archives is to restore a back-level database archive. The log archives associated with that database archive must include the ones that were lost. That is, the old database archive must have continuous log archives to the point of the END RESTORE. If it does not, you cannot recover the lost logs. For more information, see How the History Area is Used.
After the restore set is complete, the database archive and log archives that were just restored become the current restore set, unless the restore ended before all log archives in the restore set were applied. As a final step, the current log is restored if it directly followed the restored log archives.
Three types of errors can cause a failure of a database restore operation:
In these error situations, after taking corrective action, you can restart the restore process as follows:
To deal with a log error that can be bypassed or corrected, refer to the section on recovering from DBSS errors in the DB2 Server for VSE & VM Diagnosis Guide and Reference manual, especially the discussions on UNDO and REDO processing failures during a restore.
In this situation, you can reset the database to its current state by using a previous database archive and the subsequent log archives (if there are any). You can do this only if the following conditions are met:
Note: | You can switch from LOGMODE=L to A and then back to L again without breaking the continuity of the log archives, as long as you do not take a database archive while LOGMODE is set to A. For example, suppose you accidentally start the application server with LOGMODE=A instead of L. If you immediately shut down the application server without taking a database archive, the continuity of the log archives is preserved. |
To reset the database using database manager facilities, restart the application server and restore the back-level database using STARTUP=R or F with LOGMODE set to L. In response to the request to mount the archive tape, mount the tape created by the previous database manager archive. When the database archive tape is restored, the operator is prompted for the subsequent log archives.
To reset the database using user facilities, restore the database using the tape file from the previous user archive. Then start the application server with STARTUP=U and LOGMODE=L. The operator is prompted for the subsequent log archives.
In this situation, the most current level of the database that you can restore to depends on the last undamaged log archive.
To reset the database, restart the application server with STARTUP=W and LOGMODE set to L. The database manager tries to continue the restore by requesting the log archive that had caused the failure. (The database manager determines where it was interrupted.) Instead of responding CONTINUE, respond END RESTORE to the prompt in message ARI0250D.
Restart the application server with STARTUP=L and LOGMODE=Y (COLDLOG to reformat the logs). Then restart the restore job using a previous database archive tape.
Note: | This previous database archive must have been created by an SQLEND ARCHIVE,
SQLEND UARCHIVE, or ARCHIVE command known to have been issued when no
application program was accessing the database.
In these situations, all changes made to the database since the database archive was taken are lost. You can reset the database to the consistent state that existed when that database archive tape file was created. |
The procedure to recover from a system failure that occurs when the database manager is taking either a log or database archive is essentially the same as any other restart. Because it did not finish, however, the archive that was being written at the time of the failure cannot be used.
Restart the application server with STARTUP=W. If LOGMODE had been set to A or L, specify the same value; if LOGMODE had been set to Y, specify LOGMODE=A.
If the archive in-process had been an automatic archive (started by ARCHPCT), another automatic archive will be initiated immediately when the application server is started again. If it had been started by an ARCHIVE, LARCHIVE, SQLEND ARCHIVE, or SQLEND LARCHIVE command, you must reissue the command when restarting the application server. If it had been an implicit log archive created by issuing SQLEND UARCHIVE with LOGMODE set to L, reissue the SQLEND UARCHIVE command after restarting the application server with LOGMODE=L.
If a system failure occurs during database generation or COLDLOG processing, restart the operation after determining and correcting the cause of the failure.
In some cases, storage may need to be reclaimed before continuing processing. For example, an LUW is processing a DROP TABLE statement, a checkpoint is taken during this processing, and a COLDLOG operation immediately follows. If a media failure occurred before the COLDLOG, there is a possibility of rows from the dropped table still existing. However, the entry in the SYSTEM.SYSDROP catalog table no longer exists. To reclaim this storage, the dbspace containing this "dropped" table must be dropped before continuing processing.
You can move the database manager between system DASD in two ways:
You may want to replace a dbextent because:
If you are replacing all the database dbextents (as you might when moving the database to a different device type), replace the log dbextents first. Follow the procedures in Log Reconfiguration.
You may need to replace the database directory or dbextents, because one or both were damaged. In this situation, if you are running with dual logging and only one of the logs is damaged, replace the database directory or dbextents first by following the steps below. Then replace the log dataset by following the procedures in Log Reconfiguration, and finally, restore the database by following the procedures in Restoring the Database.
Use the instructions below if you are moving the database directory or dbextents. If you are moving your logs, refer to Log Reconfiguration.
To move, replace, or change a dbextent:
If you are relocating the log data sets to another device because of disk migration or to control device utilization, and the target log data set is the identical device type and size as the source log data set and the source log data set is not damaged, you can use VSE/VSAM BACKUP and RESTORE to move the log data set. See Moving the Log for more information.
This section describes how to replace a log data set (DLBL LOGDSK1 or LOGDSK2). You would replace a log data set if:
To replace log data sets:
To be able to recover in cases where the original database data sets are not available (for example, in an offsite disaster recovery situation), you should make a VSE/VSAM BACKUP copy of the log data set after every log archive or database archive. You would then recover to a secondary system. The secondary system must have the same dbextent configuration and number of logs as the original system.
If you have been running with LOGMODE=A and need to recover to a secondary system, do a log reconfiguration to initialize the log (see Log Reconfiguration), then restore the most recent archive on the secondary system.
If you have been running with LOGMODE=L and need to recover to a secondary system: