A system failure is any failure that causes the database manager to end abnormally. Such failures could occur because of an abnormal end of the VM system, or because of error conditions in the database machine.
As long as the current log is available, recovery from system failures is automatic. Even if you are running the database manager in single user mode (SYSMODE=S) with no logging (LOGMODE=N), it can recover any committed updates by using the current log. Restart recovery is performed the next time the application server is started.
If there is a system failure while you are restoring the database, see Restarting from Failure of a Database Restore.
To perform restart recovery procedures, the operator starts the application server with STARTUP set to one of the following values:
For all these settings, the log is checked at startup to see whether the last run of the database manager left any LUWs in progress. If it did, restart recovery processing starts and the changes made by those LUWs are backed out. Restart recovery processing also ensures that changes made by completed LUWs are, in fact, made.
Restart recovery procedures will not be performed if STARTUP is set to either C (for database generation with the SQLDBGEN or SQLDBINS EXECs) or L (for log reformatting or reconfiguration, called a COLDLOG operation, with the SQLLOG EXEC).
For both of these settings, the database manager does not check the log, and the LUW recovery processing does not occur.
If an unresolvable I/O error occurs on any of the devices that contain the directory or dbextents, the application server ends abnormally. It may be necessary to replace the damaged database minidisk and restore the database from the most recent archive tapes. For more information, see Replacing a Database Minidisk.
If an unresolvable I/O error occurs on a device that contains a log, it is necessary to replace the damaged minidisk. For more information, see Replacing a Log Minidisk.
Locate the last successful archive of the database. If the DASD failure occurred while the most recent archive was being taken, then the last successful database archive would be the previous archive copy, not the copy interrupted by the failure.
If you are restoring from the most recent archive and the log minidisk (or at least one of the log minidisks in the case of dual logging) is not damaged, do not perform a COLDLOG before restoring. The current log is required for recovery. In the case of dual logging, where one damaged log has been replaced, restoring the database copies the good log to the new log.
If you are using a back-level database archive and LOGMODE had not been set to L when that archive was taken, or if the minidisks of the log have been changed (regardless of what LOGMODE was set to), you must run a COLDLOG with the SQLLOG EXEC to reformat the logs before restoring. For more information, see Running the SQLLOG EXEC.
You may have to redefine the directory and log disks (or both logs in the case of dual logging) at the same time due to an I/O error. If you are restoring from a user archive, perform a COLDLOG to reformat the logs before continuing with the restore. If you are restoring from an archive produced by database manager facilities, any attempt to perform a COLDLOG would result in a system error. Instead of performing the COLDLOG after redefining the database minidisks, restore the database. The restore will fail when the database manager tries to read the log. After the restore fails, do a COLDLOG to reformat the logs.
If you are restoring the database by using a database archive and subsequent log archives (LOGMODE=L), locate all the necessary log archives. If you have log archives on disk, your database machine must have access to the CMS minidisks or shared file system directories containing the log archives. If the failure occurred during the archiving of the log, do not use that final log archive tape. The database manager will automatically take another log archive when it is started for the restore.
The steps to be followed to restore your database differ, depending on whether the database had been archived using database manager or user facilities.
Start the application server, with STARTUP=R and LOGMODE=A or L to restore the database using an archive created with database manager facilities. The database manager prompts the operator to mount the database archive tape, and to specify on which unit (cuu) the tape is mounted. Unless a CMS FILEDEF command for ddname ARIARCH with a virtual device address other than TAP1 (181) was submitted before running the SQLSTART EXEC, the archive tape should be mounted on virtual device 181. The tape is then opened, and the database directory and dbextents are copied.
The virtual device address for log archives is 183 (unless a FILEDEF command was issued for ddname ARILARC). Disks that contain log archives must be accessed by the database machine.
Figure 74 shows an example of doing a startup to restore a database that had been archived using database manager facilities.
Figure 74. Starting with STARTUP=R to Restore a Database
SQLSTART DB(server-name) PARM(STARTUP=R,LOGMODE=L) |
Note: | In this example, LOGMODE is set to L because the user normally uses log archiving. |
The SQLSTART EXEC issues these FILEDEF commands for the database and log archive tapes:
FILEDEF ARIARCH TAP1 SL (BLKSIZE 4096 NOCHANGE PERM FILEDEF ARILARC TAP3 SL (BLKSIZE 4096 NOCHANGE PERM
You can submit your own FILEDEF commands (to override the defaults) before running SQLSTART. You should also submit LABELDEF commands.
Do not specify a VOLID parameter for log archiving. Multiple log archive files can be created on a single run of the database manager. You would want these files to have different VOLIDs.
If you normally run with LOGMODE=A (no log archiving), you would specify LOGMODE=A at startup.
Shut down the application server, and restore the database using the same user facilities that created the archive.
Do not restore the database logs. If you accidentally restore the logs, the history area and all the changes to the database recorded in the log, are lost. The database manager uses the history area to track which log archives go with which database archives. The history area must be restored from the backup copy. For more information, see History Area. To recover from accidentally restoring the log, do a COLDLOG to reconfigure the logs before proceeding. That is, run the SQLLOG EXEC without specifying the LOG1 or LOG2 parameters. Respond CONTINUE to message ARI0688D (for single logging) or ARI6129D (for dual logging). Respond 1 to message ARI0944D to reconfigure the log. When the application server is started, the log history area containing the restored database archive can be restored from the version that existed before the COLDLOG. The restored history area will contain a record of all log archives that apply to the restored database, although the log that existed just before the restore will be lost.
After restoring the database directory and dbextents, start the application server with STARTUP=U and LOGMODE=A or L. The operator is asked whether the user restore completed successfully. If the answer is yes, then if LOGMODE=A, the changes in the log are applied to the database; if LOGMODE=L, the database manager takes an archive of the current log, and then restores the log archive tape files that are associated with the user archives. If the operator responds that the user restore was not done, the application server ends, and the operator must take the necessary action to resolve the problem.
For both database restores (STARTUP=R) and user restores (STARTUP=U), specify LOGMODE=A when you start the application server to have the database manager restore the database without using log archive tape files. When the database is restored, the database manager applies only the changes in the current log to the database. (This is the reason you need to do a COLDLOG if you are not using the most recent database archive, or if you accidentally restored the logs during a user restore: the log does not apply to the older archive.) After completing the restore, the database manager runs with LOGMODE=A.
The database manager still checks whether there are any log archives associated with the database archive. If there are, message ARI0247D is displayed prompting the operator either to keep LOGMODE=A and restore the database without using the log archives, or to switch to LOGMODE=L and use the log archives during the restore. If the decision is made to switch to LOGMODE=L, the database manager runs as if it had been intended to do the restore with LOGMODE=L all along.
When the restore set is complete, the archive that is restored becomes the database archive for the current restore set. A restore set consists of a database archive and the log archives associated with it in the history area -- that is, those log archives that occurred between the database archive and the next restore or COLDLOG or change of log mode.
Specify LOGMODE=L if you want the database to be restored using log archives. The database manager first restores the database archive and then takes a log archive if information is in the log that was being used when the system failed or was shut down immediately prior to the restore. It then restores the log archives that were taken after the database archive you restored. When the restore is complete, the database manager runs with LOGMODE=L.
Before restoring the database archive and each log archive, the operator is prompted to continue, stop the application server, or end the restore. Usually, the operator responds CONTINUE.
If the operator responds STOP SYSTEM, the application server ends. The next time the application server is warm-started, it will continue restoring the database using the next log archive. If it is restarted to do a restore instead of a warm start, it ignores the first restore, which was stopped, and begins a new one. If it is restarted with STARTUP=C, the application server does the equivalent of an END RESTORE (see below) and then a COLDLOG. (All subsequent log archives are no longer usable.)
The STOP SYSTEM response is used primarily for filtered log recovery. This allows you to stop the application server in the middle of a restore, change the EXTEND input file commands used for filtered log recovery, and continue the restore. For information about filtered log recovery, see the discussion on starting the application server to recover from a DBSS error in the DB2 Server for VSE & VM Diagnosis Guide and Reference manual.
The END RESTORE response is used primarily for ending a restore before processing a log archive tape that is unusable. A secondary use is to end a restore before processing a log archive that contains a user error.
Attention: If you end a restore, you may lose the ability to use subsequent log archives on a future restore.
For example, suppose you have taken a database archive and six subsequent log archives. If you discover a user error that was recorded in the fourth log archive, restore the database archive and the first three log archives. Enter END RESTORE to avoid processing the fourth, fifth, and sixth log archives. When you end the restore, it may be impossible to restore the database again using the fourth, fifth, and sixth log archives. This would be unfortunate if you had made a mistake and, in fact, should have restored the fourth log archive as well. Thus, before you respond END RESTORE, be sure you have processed the correct number of log archives.
If a situation like the one above occurs, the only way to recover the lost log archives is to restore a back-level database archive. The log archives associated with that database archive must include the ones that were lost. That is, the old database archive must have continuous log archives to the point of the END RESTORE. If it does not, you cannot recover the lost logs. For more information, see How the Database Manager Uses the History Area.
After the restore set is complete, the database archive and log archives that were just restored become the current restore set, unless the restore ended before all log archives in the restore set were applied. As a final step, the current log is restored if it directly followed the restored log archives.
If you have any log archives on disk, the database machine must have access to the minidisks containing these log archives when it begins restoring the database. The database manager issues the FILEDEF commands needed for the log archives on disk, and also checks the timestamp for each log archive file to ensure it matches the timestamp in the history area.
Note: | After a restore, the FILEDEF for ARILARC is inoperative. You must specify the FILEDEF again by either: |
Three types of errors can cause a failure of a database restore operation:
In these error situations, after taking corrective action, you can restart the restore process as follows:
To deal with a log error that can be bypassed or corrected, refer to the section on recovering from DBSS errors in the DB2 Server for VSE & VM Diagnosis Guide and Reference manual, especially the discussions on UNDO and REDO processing failures during a restore.
In this situation, you can reset the database to its current state by using a previous database archive and the subsequent log archives (if there are any). You can do this only if the following conditions are met:
Note: | You can switch from LOGMODE=L to A and then back to L again without breaking the continuity of the log archives, as long as you do not take a database archive while LOGMODE is set to A. For example, suppose you accidentally start the application server with LOGMODE=A instead of L. If you immediately shut down the application server without taking a database archive, the continuity of the log archives is preserved. |
To reset the database using database manager facilities, restart the application server and restore the back-level database using STARTUP=R with LOGMODE set to L. In response to the request to mount the archive tape, mount the tape created by the previous database manager archive. When the database archive tape is restored, the operator is prompted for the subsequent log archives.
To reset the database using user facilities, restore the database using the tape file from the previous user archive. Then start the application server with STARTUP=U and LOGMODE=L. The operator is prompted for the subsequent log archives.
In this situation, the most current level of the database that you can restore to depends on the last undamaged log archive.
To reset the database, restart the application server with STARTUP=W and LOGMODE set to L. The database manager tries to continue the restore by requesting the log archive that had caused the failure. (The database manager determines where it was interrupted.) Instead of responding CONTINUE, respond END RESTORE to the prompt in message ARI0250D.
Run the SQLLOG EXEC without specifying the LOG1 and LOG2 parameters to reformat the logs. Respond CONTINUE to message ARI0688D (for single logging) or ARI6129D (for dual logging), then respond 0 to message ARI0944D to reformat the log minidisk (or minidisks). Then restart the restore process using a previous database archive tape.
Note: | This previous database archive must have been created by an SQLEND ARCHIVE,
SQLEND UARCHIVE, or ARCHIVE command known to have been issued when no
application program was accessing the database.
In these situations, all changes made to the database since the database archive was taken are lost. You can reset the database to the consistent state that existed when that database archive tape file was created. |
Run the SQLLOG EXEC without specifying the LOG1 and LOG2 parameters to reconfigure the logs. Respond CONTINUE to message ARI0688D (for single logging) or ARI6129D (for dual logging), then respond 1 to message ARI0944D to reconfigure the log minidisk (or minidisks). Then restart the restore process.
In these situations, all changes made to the database since the database archive was taken are lost. You can reset the database to the consistent state that existed when that database archive tape file was created.
The procedure to recover from a system failure that occurs when the database manager is taking either a log or database archive is essentially the same as any other restart. Because it did not finish, however, the archive that was being written at the time of the failure cannot be used.
Restart the application server with STARTUP=W. If LOGMODE had been set to A or L, specify the same value; if LOGMODE had been set to Y, specify LOGMODE=A.
If the archive in-process had been an automatic archive (started by ARCHPCT), another automatic archive will be initiated immediately when the application server is started again. If it had been started by an ARCHIVE, LARCHIVE, SQLEND ARCHIVE, or SQLEND LARCHIVE command, you must reissue the command when restarting the application server. If it had been an implicit log archive created by issuing SQLEND UARCHIVE with LOGMODE set to L, reissue the SQLEND UARCHIVE command after restarting the application server with LOGMODE=L.
If a system failure occurs during database generation or COLDLOG processing, restart the operation (by reissuing the SQLDBINS, SQLDBGEN, or SQLLOG EXEC) after correcting the cause of the failure.
In some cases, storage may need to be reclaimed before continuing processing. For example, an LUW is processing a DROP TABLE statement, a checkpoint is taken during this processing, and a COLDLOG operation immediately follows. If a media failure occurred before the COLDLOG, there is a possibility of rows from the dropped table still existing. However, the entry in the SYSTEM.SYSDROP catalog table no longer exists. To reclaim this storage, the dbspace containing this "dropped" table must be dropped before continuing processing.
You can use three ways to move the database manager between system DASD:
You can use the DASD dump restore (DDR) utility to replace minidisks only if there are no errors on the DASD and you keep the device type, size and virtual address the same as the original minidisk. To change any of these characteristics or recover from a DASD error, follow the procedure in Replacing a Database Minidisk, or Reconfiguring and Reformatting the Logs.
You may want to replace a minidisk either to balance your DASD workload, or to remove all data from a device. To replace it, do the following:
LINK * cuu1 cuu2 R
where:
LINK * cuu3 cuu4 W
where:
DDR
The DDR utility prompts you for your instruction statements. Enter the information that you wrote down from the MDISK statements for the input and output devices, CONS for the printer and copy the original minidisk to the new one, as shown below.
SYSPRINT CONS INPUT cuu2 type OUTPUT cuu4 type COPY ALL
where:
Respond YES to the prompts about the labels on the minidisks.
Attention: Be sure that you are accessing the correct minidisks before you respond YES to the label prompts.
DETACH cuu2
where cuu2 is the virtual device address as specified on the LINK command and INPUT statement.
This section describes how to replace a database directory (ddname BDISK) minidisk or dbextent (ddname DDSKn) minidisk. You can replace directory or dbextent minidisks only if you have been archiving your database.
You may want to replace a database minidisk because:
You may need to replace the directory or database minidisks, because one or both were damaged. In this situation, if you are running with dual logging and only one of the logs is damaged, replace the directory or database minidisks first by following the steps below. Then replace the log minidisk by following the procedures in Log Reconfiguration. Finally, restore the database by following the procedures in Replacing a Log Minidisk.
If you are replacing all the database minidisks (as you might when moving the database to a different device type), replace the log minidisks first. Follow the procedures in Log Reconfiguration.
Use the instructions below if you are moving the directory or database minidisks. If you are moving your logs, refer to Log Reconfiguration.
To replace directory or database minidisks, do the following:
Note the virtual addresses of the minidisks being replaced (for use below). You must ensure that the minidisks are either same size or larger than the old minidisks. If you are replacing the minidisk with one on the same device type, define the same number of cylinders or blocks.
If you are replacing the minidisk with one on a different device type, you should define the new minidisk to be slightly larger than the old. Because of rounding that occurs in the space allocation algorithms, it is nearly impossible to define minidisks on two different device types so that the database manager considers them to be equal. If you define the new minidisk so that it is close to the same size as the old one, the restore can fail because of lack of space. When you do the restore (in a later step), the database manager uses the space it needs and ignores the rest. It will not use the blocks on the new minidisk that exceed the number of blocks on the old minidisk. For help in estimating the equivalent sizes of minidisks on various devices, refer to Determining Equivalent Minidisk Sizes on Different Device Types.
LINK * cuu1 cuu2 W
FORMAT cuu1 access-letter (BLKSIZE size
Attention: Be sure that you are accessing the correct minidisk before you respond YES to the FORMAT prompt.
RESERVE filename filetype filemode
RELEASE cuu1
The cuu1 is the virtual device address that you previously specified on the LINK and FORMAT commands.
If you are relocating the log disks to another device because of disk migration or to control device utilization, and the target log disk is the identical device type and size as the source log disk and the source log disk is not damaged, you can use the SQLCDBEX EXEC to move the log disk. See Moving Log Disks for more information.
This section describes how to replace a log minidisk (ddname LOG1 or LOG2). You would replace a log minidisk if:
To replace log minidisks:
LINK * cuu1 cuu2 W
FORMAT cuu1 access-letter (BLKSIZE 4096
Attention: Be sure that you are accessing the correct minidisk before you respond YES to the FORMAT prompt.
RESERVE filename filetype filemode
RELEASE cuu1
The cuu1 is the virtual device address that you previously specified on the LINK and FORMAT commands.
To be able to recover in cases where the original database minidisks are not available (for example, in an offsite disaster recovery situation), you should make a copy of the ARIHSDS ARCHIVE file from the application server's A-disk after every log archive or database archive. You would then recover to a secondary system. The secondary system must have the same dbextent configuration and number of logs as the original system.
If you have been running with LOGMODE=A and need to recover to a secondary system, do a COLDLOG RECONFIGURE to initialize the log (see Log Reconfiguration). Then restore the most recent archive on the secondary system.
If you have been running with LOGMODE=L and need to recover to a secondary system: