DB2 Server for VM: System Administration


Recovery Procedures

A system failure is any failure that causes the database manager to end abnormally. Such failures could occur because of an abnormal end of the VM system, or because of error conditions in the database machine.

As long as the current log is available, recovery from system failures is automatic. Even if you are running the database manager in single user mode (SYSMODE=S) with no logging (LOGMODE=N), it can recover any committed updates by using the current log. Restart recovery is performed the next time the application server is started.

If there is a system failure while you are restoring the database, see Restarting from Failure of a Database Restore.

Restarting Procedures

To perform restart recovery procedures, the operator starts the application server with STARTUP set to one of the following values:

W
Warm starting with the SQLSTART EXEC

R
Restoring from a database manager archive with the SQLSTART EXEC

U
Restoring from a user archive with the SQLSTART EXEC

S
Adding dbspaces with the SQLADBSP EXEC

E
Adding or deleting dbextents with the SQLADBEX EXEC

I
Reorganizing the catalog indexes with the SQLCIREO EXEC

M
Doing a catalog migration with the ARISMEX EXEC.

For all these settings, the log is checked at startup to see whether the last run of the database manager left any LUWs in progress. If it did, restart recovery processing starts and the changes made by those LUWs are backed out. Restart recovery processing also ensures that changes made by completed LUWs are, in fact, made.

Restart recovery procedures will not be performed if STARTUP is set to either C (for database generation with the SQLDBGEN or SQLDBINS EXECs) or L (for log reformatting or reconfiguration, called a COLDLOG operation, with the SQLLOG EXEC).

For both of these settings, the database manager does not check the log, and the LUW recovery processing does not occur.

Restoring the Database

If an unresolvable I/O error occurs on any of the devices that contain the directory or dbextents, the application server ends abnormally. It may be necessary to replace the damaged database minidisk and restore the database from the most recent archive tapes. For more information, see Replacing a Database Minidisk.

If an unresolvable I/O error occurs on a device that contains a log, it is necessary to replace the damaged minidisk. For more information, see Replacing a Log Minidisk.

Selecting the Archive Copy to Use

Locate the last successful archive of the database. If the DASD failure occurred while the most recent archive was being taken, then the last successful database archive would be the previous archive copy, not the copy interrupted by the failure.

If you are restoring from the most recent archive and the log minidisk (or at least one of the log minidisks in the case of dual logging) is not damaged, do not perform a COLDLOG before restoring. The current log is required for recovery. In the case of dual logging, where one damaged log has been replaced, restoring the database copies the good log to the new log.

If you are using a back-level database archive and LOGMODE had not been set to L when that archive was taken, or if the minidisks of the log have been changed (regardless of what LOGMODE was set to), you must run a COLDLOG with the SQLLOG EXEC to reformat the logs before restoring. For more information, see Running the SQLLOG EXEC.

You may have to redefine the directory and log disks (or both logs in the case of dual logging) at the same time due to an I/O error. If you are restoring from a user archive, perform a COLDLOG to reformat the logs before continuing with the restore. If you are restoring from an archive produced by database manager facilities, any attempt to perform a COLDLOG would result in a system error. Instead of performing the COLDLOG after redefining the database minidisks, restore the database. The restore will fail when the database manager tries to read the log. After the restore fails, do a COLDLOG to reformat the logs.

If you are restoring the database by using a database archive and subsequent log archives (LOGMODE=L), locate all the necessary log archives. If you have log archives on disk, your database machine must have access to the CMS minidisks or shared file system directories containing the log archives. If the failure occurred during the archiving of the log, do not use that final log archive tape. The database manager will automatically take another log archive when it is started for the restore.

The steps to be followed to restore your database differ, depending on whether the database had been archived using database manager or user facilities.

Restoring from a Database Manager Archive

Start the application server, with STARTUP=R and LOGMODE=A or L to restore the database using an archive created with database manager facilities. The database manager prompts the operator to mount the database archive tape, and to specify on which unit (cuu) the tape is mounted. Unless a CMS FILEDEF command for ddname ARIARCH with a virtual device address other than TAP1 (181) was submitted before running the SQLSTART EXEC, the archive tape should be mounted on virtual device 181. The tape is then opened, and the database directory and dbextents are copied.

The virtual device address for log archives is 183 (unless a FILEDEF command was issued for ddname ARILARC). Disks that contain log archives must be accessed by the database machine.

Figure 74 shows an example of doing a startup to restore a database that had been archived using database manager facilities.

Figure 74. Starting with STARTUP=R to Restore a Database

SQLSTART DB(server-name) PARM(STARTUP=R,LOGMODE=L)

Note:In this example, LOGMODE is set to L because the user normally uses log archiving.

The SQLSTART EXEC issues these FILEDEF commands for the database and log archive tapes:

   FILEDEF ARIARCH TAP1 SL (BLKSIZE 4096 NOCHANGE PERM
   FILEDEF ARILARC TAP3 SL (BLKSIZE 4096 NOCHANGE PERM

You can submit your own FILEDEF commands (to override the defaults) before running SQLSTART. You should also submit LABELDEF commands.

Do not specify a VOLID parameter for log archiving. Multiple log archive files can be created on a single run of the database manager. You would want these files to have different VOLIDs.

If you normally run with LOGMODE=A (no log archiving), you would specify LOGMODE=A at startup.

Restoring from a User Archive

Shut down the application server, and restore the database using the same user facilities that created the archive.

Do not restore the database logs. If you accidentally restore the logs, the history area and all the changes to the database recorded in the log, are lost. The database manager uses the history area to track which log archives go with which database archives. The history area must be restored from the backup copy. For more information, see History Area. To recover from accidentally restoring the log, do a COLDLOG to reconfigure the logs before proceeding. That is, run the SQLLOG EXEC without specifying the LOG1 or LOG2 parameters. Respond CONTINUE to message ARI0688D (for single logging) or ARI6129D (for dual logging). Respond 1 to message ARI0944D to reconfigure the log. When the application server is started, the log history area containing the restored database archive can be restored from the version that existed before the COLDLOG. The restored history area will contain a record of all log archives that apply to the restored database, although the log that existed just before the restore will be lost.

After restoring the database directory and dbextents, start the application server with STARTUP=U and LOGMODE=A or L. The operator is asked whether the user restore completed successfully. If the answer is yes, then if LOGMODE=A, the changes in the log are applied to the database; if LOGMODE=L, the database manager takes an archive of the current log, and then restores the log archive tape files that are associated with the user archives. If the operator responds that the user restore was not done, the application server ends, and the operator must take the necessary action to resolve the problem.

When to Use LOGMODE=A

For both database restores (STARTUP=R) and user restores (STARTUP=U), specify LOGMODE=A when you start the application server to have the database manager restore the database without using log archive tape files. When the database is restored, the database manager applies only the changes in the current log to the database. (This is the reason you need to do a COLDLOG if you are not using the most recent database archive, or if you accidentally restored the logs during a user restore: the log does not apply to the older archive.) After completing the restore, the database manager runs with LOGMODE=A.

The database manager still checks whether there are any log archives associated with the database archive. If there are, message ARI0247D is displayed prompting the operator either to keep LOGMODE=A and restore the database without using the log archives, or to switch to LOGMODE=L and use the log archives during the restore. If the decision is made to switch to LOGMODE=L, the database manager runs as if it had been intended to do the restore with LOGMODE=L all along.

When the restore set is complete, the archive that is restored becomes the database archive for the current restore set. A restore set consists of a database archive and the log archives associated with it in the history area -- that is, those log archives that occurred between the database archive and the next restore or COLDLOG or change of log mode.

When to Use LOGMODE=L

Specify LOGMODE=L if you want the database to be restored using log archives. The database manager first restores the database archive and then takes a log archive if information is in the log that was being used when the system failed or was shut down immediately prior to the restore. It then restores the log archives that were taken after the database archive you restored. When the restore is complete, the database manager runs with LOGMODE=L.

Before restoring the database archive and each log archive, the operator is prompted to continue, stop the application server, or end the restore. Usually, the operator responds CONTINUE.

If the operator responds STOP SYSTEM, the application server ends. The next time the application server is warm-started, it will continue restoring the database using the next log archive. If it is restarted to do a restore instead of a warm start, it ignores the first restore, which was stopped, and begins a new one. If it is restarted with STARTUP=C, the application server does the equivalent of an END RESTORE (see below) and then a COLDLOG. (All subsequent log archives are no longer usable.)

The STOP SYSTEM response is used primarily for filtered log recovery. This allows you to stop the application server in the middle of a restore, change the EXTEND input file commands used for filtered log recovery, and continue the restore. For information about filtered log recovery, see the discussion on starting the application server to recover from a DBSS error in the DB2 Server for VSE & VM Diagnosis Guide and Reference manual.

The END RESTORE response is used primarily for ending a restore before processing a log archive tape that is unusable. A secondary use is to end a restore before processing a log archive that contains a user error.

Attention: If you end a restore, you may lose the ability to use subsequent log archives on a future restore.

For example, suppose you have taken a database archive and six subsequent log archives. If you discover a user error that was recorded in the fourth log archive, restore the database archive and the first three log archives. Enter END RESTORE to avoid processing the fourth, fifth, and sixth log archives. When you end the restore, it may be impossible to restore the database again using the fourth, fifth, and sixth log archives. This would be unfortunate if you had made a mistake and, in fact, should have restored the fourth log archive as well. Thus, before you respond END RESTORE, be sure you have processed the correct number of log archives.

If a situation like the one above occurs, the only way to recover the lost log archives is to restore a back-level database archive. The log archives associated with that database archive must include the ones that were lost. That is, the old database archive must have continuous log archives to the point of the END RESTORE. If it does not, you cannot recover the lost logs. For more information, see How the Database Manager Uses the History Area.

After the restore set is complete, the database archive and log archives that were just restored become the current restore set, unless the restore ended before all log archives in the restore set were applied. As a final step, the current log is restored if it directly followed the restored log archives.

Restoring Log Archives from Disk

If you have any log archives on disk, the database machine must have access to the minidisks containing these log archives when it begins restoring the database. The database manager issues the FILEDEF commands needed for the log archives on disk, and also checks the timestamp for each log archive file to ensure it matches the timestamp in the history area.
Note:After a restore, the FILEDEF for ARILARC is inoperative. You must specify the FILEDEF again by either:

Restarting from Failure of a Database Restore

Three types of errors can cause a failure of a database restore operation:

  1. System failures, such as power interruptions, or operator or equipment errors that can be corrected. For example, the database manager can end because the wrong tape volume was mounted or a tape drive malfunctioned.

    In these error situations, after taking corrective action, you can restart the restore process as follows:

  2. A log archive error that can be corrected, or a failure during UNDO/REDO processing.

    To deal with a log error that can be bypassed or corrected, refer to the section on recovering from DBSS errors in the DB2 Server for VSE & VM Diagnosis Guide and Reference manual, especially the discussions on UNDO and REDO processing failures during a restore.

  3. A database or log archive input file error that cannot be corrected, such as a damaged archive tape volume. One of the following situations applies:

Restarting from a System Failure While Archiving

The procedure to recover from a system failure that occurs when the database manager is taking either a log or database archive is essentially the same as any other restart. Because it did not finish, however, the archive that was being written at the time of the failure cannot be used.

Restart the application server with STARTUP=W. If LOGMODE had been set to A or L, specify the same value; if LOGMODE had been set to Y, specify LOGMODE=A.

If the archive in-process had been an automatic archive (started by ARCHPCT), another automatic archive will be initiated immediately when the application server is started again. If it had been started by an ARCHIVE, LARCHIVE, SQLEND ARCHIVE, or SQLEND LARCHIVE command, you must reissue the command when restarting the application server. If it had been an implicit log archive created by issuing SQLEND UARCHIVE with LOGMODE set to L, reissue the SQLEND UARCHIVE command after restarting the application server with LOGMODE=L.

Restarting from Failure of a Database Generation or COLDLOG Operation

If a system failure occurs during database generation or COLDLOG processing, restart the operation (by reissuing the SQLDBINS, SQLDBGEN, or SQLLOG EXEC) after correcting the cause of the failure.

In some cases, storage may need to be reclaimed before continuing processing. For example, an LUW is processing a DROP TABLE statement, a checkpoint is taken during this processing, and a COLDLOG operation immediately follows. If a media failure occurred before the COLDLOG, there is a possibility of rows from the dropped table still existing. However, the entry in the SYSTEM.SYSDROP catalog table no longer exists. To reclaim this storage, the dbspace containing this "dropped" table must be dropped before continuing processing.

Relocating the Database Manager

You can use three ways to move the database manager between system DASD:

Replacing a Minidisk Using DASD Dump Restore

You can use the DASD dump restore (DDR) utility to replace minidisks only if there are no errors on the DASD and you keep the device type, size and virtual address the same as the original minidisk. To change any of these characteristics or recover from a DASD error, follow the procedure in Replacing a Database Minidisk, or Reconfiguring and Reformatting the Logs.

You may want to replace a minidisk either to balance your DASD workload, or to remove all data from a device. To replace it, do the following:

  1. Take an archive of the database.
  2. Update the MDISK control statements for the affected minidisks in the VM directory entry for the database virtual machine. Alter the original minidisks to different virtual device addresses, and add the new minidisks using the original virtual device addresses. Write down the original and new addresses as you will need them later. The new minidisks must be the same size and device type as the original ones.
  3. Log on to the database virtual machine, and IPL (initial program load) CMS.
  4. Do the following for each replaced minidisk:
    1. Issue the following CP command to make the original minidisk read-accessible to the virtual machine:
         LINK * cuu1 cuu2 R
      

      where:

      • the * assumes that the minidisk is defined in the VM directory for this virtual machine.
      • the cuu1 is the virtual device address as specified with the MDISK statement.
      • the cuu2 is usually the same as cuu1.
      • the R indicates that read mode access is required for this minidisk.
    2. Issue the following CP command to give the virtual machine write access to the new minidisk:
         LINK * cuu3 cuu4 W
      

      where:

      • the * assumes that the minidisk is defined in the VM directory for this virtual machine.
      • the cuu3 is the virtual device address as specified with the MDISK statement.
      • the cuu4 is usually the same as cuu1.
      • the W indicates that write mode access is required for this minidisk.
    3. Issue the following command to copy the minidisk using the DDR utility:
         DDR
      

      The DDR utility prompts you for your instruction statements. Enter the information that you wrote down from the MDISK statements for the input and output devices, CONS for the printer and copy the original minidisk to the new one, as shown below.

         SYSPRINT CONS
         INPUT cuu2 type
         OUTPUT cuu4 type
         COPY ALL
      

      where:

      • the cuu2 and cuu4 are the second virtual device addresses specified in the link commands issued earlier.
      • the type is the device type as specified with the MDISK statement.

      Respond YES to the prompts about the labels on the minidisks.

      Attention: Be sure that you are accessing the correct minidisks before you respond YES to the label prompts.

    4. Issue the following command to detach the original minidisk:
         DETACH cuu2
      

      where cuu2 is the virtual device address as specified on the LINK command and INPUT statement.

  5. Update the MDISK statements for the affected minidisks in the VM directory entry for the database machine. Remove the MDISK statements for the original minidisks.

Replacing a Database Minidisk

This section describes how to replace a database directory (ddname BDISK) minidisk or dbextent (ddname DDSKn) minidisk. You can replace directory or dbextent minidisks only if you have been archiving your database.

You may want to replace a database minidisk because:

To replace directory or database minidisks, do the following:

  1. Create a database manager archive of the database. For more information, see Archiving Procedures. The archive is required for the steps below.
  2. Write down the name of the application server and the ddnames of the database minidisks being replaced. These will be required in the steps below.
  3. Update the MDISK control statements for the affected database minidisks in the VM directory entry for the database virtual machine. Figure 90 describes the MDISK control statements for database minidisks. You must not change the virtual device addresses for the MDISK statements that are updated.

    Note the virtual addresses of the minidisks being replaced (for use below). You must ensure that the minidisks are either same size or larger than the old minidisks. If you are replacing the minidisk with one on the same device type, define the same number of cylinders or blocks.

    If you are replacing the minidisk with one on a different device type, you should define the new minidisk to be slightly larger than the old. Because of rounding that occurs in the space allocation algorithms, it is nearly impossible to define minidisks on two different device types so that the database manager considers them to be equal. If you define the new minidisk so that it is close to the same size as the old one, the restore can fail because of lack of space. When you do the restore (in a later step), the database manager uses the space it needs and ignores the rest. It will not use the blocks on the new minidisk that exceed the number of blocks on the old minidisk. For help in estimating the equivalent sizes of minidisks on various devices, refer to Determining Equivalent Minidisk Sizes on Different Device Types.

  4. Log on to the database virtual machine and do an IPL of CMS.
  5. Use the CMS QUERY DISK command to obtain a disk access letter not currently in use. (If you have no disk accessed as C, note this for use below).
  6. Do the following for each replaced database minidisk:
    1. Issue the following CP command to give the virtual machine write access to the minidisk:
         LINK * cuu1 cuu2 W
      

      • The * assumes that the minidisk is defined in the VM directory entry for this virtual machine.
      • The cuu1 and cuu2 should both be the virtual device address as specified with the MDISK statement (in step 3); that is cuu1=cuu2.
      • The W indicates that write mode access is required for this minidisk.
    2. Issue the following CMS command to initialize and label the minidisk:
         FORMAT cuu1 access-letter (BLKSIZE size
      

      • The cuu1 is the virtual device address you used on the CP LINK command above.
      • The access-letter is the disk access letter you obtained in step 5.
      • The size is 512 if the minidisk is the database directory file. Otherwise, specify 4096.
      • The FORMAT command prompts you whether to erase all files on the minidisk. Reply YES.

        Attention: Be sure that you are accessing the correct minidisk before you respond YES to the FORMAT prompt.

      • The FORMAT command prompts you for the minidisk volume label (disk label). Enter any valid volume label value. Because FILEDEF ddnames are limited to 8 characters, and minidisk volume labels are limited to 6 characters, the database manager uses the following conventions for the minidisk labels:
        • BDISK for the database directory minidisk
        • DDKn for dbextent minidisks (n is the nth dbextent).
      • Issue the following CMS commands to allocate the minidisk as a block I/O file:
           RESERVE filename filetype filemode
        

        • For filename, use the RESID.
        • For filetype, use the FILEDEF ddname of the file, where the ddname is as follows:
          • BDISK for the database directory minidisk
          • DDSKn for dbextent minidisks (n is the nth dbextent).
        • For filemode, use the access-letter that you entered above for the CMS FORMAT command.
        • The RESERVE command prompts you whether to erase all files on the minidisk. Reply YES. Be sure that you have specified the file mode letter that matches the CMS FORMAT command.
      • Issue the following CMS command to release the minidisk:
           RELEASE cuu1
        

        The cuu1 is the virtual device address that you previously specified on the LINK and FORMAT commands.

  7. Restore the database archive taken in substep 1. For more information, see Restoring the Database.

Replacing a Log Minidisk

If you are relocating the log disks to another device because of disk migration or to control device utilization, and the target log disk is the identical device type and size as the source log disk and the source log disk is not damaged, you can use the SQLCDBEX EXEC to move the log disk. See Moving Log Disks for more information.

This section describes how to replace a log minidisk (ddname LOG1 or LOG2). You would replace a log minidisk if:

  1. The minidisk is damaged by an unrecoverable DASD error.
  2. You want to change the size of your logs.
  3. You want to move your minidisks to a different device type.

To replace log minidisks:

  1. If you are replacing the only log (for single logging) or both logs (for dual logging), take a database archive if you are running with LOGMODE=A or L, because the contents of the log, including the history area, will be lost. If you are dual logging and you are only replacing one log, the archive is not lost.
  2. Update the MDISK control statements for the log minidisks in the VM directory entry for the database machine. For a description of these statements, see Figure 90.
  3. If you are replacing the only log (for single logging) or both logs (for dual logging), follow the procedures on Log Reconfiguration.
  4. If dual logging and you are only replacing one log, do the following for the replaced log:
    1. Use the CMS QUERY DISK command to obtain a disk access letter not currently in use.
    2. Issue the following CP command to give the virtual machine write access to the minidisk:
         LINK * cuu1 cuu2 W
      

      • The * assumes that the minidisk is defined in the VM directory entry for this virtual machine.
      • The cuu1 and cuu2 should both be the virtual device address as specified with the MDISK statement (in step 3); that is cuu1=cuu2.
      • The W indicates that write mode access is required for this minidisk.
    3. Issue the following CMS command to initialize and label the minidisk
         FORMAT cuu1 access-letter (BLKSIZE 4096
      

      • The cuu1 is the virtual device address you used on the CP link command above.
      • The access-letter is the disk access letter you obtained in 4a.
      • The FORMAT command prompts you whether to erase all files on the minidisk. Reply YES.

        Attention: Be sure that you are accessing the correct minidisk before you respond YES to the FORMAT prompt.

      • The FORMAT command prompts you for the minidisk volume label (disk label). Use LDISK1 for the first log minidisk, or LDISK2 for the second log minidisk.
    4. Issue the following CMS command to allocate the minidisk as a block I/O file:
         RESERVE filename filetype filemode
      

      • For filename, use the RESID.
      • For filetype, use LOGDSK1 for the first log minidisk or LOGDSK2 for the second log minidisk.
      • For filemode, use the access-letter that you entered above for the CMS FORMAT command.
      • The RESERVE command prompts you whether to erase all files on the minidisk. Reply YES. Be sure that you have specified the file mode letter that matches the CMS FORMAT command.
    5. Issue the following CMS command to release the minidisk:
         RELEASE cuu1
      

      The cuu1 is the virtual device address that you previously specified on the LINK and FORMAT commands.

Recovering to a Secondary System

To be able to recover in cases where the original database minidisks are not available (for example, in an offsite disaster recovery situation), you should make a copy of the ARIHSDS ARCHIVE file from the application server's A-disk after every log archive or database archive. You would then recover to a secondary system. The secondary system must have the same dbextent configuration and number of logs as the original system.

If you have been running with LOGMODE=A and need to recover to a secondary system, do a COLDLOG RECONFIGURE to initialize the log (see Log Reconfiguration). Then restore the most recent archive on the secondary system.

If you have been running with LOGMODE=L and need to recover to a secondary system:

  1. Do a COLDLOG RECONFIGURE to initialize the log (see Log Reconfiguration).
  2. Replace both the ARIHSDS ARCHIVE and ARIHSDS PRECLDLG files on the secondary server's A-disk with the ARIHSDS ARCHIVE file you copied after the latest database or log archive of the original system.
  3. Restore the most recent archive on the secondary system.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]