Different types of failures can occur in a relational database management system:
Because of a power or processor error, the DB2 database manager can end abnormally.
A DASD can give hardware errors, and there might be cases where this DASD should be replaced. The DB2 database manager can end abnormally.
Not only the operating system can end abnormally, but there may be some conditions where the database manager decides to end abnormally.
An application can fail and lead to a situation where some tables have been updated and committed and others that should have been are not. Interactive users can make changes and by error request an update or delete that was not intended.
The various failure sources can lead to any of the following problems, which can happen alone or in combinations:
From a failure where the complete system comes down, the DB2 database manager normally recovers after the UNDO/REDO process. Only the LUWs that were currently active when the failure occurred would have to be restarted. In some cases this is not so obvious and needs some investigation to be sure that the data in all tables is consistent with what is expected.
Here you need to determine what type of disk or extent is affected.
If single logging is used, any I/O error causes the database manager to end. With dual logging, the database updates are recorded in two log extents. The database manager continues running as long as it can read and write from one of the log disks. It is recommended to allocate these log disks on different DASD, as an unrecoverable error is unlikely to occur on both DASD at the same time.
In VM, you should additionally make sure that the database's 191 minidisk is on another DASD, because this also contains a copy of the log history file, and this A-disk file is automatically used during a restore, if the log history area is unusable due to a log failure.
Note: | There might be a situation where the database should be restored on a different system, or that all dbextents should be replaced. Refer to Recover from System Failure for a detailed explanation of saving the log history information. |
To recover from a damaged log disk, follow the procedure as described in the DB2 Server for VSE System Administration and DB2 Server for VM System Administration manuals.
To reduce the probability of failures of the directory disk (BDISK), always end the database manager with the SQLEND parameter DVERIFY, to have a directory verification. This function checks for inconsistencies in the directory, and thus allows you to avoid archiving an inconsistent directory.
Note: | It is especially recommended to perform an SQLEND DVERIFY just for that purpose after having loaded a large quantity of data or after critical data has been updated. This verification can shorten the period between an inconsistency and its discovery. |
Physical Error
The only way to recover from a physical error on the directory disk, is to restore any archive and, dependent on the logmode, recover the log (and log archives, if LOGMODE=L).
Logical Error
There are some undocumented built-in tools in DB2 that can help to recover from logical errors on the directory disk (BDISK) such as pointing to a wrong page. Because these tools are very powerful and if misused can destroy the entire database, these commands or procedures should never be run unless explicitly requested by the IBM Support Center. Before starting any of these tools, make sure you have an archive of the complete database just in case anything goes wrong.
Data Recovery gives you an overview of procedures that can be used to recover as much data as possible, should a data disk be damaged.
Physical Error
The sequence of operations to replace a data disk (dbextent) is described in the DB2 Server for VSE System Administration manual. The sequence of operations to replace a data disk (database minidisk) is described in the DB2 Server for VM System Administration manual.
Logical Error: Data or Index Page Corruption
A data or index page corruption can only be detected when this page is actually requested by the database manager. When a corruption is detected, it may be possible that the restore of your database or storage pool will not help, because this corruption was already present when the backup was taken.
Note: | Therefore it is recommended to keep different sets of backups and the logs in between, so that it may be possible to restore your database to the point where it was free of corruption, and then apply forward recovery with the logs if they are available. |
If only the index has been corrupted, a drop and create index might correct the problem.
The procedure for data pages is more complicated. Technical help from the IBM Support Center using special programs might correct such a problem, depending on the kind of corruption, but such programs should only be executed in cooperation with someone at the IBM Support Center.
From an operating system abnormal end, such as from a power or hardware failure, the DB2 database manager normally recovers after the UNDO/REDO process. Only the LUWs that were currently active when the failure occurred would have to be restarted. In some cases this is not so obvious and needs some investigation to be sure that the data in all tables is consistent with what is expected.
Due to a problem with the DB2 software, in an extreme situation a corruption could occur in the database. If the database becomes corrupted, the problem may not be very obvious and may not be noticed until several archives have been taken.
A problem may occur with an application and lead to a situation where some tables have been updated and committed and others that should have been are not. This situation requires investigation to restore the data to a consistent state, as it was before the application was started or to a point that all data is synchronized. A similar situation can happen when interactive users make changes and by error request an update or delete of data that was not intended.