Recovering your environment is important for preventing the loss of critical data. A number of parameters are available to help you manage your environment and to ensure that you can perform adequate recovery of your data or transactions. These parameters are grouped into the following categories:
The following parameters provide information about the number, size, and status of the files used for database logging:
This parameter defines the size of each primary and secondary log file. The size of these log files limits the number of log records that can be written to them before they become full and a new log file is required.
The use of primary and secondary log files as well as the action taken when a log file becomes full are dependent on the type of logging that is being performed:
A primary log file can be reused when the changes recorded in it have been committed. If the log file size is small and applications have processed a large number of changes to the database without committing the changes, a primary log file can quickly become full. If all primary log files become full, the database manager will allocate secondary log files to hold the new log records.
When a primary log file is full, the log is archived and a new primary log file is allocated.
Recommendation: You must balance the size of the log files with the number of primary log files:
Note: The total log file size limit is 32 GB. That is, the number of log files (logprimary + logsecond) multiplied by the size of each log file in bytes (logfilsiz * 4096) must be less than 32 GB.
A log file that is too small can affect system performance because of the overhead of archiving old log files, allocating new log files, and waiting for a usable log file.
A log file that is too large can reduce your flexibility when managing archived log files and copies of log files, since some media may not be able to hold an entire log file.
If you are using log retention, the current active log file is closed and truncated when the last application disconnects from a database. When the next connection to the database occurs, the next log file is used. Therefore, if you understand the logging requirements of your concurrent applications, you may be able to determine a log file size that avoids allocating excessive wasted space.
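For illustration, a minimal sketch of setting the log file size from the command line processor; the database alias (sample) and the value shown are hypothetical:

   db2 update db cfg for sample using LOGFILSIZ 2000
   -- verify against the 32 GB limit:
   -- (logprimary + logsecond) * logfilsiz * 4096 bytes
   -- e.g. (10 + 5) * 2000 * 4096 = 122,880,000 bytes, well under 32 GB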
Refer to "Configuration Parameters for Database Logging" in the Administration Guide: Implementation for more information on this parameter.
The primary log files establish a fixed amount of storage allocated to the recovery log files. This parameter allows you to specify the number of primary log files to be preallocated.
Under circular logging, the primary logs are used repeatedly in sequence. That is, when a log is full, the next primary log in the sequence is used if it is available. A log is considered available if all units of work with log records in it have been committed or rolled back. If the next primary log in sequence is not available, then a secondary log is allocated and used. Additional secondary logs are allocated and used until the next primary log in the sequence becomes available or the limit imposed by the logsecond parameter is reached. These secondary log files are dynamically deallocated as they are no longer needed by the database manager.
The number of primary and secondary log files must comply with the following equation:
Recommendation: The value chosen for this parameter depends on a number of factors, including the type of logging being used, the size of the log files, and the type of processing environment (for example, length of transactions and frequency of commits).
Increasing this value will increase the disk requirements for the logs because the primary log files are preallocated during the very first connection to the database.
If you find that secondary log files are frequently being allocated, you may be able to improve system performance by increasing the log file size (logfilsiz) or by increasing the number of primary log files.
To save disk storage for databases that are not frequently accessed, set this parameter to 2. For databases enabled for roll-forward recovery, set the parameter larger to avoid the overhead of allocating new logs almost immediately.
You may use the database system monitor to help you size the primary log files.
For more information see the following monitor element descriptions in the System Monitor Guide and Reference:
Observation of these monitor values over a period of time will aid in better tuning decisions, as average values may be more representative of your ongoing requirements.
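As a sketch (the alias and value are hypothetical), you might increase the number of primary log files and then observe logging activity over a representative workload:

   db2 update db cfg for sample using LOGPRIMARY 6
   db2 get snapshot for database on sample
   -- frequent secondary log allocations in the snapshot output
   -- suggest increasing logfilsiz or logprimary further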
This parameter specifies the number of secondary log files that are created and used for recovery log files (only as needed). When the primary log files become full, secondary log files (of size logfilsiz) are allocated one at a time as needed, up to the maximum number controlled by this parameter. If more secondary log files are required than this parameter allows, an error code is returned to the application and the database is shut down.
See Number of Primary Log Files (logprimary) for more information about how secondary logs are used.
Recommendation: Use secondary log files for databases that have periodic needs for large amounts of log space. For example, an application that is run once a month may require log space beyond that provided by the primary log files. Since secondary log files do not require permanent file space they are advantageous in this type of situation.
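For example (hypothetical alias and value), a database whose month-end batch run needs extra log space might be configured as:

   db2 update db cfg for sample using LOGSECOND 10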
This parameter allows you to specify a string of up to 242 bytes to change the location where the log files are stored. The string can point to either a path name, or to a raw device. If the string points to a path name, it must be a fully qualified path name, not a relative path name.
Note: In a partitioned database environment, the node number is automatically appended to the path. This is done to maintain the uniqueness of the path in multiple logical node configurations.
To specify a device, specify a string that the operating system identifies as a device. For example:
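The following device strings are hypothetical illustrations:

   /dev/rdevice_name    (a raw device on AIX)
   \\.\d:               (a device on Windows NT)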
Note: You must have Windows NT Version 4.0 with Service Pack 3 installed to be able to write logs to a device.
Note: You can only specify a device on AIX, Windows NT, and Solaris platforms.
The new setting does not become the value of logpath until both of the following occur: the database is in a consistent state (as indicated by the database_consistent parameter), and all applications are disconnected from the database.
When the first new connection is made to the database, the database manager will move the logs to the new location specified by logpath.
Log files might remain in the old log path, and some of them might not have been archived; you might need to archive those log files manually. Also, if replication is running on this database, it might still require log files from before the log path change. If the User Exit Enable (userexit) database configuration parameter is set to "Yes", and all of the log files have been archived (either automatically by DB2 or manually by you), DB2 can retrieve the log files it needs to complete the replication process. Otherwise, copy the files from the old log path to the new log path.
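A minimal sketch of changing the log location (the alias and path are hypothetical):

   db2 update db cfg for sample using NEWLOGPATH /db2logs/sample
   -- the change is deferred; once all applications have
   -- disconnected, the first new connection moves logging
   -- to the new location:
   db2 connect to sample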
Recommendation: Ideally, the log files should be on a physical disk that does not have high I/O activity. For instance, avoid putting the logs on the same disk as the operating system or high-volume databases. This allows for efficient logging activity with a minimum of overhead, such as waiting for I/O.
You may use the database system monitor to track the amount of I/O related to database logging.
For more information, refer to the following monitor element descriptions in the System Monitor Guide and Reference:
The preceding data elements return the amount of I/O activity related to database logging. You can use an operating system monitor tool to collect information about other disk I/O activity, then compare the two types of I/O activity.
This parameter contains the current path being used for logging purposes. You cannot change this parameter directly as it is set by the database manager after a change to the newlogpath parameter becomes effective.
When a database is created, the recovery log file for it is created in a subdirectory of the directory containing the database. The default is a subdirectory named SQLOGDIR under the directory created for the database.
This parameter contains the name of the log file that is currently active.
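Both of these informational values can be inspected with the GET DATABASE CONFIGURATION command; a sketch (the alias and sample values are hypothetical):

   db2 get db cfg for sample
   -- look for entries along these lines:
   --   Path to log files       = /db2logs/sample/
   --   First active log file   = S0000012.LOG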
The following parameters can influence the type and performance of database logging:
This parameter allows you to delay the writing of log records to disk until a minimum number of commits have been performed. This delay can help reduce the database manager overhead associated with writing log records and as a result improve performance when you have multiple applications running against a database and many commits are requested by the applications within a very short time frame.
This grouping of commits will only occur when the value of this parameter is greater than one and when the number of applications connected to the database is greater than or equal to the value of this parameter. When commit grouping is being performed, application commit requests are held until either one second has elapsed or the number of commit requests equals the value of this parameter.
Changes to the value specified for this parameter take effect immediately; you do not have to wait until all applications disconnect from the database.
Recommendation: Increase this parameter from its default value if multiple read/write applications typically request concurrent database commits. This results in more efficient log file I/O, because writes occur less frequently and each write contains more log records.
You could also sample the number of transactions per second and adjust this parameter to accommodate the peak number of transactions per second (or some large percentage of it). Accommodating peak activity would minimize the overhead of writing log records during heavy load periods.
If you increase mincommit, you may also need to increase the logbufsz parameter to avoid having a full log buffer force a write during these heavy load periods. In this case, the logbufsz should be equal to:
mincommit * (log space used, on average, by a transaction)
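As a worked example with assumed numbers: if mincommit is 10 and an average transaction writes roughly 1,600 bytes of log space, the log buffer should hold at least 10 * 1600 = 16,000 bytes. Since logbufsz is expressed in 4 KB pages, a hypothetical setting (alias and values are illustrative) might be:

   db2 update db cfg for sample using MINCOMMIT 10 LOGBUFSZ 8
   -- 8 pages * 4096 bytes = 32,768 bytes, comfortably above
   -- the 16,000-byte estimate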
You may use the database system monitor to help you tune this parameter in the following ways:
Taking monitor samples throughout a typical day, you can determine your heavy load periods. You can calculate the total transactions by adding the following monitor elements:
Using this information and the available timestamps, you can calculate the number of transactions per second.
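For instance, with made-up numbers: if two snapshots taken 60 seconds apart show the combined transaction counts rising from 41,000 to 47,000, the rate over that interval is:

   (47,000 - 41,000) / 60 = 100 transactions per second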
Using sampling techniques over a period of time and a number of transactions, you can calculate an average of the log space used with the following monitor element:
For more information about the database system monitor, see the System Monitor Guide and Reference.
This parameter is used to influence the number of log files required for crash recovery, and to determine the frequency of soft checkpoints.
To influence the number of logs required for crash recovery, the database manager uses this parameter to trigger the page cleaners to ensure that pages older than the specified recovery window are already written to disk.
At the time of a database failure resulting from an event such as a power failure, there may have been changes to the database that were committed but not yet written from the buffer pool to disk, or that were made by uncommitted transactions and had already been written to disk.
When a database is restarted, the log files will be used to perform a crash recovery of the database which ensures that the database is left in a consistent state (that is, all committed transactions are applied to the database and all uncommitted transactions are not applied to the database).
To determine which records from the log file need to be applied to the database, the database manager uses a log control file. This log control file is periodically written to disk, and, depending on the frequency of this event, the database manager may be applying log records of committed transactions or applying log records that describe changes that have already been written from the buffer pool to disk. These log records have no impact on the database, but applying them introduces some overhead into the database restart process.
The log control file is always written to disk when a log file is full, and during soft checkpoints. You can use this configuration parameter to trigger additional soft checkpoints.
The timing of soft checkpoints is based on the difference between the "current state" and the "recorded state", given as a percentage of the logfilsiz. The "recorded state" is determined by the oldest valid log record indicated in the log control file on disk, while the "current state" is determined by the log control information in memory. (The oldest valid log record is the first log record that the recovery process would read.) The soft checkpoint will be taken if the value calculated by the following formula is greater than or equal to the value of this parameter:
( (space between recorded and current states) / logfilsiz ) * 100
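As a worked example with assumed values: suppose the space between the recorded and current states amounts to two and a half log files' worth of log data. Then:

   ( (2.5 * logfilsiz) / logfilsiz ) * 100 = 250

With softmax at its default of 100 (one log file), 250 >= 100, so a soft checkpoint would be taken.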
Recommendation: You may want to increase or reduce the value of this parameter, depending on whether your acceptable recovery window is greater than or less than one log file. Lowering the value of this parameter will cause the database manager both to trigger the page cleaners more often and to take more frequent soft checkpoints. These actions can reduce both the number of log records that need to be processed and the number of redundant log records that are processed during crash recovery.
Note, however, that more page cleaner triggers and more frequent soft checkpoints increase the overhead associated with database logging, which can impact the performance of the database manager. Also, more frequent soft checkpoints may not reduce the time required to restart a database if you have: very long transactions with few commit points, or a very large buffer pool in which pages containing committed transactions are written to disk only infrequently.
In both of these cases, the log control information kept in memory does not change frequently and there is no advantage in writing the log control information to disk, unless it has changed.
The values are as follows: "No" (the default), "Recovery", and "Capture"; their effects are described below.
If logretain is set to "Recovery" or userexit is set to "Yes", the active log files will be retained and become online archive log files for use in roll-forward recovery. This is called log retention logging.
After logretain is set to "Recovery" or userexit is set to "Yes" (or both), you must make a full backup of the database. This state is indicated by the backup_pending flag parameter.
If logretain is set to "No" and userexit is set to "No", roll-forward recovery is not available for the database.
When logretain is set to "Capture", the Capture program calls the PRUNE LOGFILE command to delete log files when the Capture program completes. You should not set logretain to "Capture" if you want to perform roll-forward recovery on the database.
If logretain is set to "No" and userexit is set to "No", logs are not retained. In this situation, the database manager deletes all log files in the logpath directory (including online archive log files), allocates new active log files, and reverts to circular logging.
If this parameter is enabled, log retention logging is performed regardless of how the logretain parameter is set. This parameter also indicates that a user exit program should be used to archive and retrieve the log files. Log files are archived when the database manager closes the log file. They are retrieved when the ROLLFORWARD utility needs to use them to restore a database.
After logretain or userexit (or both) is enabled, you must make a full backup of the database. This state is indicated by the backup_pending flag parameter.
If both of these parameters are disabled, roll-forward recovery becomes unavailable for the database because logs are no longer retained. In this case, the database manager deletes all log files in the logpath directory (including online archive log files), allocates new active log files, and reverts to circular logging.
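A minimal sketch of enabling log retention logging and taking the required full backup afterward (the alias and backup path are hypothetical):

   db2 update db cfg for sample using LOGRETAIN RECOVERY USEREXIT ON
   -- the database is now in backup pending state;
   -- a full backup makes it usable again:
   db2 backup database sample to /backups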
Refer to "User Exit for Database Recovery" in the Administration Guide: Implementation for more information on the user exit program.
The following parameters affect various aspects of database recovery:
See also Distributed Unit of Work Recovery.
The following parameters are used when working with Tivoli Storage Manager (TSM):
When this parameter is enabled, the database manager automatically calls the restart database utility, if needed, when an application connects to a database. Crash recovery, the operation performed by the restart database utility, is needed if the database terminated abnormally while applications were connected to it; an abnormal termination could be caused by a power failure or a system software failure. Crash recovery applies any committed transactions that were in the database buffer pool but were not written to disk at the time of the failure, and backs out any uncommitted transactions that may have been written to disk.
If autorestart is not enabled, then an application that attempts to connect to a database which needs to have crash recovery performed (needs to be restarted) will receive an SQL1015N error. In this case, the application can call the restart database utility, or you can restart the database by selecting the restart operation of the recovery tool.
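As a sketch (hypothetical alias), either enable automatic restart or, if it is disabled, restart the database manually after an SQL1015N error:

   db2 update db cfg for sample using AUTORESTART ON
   -- or, manually:
   db2 restart database sample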
This parameter indicates when the database manager will attempt to rebuild invalid indexes. There are three possible settings for this parameter: SYSTEM, ACCESS, and RESTART.
For the numeric equivalents and API constants for these values, refer to the Administrative API Reference.
Indexes can become invalid when fatal disk problems occur. If this happens to the data itself, the data could be lost. However, if this happens to an index, the index can be recovered by re-creating it. If an index is rebuilt while users are connected to the database, two problems could occur:
Recommendation: On a server with many users, if restart time is not a concern, the best choice is to rebuild invalid indexes at database restart time, as part of bringing the database back online after a crash.
Setting this parameter to "ACCESS" degrades the performance of the database manager while an index is being re-created. Any user accessing that index or the underlying table must wait until the re-creation is complete.
If this parameter is set to "RESTART", restarting the database takes longer because of index re-creation, but normal processing is not impacted once the database is back online.
This parameter specifies the default number of sessions that will be used during the recovery of a table load. The value should be set to an optimal number of I/O sessions to be used to retrieve a load copy. The retrieval of a load copy is an operation similar to restore. You can override this parameter through entries in the copy location file specified by the environment variable DB2LOADREC.
The default number of buffers used for load retrieval is two more than the value of this parameter. You can also override the number of buffers in the copy location file.
This parameter is applicable only if roll forward recovery is enabled.
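A hypothetical setting for a system that can sustain four I/O sessions during load-copy retrieval:

   db2 update db cfg for sample using DFT_LOADREC_SES 4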
Refer to Data Movement Utilities Guide and Reference for more information about load recovery.
This parameter specifies the number of database backups to retain for a database. After the specified number of backups is reached, old backups are marked as expired in the recovery history file. Recovery history file entries for the table space backups and load copy backups that are related to the expired database backup are also marked as expired. When a backup is marked as expired, the physical backups can be removed from where they are stored (for example, disk, tape, ADSM). The next database backup will prune the expired entries from the recovery history file.
When a database backup is marked as expired in the history file, any corresponding file backups linked through a DB2 Data Links Manager will be removed from its archive server.
The rec_his_retentn configuration parameter should be set to a value compatible with the value of num_db_backups. For example, if num_db_backups is set to a large value, the value for rec_his_retentn should be large enough to support that number of backups.
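As a sketch (hypothetical alias and values), the two parameters can be set together, here retaining twelve backups and roughly a year of history:

   db2 update db cfg for sample using NUM_DB_BACKUPS 12 REC_HIS_RETENTN 366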
This parameter is used to specify the number of days that historical information on backups should be retained. If the recovery history file is not needed to keep track of backups, restores, and loads, this parameter can be set to a small number.
If the value of this parameter is -1, the recovery history file can only be pruned explicitly using the available commands or APIs. If the value is not -1, the recovery history file is pruned after every full database backup.
The value of this parameter will override the value of the num_db_backups parameter, but rec_his_retentn and num_db_backups must work together. If the value for num_db_backups is large, the value for rec_his_retentn should be large enough to support that number of backups.
No matter how small the retention period, the most recent full database backup plus its restore set will always be kept, unless you use the PRUNE utility with the FORCE option. For more information about this utility, refer to the Command Reference.
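For example (the timestamp is hypothetical), with rec_his_retentn set to -1 the history file might be pruned explicitly:

   db2 prune history 19991231
   -- removes history entries up to the given timestamp; add
   -- WITH FORCE OPTION to remove even the most recent backup set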
The Tivoli Storage Manager management class tells how the TSM server should manage the backup versions of the objects being backed up.
The default is that there is no TSM management class.
The management class is assigned by the Tivoli Storage Manager administrator. Once assigned, you should set this parameter to the management class name. When performing any TSM backup, the database manager uses this parameter to pass the management class to TSM.
Refer to "Tivoli Storage Manager" in the Administration Guide: Implementation for more information on Tivoli Storage Manager.
This parameter is used to override the default setting for the password associated with the Tivoli Storage Manager (TSM) product. The password is needed to allow you to restore a database that was backed up to TSM from another node.
Note: If the tsm_nodename is overridden during a backup done with DB2 (for example, with the BACKUP DATABASE command), the tsm_password may also have to be set.
The default is that you can only restore a database from TSM on the same node from which you did the backup. It is possible for the tsm_nodename to be overridden during a backup done with DB2.
Refer to "Tivoli Storage Manager" in the Administration Guide: Implementation for more information on Tivoli Storage Manager.
This parameter is used to override the default setting for the node name associated with the Tivoli Storage Manager (TSM) product. The node name is needed to allow you to restore a database that was backed up to TSM from another node.
The default is that you can only restore a database from TSM on the same node from which you did the backup. It is possible for the tsm_nodename to be overridden during a backup done through DB2 (for example, with the BACKUP DATABASE command).
Refer to "Tivoli Storage Manager" in the Administration Guide: Implementation for more information on Tivoli Storage Manager.
This parameter is used to override the default setting for the owner associated with the Tivoli Storage Manager (TSM) product. The owner name is needed to allow you to restore a database that was backed up to TSM from another node. It is possible for the tsm_owner to be overridden during a backup done through DB2 (for example, with the BACKUP DATABASE command).
Note: The owner name is case sensitive.
The default is that you can only restore a database from TSM on the same node from which you did the backup.
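A combined sketch of the TSM-related database configuration parameters (all values hypothetical):

   db2 update db cfg for sample using TSM_MGMTCLASS mydbclass
   db2 update db cfg for sample using TSM_NODENAME node27 TSM_OWNER dbowner TSM_PASSWORD secret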
Refer to "Tivoli Storage Manager" in the Administration Guide: Implementation for more information on Tivoli Storage Manager.
The following parameters affect the recovery of Distributed Unit of Work (DUOW) transactions:
This parameter identifies the name of the Transaction Manager (TM) database for each DB2 instance. A TM database can be:
The TM database is a database that is used as a logger and coordinator, and is used to perform recovery for indoubt transactions.
You may set this parameter to 1ST_CONN, which sets the TM database to the first database to which a user connects.
Refer to "Distributed Databases" in the Administration Guide: Planning for more information on distributed unit of work.
Recommendation: For simplified administration and operation you may wish to create a few databases over a number of instances and use these databases exclusively as TM databases.
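This is a database manager configuration parameter; a sketch of both styles (the database name TMDB1 is hypothetical):

   db2 update dbm cfg using TM_DATABASE TMDB1
   -- or let each application's first connection act as the TM database:
   db2 update dbm cfg using TM_DATABASE 1ST_CONN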
This parameter specifies the time interval in seconds for which a Transaction Manager (TM), Resource Manager (RM), or Sync Point Manager (SPM) should retry the recovery of any outstanding indoubt transactions found in the TM, the RM, or the SPM. This parameter is applicable when you have transactions running in a distributed unit of work (DUOW) environment.
Refer to "Distributed Databases" in the Administration Guide: Planning for more information on distributed unit of work.
Recommendation: If, in your environment, indoubt transactions will not interfere with other transactions against your database, you may wish to increase the value of this parameter. If you are using a DB2 Connect gateway to access DRDA2 Application Servers, you should consider the effect indoubt transactions may have at the Application Servers even though there will be no interference with local data access. If there are no indoubt transactions, the performance impact will be minimal.
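For example (hypothetical value), to retry resynchronization every five minutes:

   db2 update dbm cfg using RESYNC_INTERVAL 300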
This parameter specifies the directory where the Sync Point Manager (SPM) logs are written. By default, the logs are written to the sqllib/spmlog directory, which, in a high-volume transaction environment, can cause an I/O bottleneck. Use this parameter to have the SPM log files placed on a faster disk than the current sqllib/spmlog directory. This allows for better concurrency among the SPM agents.
For more information on the Sync Point Manager, refer to the Installation and Configuration Supplement.
Refer to "Recovery of Indoubt Transactions on the Host" in the Administration Guide: Planning for more information on recovery of indoubt DRDA transactions.
This parameter identifies the name of the Sync Point Manager (SPM) instance to the database manager.
For more information on the Sync Point Manager, refer to the Installation and Configuration Supplement.
Refer to "Recovery of Indoubt Transactions on the Host" in the Administration Guide: Planning for more information on recovery of indoubt DRDA transactions.
This parameter identifies the Sync Point Manager (SPM) log file size in 4 KB pages. The log file is contained in the spmlog sub-directory under sqllib and is created the first time SPM is started.
For more information on the Sync Point Manager, refer to the Installation and Configuration Supplement.
Refer to "Recovery of Indoubt Transactions on the Host" in the Administration Guide: Planning for more information on recovery of indoubt DRDA transactions.
Recommendation: The Sync Point Manager log file size should be large enough to maintain performance, but small enough to prevent wasted space. The size required depends on the number of transactions using protected conversations, and how often COMMIT or ROLLBACK is issued.
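A combined sketch of the SPM-related database manager configuration parameters (all values hypothetical):

   db2 update dbm cfg using SPM_LOG_PATH /spmlogs
   db2 update dbm cfg using SPM_NAME myspm SPM_LOG_FILE_SZ 512
   -- a new SPM log file size takes effect only when the SPM log
   -- is re-created; see the steps below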
To change the size of the SPM log file:
This parameter identifies the number of agents that can simultaneously perform resync operations.
Refer to "Recovery of Indoubt Transactions on the Host" in the Administration Guide: Planning for more information on recovery of indoubt DRDA transactions.
For more information on the Sync Point Manager, refer to the Installation and Configuration Supplement.
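For example (hypothetical value), to allow twenty simultaneous resync agents:

   db2 update dbm cfg using SPM_MAX_RESYNC 20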