---------------------------------------------------------- Logmode Considerations and General Archiving Information ---------------------------------------------------------- By Ron Castelletto IBM Canada Laboratory Ltd. Toronto, Canada February 1995 (c) Copyright IBM Corp 1995 SQL/DS is a trademark of International Business Machines Corp. -------- CONTENTS -------- 1. LOGMODE CONSIDERATIONS AND GENERAL ARCHIVING INFORMATION 1.1 CHECKPOINT OVERVIEW 1.2 LOGMODES 1.2.1 LOGMODE N 1.2.2 LOGMODE Y 1.2.3 LOGMODE A 1.2.4 LOGMODE L 2. HISTORY AREA 3. DUAL LOGGING 4. WHAT NOT TO DO 5. GOOD ARCHIVING TIPS AND PROCEDURES 5.1 ADVANTAGES OF LOGMODE L OVER LOGMODE A 5.2 ADVANTAGES OF LOGMODE A OVER LOGMODE L 6. BIBLIOGRAPHY ----------------------------------------------------------- 1. LOGMODE CONSIDERATIONS AND GENERAL ARCHIVING INFORMATION ----------------------------------------------------------- This paper presents an overview of the logmodes available in SQL/DS, followed by a discussion of the reasons to use each logmode and some points to consider when running in these logmodes. For each logmode, there is an explana- tion of what to do in the event of a soft crash or in the event of a hard crash. As an introduction to the concept of logmodes there is an overview of the checkpoint process. A good understanding of the checkpoint process is required to understand how recovery is performed. Additionally, there are follow-on sections describing the History Area and Dual Logging. Finally there is a section on what not to do for archive/recovery procedures and a section on good ideas for archive/recovery. ----------------------- 1.1 CHECKPOINT OVERVIEW ----------------------- This section describes what checkpoints are and why they are required. Among other information, the SQL/DS direc- tory contains 2 bitmaps. The shadow bitmap, which is also known as the checkpoint version of the database, indicates which data and index pages were written to DASD during the most recent checkpoint. The shadow bitmap is not updated until the next checkpoint. The other bitmap, called the current bitmap, points to the up-to-date data and index pages that have been updated since the most recent checkpoint. When a checkpoint is taken, the current pages (those which have changed since the most recent checkpoint) are written to DASD. As part of the checkpoint process, the shadow bitmap is updated to reflect this. Immediately after the checkpoint the shadow and current bitmaps are equal. If the database abends abnormally, some recent updates to data pages have been logged, but the updated pages which were not yet written to DASD have been lost. The directory on DASD, which contains the shadow bitmap, is used for recovery. To recover, SQL/DS starts with the checkpoint version of the database and applies the changes logged since the checkpoint. This section describes how the log is interconnected with the checkpoint process. As users make updates to the database, all data required to redo or undo these updates is written to the log. Note: All data written to the log is immediately written to the log disk, otherwise, log data in memory would be lost if an abend or other failure occurred. Data and index pages that have been updated are kept in the page buffers. Page buffers are kept in memory for as long as possible, however, if many pages are updated and not all can be kept in the page buffers, the oldest pages in the page buffers are written to DASD. These changes are reflected in the current bitmap (since the current bitmap is always up to date), but are not reflected in the shadow bitmap (this is only updated after a checkpoint). When a checkpoint is taken, a checkpoint record is written to the log. Checkpoints are useful because they are a starting point for recovery if there is a soft crash (a soft crash can be a power failure, an internal processing error or system abend). Also note that a checkpoint does not wait for LUWs to be committed. A checkpoint may occur in the middle of an LUW. This means that the database is usually not in a consistent state when a checkpoint is taken. ------------ 1.2 LOGMODES ------------ SQL/DS provides 4 logmodes. Logmodes determine how the log is used for recovery, if archiving will be done, and if so, what type of archiving will be done. --------------- 1.2.1 LOGMODE N --------------- In logmode N, there is no logging of data changes. This logmode is available only in single user mode. During an LUW, the shadow bitmap contains the state of the database as of the beginning of the LUW. Meanwhile the current bitmap contains the state of the database including all updates made by current LUW. If a rollback is done then the current bitmap is reset to the shadow bitmap, which brings the database back to the state it was in at the beginning of the LUW. If Commit Work is issued, then a checkpoint is taken to write the changes to DASD, which synchronizes the current and shadow bitmaps. Logmode N is mainly used for large dataload that would otherwise fill the log disk. Changing to logmode N ends the current restore set. After switching to logmode N, you can only restart the database in logmode Y and need to take a database archive to establish a new restore set. After taking this database archive you can start in logmode A or logmode L. If there is a soft crash, the database is restored from DASD which contains the shadow bitmap. Since there is no log data, the LUW that was running at the time of the crash cannot be recovered. If there is a hard crash, you must recover from a pre- vious archive. The procedure used to recover from a pre- vious archive depends on the logmode in use prior to the switch to logmode N; refer to one of the following sections for appropriate recovery procedures. However, in all cases, any changes made in logmode N have not been logged and must be redone entirely after restoring from an archive. --------------- 1.2.2 LOGMODE Y --------------- In logmode Y, all changes to the database are logged and checkpoints are taken periodically (based on the CHKINTVL startup parameter). Logmode Y allows recovery from soft crashes (power failure, software abend) but not from hard crash (DASD failure). If there is a soft crash, the database on DASD is the checkpoint version of the database. Since a checkpoint may be taken in middle of an LUW, the database may be in an inconsistent state. Log data is used to bring the database to a consistent state. Refer to SQL/DS System Administration, Release 3.4.0, pg. 186 for VM, and pg.122 for VSE. SQL/DS uses log data to complete LUWs that were committed between the checkpoint and the soft crash (REDO processing). Also, log data is used to rollback LUWs that had been written to DASD by checkpoint, but were not completed or were rolled back (UNDO processing). In logmode Y, only log data required to recover from a soft crash is saved in the log. If there is a hard crash (DASD failure) the only avail- able option is to restore from a previous database archive (if there is one). Note: no log data can be applied to the database archive, since log data from the time of the database archive was not saved. In logmode Y, after restoring from a database archive, you do not have all the data since the archive saved in the log, you only have log data since the last checkpoint. In all logmodes, after a database archive is restored SQL/DS will try to apply any data contained in the log. In logmode Y, this log data does not logically apply to the restored database. If SQL/DS tries to apply the log data, most likely SQL/DS will encounter an internal system error and the database will abend. To prevent this error and to allow the database to start up after a database restore, you must perform a COLDLOG prior to the database restore. A COLDLOG will wipe out all log data from the log, so SQL/DS will not have any log data to try to apply to the database. --------------- 1.2.3 LOGMODE A --------------- In logmode A, all changes to the database are logged. Checkpoints are taken periodically (based on the CHKINTVL startup parameter). You must take an initial database archive to be able to start SQL/DS in logmode A. Once in logmode A, log data since the previous database archive will be saved until a subsequent database archive is taken. When the user requests a database archive or when the log is ARCHPCT full, a database archive is taken. This data- base archive establishes a new starting point for saving log data. Once the database archive has been taken, the log data accumulated from the previous archive is no longer necessary and will be eventually overwritten. Log data since this new archive will be saved until the next database archive is taken. Online database archives do not wait for LUWs to com- plete. For an online database archive, SQL/DS takes a pre-archive checkpoint (which may be taken in the middle of an LUW). Once the checkpoint is complete, SQL/DS starts taking the database archive. SQL/DS uses the shadow bitmap to take the database archive. Recall that the shadow bitmap is not changed until a subsequent checkpoint is taken. Since we do not want the shadow bitmap to be changed while the archive is proceeding, further checkpoints are inhibited until the archive is completed. At the completion of the archive, a post- archive checkpoint is taken. While the archive is being taken, LUWs are allowed to proceed. Updates made while the archive is being written are reflected in the current bitmap. In this manner, users may continue their proc- essing while an online archive is taken. If there is a soft crash while running in logmode A, recovery is the same as for logmode Y. On DASD, we have the checkpoint version of the database. Log data is used to do the appropriate UNDO and REDO operations. After UNDO/REDO, the database is in a consistent state; all committed LUWs have been restored and all partially com- pleted LUWs have been rolled back. If there is a hard crash, you must recover from a data- base archive. If you recover from the most recent data- base archive, you may use the current log to recover the database to the point of failure . because all changes made since that database archive have been saved in the log disk. After the database is restored, SQL/DS uses the log data to redo all changes that are recorded in the log. The database will be completely up to date with all completed LUWs redone, and any LUWs that were partially completed at the time of failure undone. If there is a problem with the most recent database archive (the tape is corrupted/unreadable, or there is a tape missing), you may recover from a backlevel database archive. If you recover from a backlevel database archive, you cannot apply any log data to the database archive since log data is only saved since the most recent database archive. As mentioned earlier, SQL/DS will always try to apply log data after a database restore. In this case, the log data does not logically apply to the backlevel database archive you must perform a COLDLOG before the restore. --------------- 1.2.4 LOGMODE L --------------- In logmode L, all changes to the database are logged. Checkpoints are taken periodically (based on the CHKINTVL startup parameter) You must take an initial database archive to start in logmode L. This logmode is similar to logmode A in that all log data since the most recent archive is saved in the log. When the user requests a log archive or when the log reaches ARCHPCT full, the log is archived (not the entire database). Once the log is archived, the data on the log disk can be overwritten. SQL/DS will not overwrite any log data since the most recent log archive. If a database archive is requested, all log data currently on the log disk must first be archived. This log data is needed in case a backlevel database archive is restored. Once the current log is archived SQL/DS will begin taking the database archive, By taking the log archive before the database archive, SQL/DS ensures that log archives are LUW consistent (ie: LUWs cannot span across log archives). In LOGMODE L, all LUWs must complete before a log archive can be taken. Therefore, at the time of a log archive, the database is in a consistent state. Since database archives are always preceded by a log archive, database archives in logmode L are also in a consistent state. If there is a soft crash, recovery is the same as in logmode Y and logmode A. If there is a hard crash, you can restore from the most recent database archive. At the time of the hard crash there will be data on the current log that has not been archived. To recover completely, SQL/DS must restore the most recent database archive, then apply all log archives taken since the database archive and then apply the current log. After the database archive is restored, SQL/DS will archive the current log. This is because any log archives must be copied to the log disk to be applied and if the current log isn't archived before this is done, it will be lost. The archive of the current log becomes the last log archive in the restore set. After the current log is archived, the operator is prompted to restore the first log archive taken after the database archive. When this log archive has been applied, the next log archive is applied. This process continues until all log archives, including the archive of the current log, have been applied. At this point the data- base is in exactly the state it was in when the crash occurred. If there is a problem with the most recent database archive, you may restore from a backlevel archive. Since all log data is archived, it is possible to restore all the log archives taken from the backlevel database archive up to the just completed current log archive. So in logmode L, you can bring your database completely up to date from a backlevel archive. --------------- 2. HISTORY AREA --------------- How does SQL/DS know which database archive has been restored ? How does SQL/DS know which log archives need to be applied? In the log disk, SQL/DS sets aside one page to store history information. When an archive is taken, when a coldlog is done and when the logmode is switched, an entry is added to the history area. When a restore is required, SQL/DS first restores the database archive, and then based on the timestamp contained in the database archive, SQL/DS looks for this database archive entry in the history area. Since the history area is chronological, SQL/DS can determine if any log archives have been taken after the database archive, and so can determine if these log archives should be part of the restore set. Refer to SQL/DS System Administration, Special Topics in Recovery Design, (Chapter 10 for VM or Chapter 9 for VSE) for a discussion of the history area. --------------- 3. DUAL LOGGING --------------- What if there is a DASD error on the log disk itself ? As mentioned earlier, log data is required to restore from both soft and hard crashes. If the log cannot be read after a soft crash, you must do a COLDLOG to erase the log data. Your database will be in the state it was in at the time of the last checkpoint, which is not nec- essarily consistent. The other alternative is to restore from the most recent database archive (if you have one). However, since the current log has been damaged, you will not be able to apply the current log. To protect your database from a DASD error on the log disk, you should use dual logging. Dual logging means SQL/DS will manage two copies of the log. If there is a DASD error on one log disk, you will be notified and SQL/DS will automat- ically switch to the other log. Dual logging is HIGHLY recommended. The log disks should be physically separated to avoid the possibility of both logs being damaged simultaneously. ----------------- 4. WHAT NOT TO DO ----------------- The following points are to be avoided. o Running in logmode Y and not having a database archive is a bad idea. If there is a DASD failure, you will have to regenerate the database before you can start to reload data. o Do not take a user archive or system backup while the database is running. The following example illus- trates the type of problem that results. Let's assume I use DDR to backup my directory and all dbextents every night starting at 9:00PM. I first backup the directory disk. The shadow bitmap is in the direc- tory. This process takes 15 minutes. At 9:15, I start to backup my dbextents, this process takes 2 hours. Meanwhile, the database is running and users are making updates to the database. If I use DDR to restore the directory and dbextents, the shadow bitmap expects all pages to exists as they were at 9:00PM (the time the shadow bitmap was archived). However, if pages were updated and written to DASD between 9:00PM and the time the page was archived using DDR, there is an inconsistency between the shadow bitmap and the actual data pages. This will likely cause internal processing errors during startup or subsequent processing. To ensure a DDR backup will restore successfully you must ensure there is no activity on the database while the backup is running. The best way to do this is to shutdown the database. o Continuing with the above example, if you shutdown the database to take a system backup, you should shutdown with SQLEND UARCHIVE. This lets SQL/DS know that a user archive is taking place and records this information in the history area. On the next startup, the operator is prompted if the user archive was successful. In this way, SQL/DS will manage the log disk and log history area appropriately for the logmode you are running. o Be careful when doing a COLDLOG; understand the dif- ference between Reformat and Reconfigure. After Reconfigure, you lose your history area and, there- fore, the ability to apply any log archives. o If you start a restore and choose to temporarily interrupt the restore, by entering STOP SYSTEM, you must restart with STARTUP=W. With STARUP=W, SQL/DS will recognize that you interrupted a restore, and will continue where you left off. If you restart with STARTUP=R or STARTUP=U, SQL/DS will be expecting you to restore from a database archive. o In logmode L, SQL/DS has the ability to restore all log archives taken, including the current log disk, after a backlevel database archive is restored. In logmode L, if you do a coldlog before restoring, you lose the ability to bring your database completely up to date since the COLDLOG erases the current log. o Follow-on from the previous point: In logmode L, you should take database archives regularly to reduce the size of the restore set in the event that you need to restore. If you have a long series of log archives following a database archive, the time to restore all the log archives may be considerable. A general rule of thumb is to take a database archive weekly (usually offshift) and to take log archives nightly. o Take care of your archive tapes! You may consider making backup copies of your archive tapes and keeping these offsite. This makes it extremely impor- tant that you label your archive tapes clearly. Fol- lowing is an example of a real situation caused by an operator not familiar with SQL/DS archiving proce- dures. Running in logmode L, soon after a database archive was taken the operator asked for a log archive to be taken. Since there was very little log data, the archive completed in seconds. The operator was expecting the archive to take longer, and assumed that something SQL/DS did not in fact take the archive, so asked for another log archive. SQL/DS then took another log archive but the operator never switched tapes !! The second log archive overwrote the first log archive. Later in the week, the data- base had to be restored due to a DASD failure. The database archives was restored successfully, and SQL/DS prompted for the first log archive. The tape, which actually contained the second log archive was mounted. SQL/DS recognized that the wrong tape was loaded, and asked for the first log archive tape again. In this scenario the database can not be brought up to date. However, this particular story does have a happy ending. Since the first log archive was taken offshift, just a few hours after the database archive, and because the operator could determine that no users were logged on to the system during this time, SQL/DS Service and the Customer were convinced that the first log archive hadn't con- tained any data which meant that the second and fol- lowing log archives could logically be applied to the database. SQL/DS Service was able to patch the history to skip over the first log archive and all following log archives were applied successfully. Moral: be careful with your tapes, label them clearly, understand the archive and restore proce- dures. ------------------------------------- 5. GOOD ARCHIVING TIPS AND PROCEDURES ------------------------------------- The most important decision is which logmode to run in. Usually corporate databases run in either logmode A or L. ------------------------------------------ 5.1 ADVANTAGES OF LOGMODE L OVER LOGMODE A ------------------------------------------ o if the log fills during the business day. only require a log archive not a database archive. o can take log archive at end of business day to free up all log space for next day. In logmode A, you need enough log space to store all log data between data- base archives. Usually database archives are taken less frequently than log archives, so in Logmode A you would need a larger log. o if there is a problem with the most recent database archive, you can recover completely from a backlevel database archive. ------------------------------------------ 5.2 ADVANTAGES OF LOGMODE A OVER LOGMODE L ------------------------------------------ o It is usually faster to restore since you only need to apply the current log after a database restore, not a series of log archives. At least once a year: go over your archive and restore procedures. Take your most recent (and perhaps a back- level) database archive and restore it on a test system. If in LOGMODE L, restore all log archives as well. Run some tests to ensure all the data was restored success- fully. In the event you have an emergency, it is invalu- able to have the experience of going through a restore scenario so you are familiar with the messages and how to respond to them. If you run in Logmode Y, you should take database archives periodically. Some customers have read-only databases. On daily or weekly basis, data is loaded into this database from an external source, so they think that if there is ever a DASD failure, they can simply reload their data from the external source. However, for SQL/DS to recover, it needs a copy of the directory definitions from database generation to know the structure of the database. Without an archive, you would have to regen- erate the database, re-create all tables, views and indexes as well as grant authorities, With an archive, you can restore the archive, and then run local proce- dures to reload any data that may have been lost. --------------- 6. BIBLIOGRAPHY --------------- SQL/DS System Administration for IBM VM Systems, Version 3 Release 4, Chapter 9 and 10 SQL/DS System Administration for VSE, Version 3 Release 4, Chapter 8 and 9 SQL/DS Managing the SQL/DS Environment for IBM VM Systems, Version 3 Release 3, Chapter 7