DB2 Server for VM: System Administration


Resynchronization

Resynchronization occurs if two-phase commit processing is interrupted by a resource failure. A resource failure may be caused by a node failure, a session failure, a program failure, or another problem in a protected resource manager. The failure may occur between a sync point manager and its local resource managers, or between a sync point manager and remote resource managers.

Resynchronization is conducted independently for each failed protected resource for which it is required. Resynchronization has the following purposes:

Resync When Partner is Not Active

After an LU failure, it is possible that the partner that is responsible for resync is unable to establish the resync conversation because the failed LU has not been restarted. The responsible LU retries the resync at implementation-defined intervals.

To reduce the delay for resynchronization after an LU is restarted, the partner LU may signal to the resync initiator that it is available by sending an Exchange Log Names GDS variable that is not accompanied by a Compare States GDS variable (see footnote 2). Once the responsible LU has received this signal that the failed LU is active, it can initiate resync, sending the Exchange Log Names and Compare States GDS variables.

Sending the Exchange Log Names GDS variable as a signal of LU availability need be done only once, no matter how many protected conversations require resynchronization between the two LUs. Also, if the LU that becomes available is responsible for initiating resync for some conversations, it need not send another Exchange Log Names GDS variable as a signal that the LU is available, since the partner SPM can infer that a partner is available from the other resyncs the partner SPM initiates.
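
The rule above for when an availability signal is needed can be sketched in a few lines of Python. This is only an illustration, not SPM code: the function name, the LU names, and the representation of pending resyncs as (initiator, responder) pairs are all assumptions made for the example.

```python
def availability_signals(restarted_lu, pending_resyncs):
    """Return the partner LUs that should receive one Exchange Log Names
    availability signal from restarted_lu.

    pending_resyncs is an iterable of (initiator_lu, responder_lu) pairs,
    one per protected conversation that still needs resynchronization.
    (Hypothetical representation, chosen for this sketch.)
    """
    initiated_with = set()  # partners with which restarted_lu starts resync itself
    awaited_from = set()    # partners that are responsible for resync with restarted_lu
    for initiator, responder in pending_resyncs:
        if initiator == restarted_lu:
            initiated_with.add(responder)
        elif responder == restarted_lu:
            awaited_from.add(initiator)
    # One signal per partner is enough, and none is needed for a partner
    # that will see a resync initiated by restarted_lu anyway.
    return sorted(awaited_from - initiated_with)


pending = [
    ("LUA", "LUB"),  # two conversations where LUA is responsible for resync
    ("LUA", "LUB"),
    ("LUC", "LUB"),  # LUC is responsible for this one
    ("LUB", "LUC"),  # but LUB itself initiates resync with LUC for another
]
print(availability_signals("LUB", pending))  # prints ['LUA']
```

In this example LUB sends a single signal to LUA, even though two conversations are pending, and sends none to LUC because LUC can infer availability from the resync that LUB initiates.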

Resynchronization Initialization

To notify the CRR recovery server of its readiness to accept resynchronization communications, a DB2 Server for VM server performs Resynchronization Initialization with the CRR recovery server when it starts up in multiple user mode (see footnote 3). This involves sending an exchange log names request to the CRR recovery server before the first sync point is processed. The CRR recovery server sends an exchange log names reply. Then DB2 Server for VM and the recovery server save each other's log name, LU name and TPN to use in later validations. The current values for these can be determined using the database manager's SHOW CRR LOGNAMES command. For more information, see the DB2 Server for VSE & VM Operation manual.

As part of the exchange log names request, DB2 Server for VM sends a log name. The DB2 Server for VM log name is the concatenation of the following information (with blank characters stripped off):

The database manager also sends a log status in the exchange log names request. Log status can have one of the following values:

warm
When the database manager starts up, it invokes the DMSGETRS CSL routine to determine the current TPN of the CRR recovery server. If the TPN value returned by the DMSGETRS routine matches the TPN value in the database manager's log, the log status is warm.

cold
When the database manager starts up, if it determines that the current TPN value of the CRR recovery server does NOT match the value of the CRR recovery server's TPN as stored in the database manager's log, the log status is cold.

The database manager also sends a TPN value in the exchange log names request. The TPN value is the RESID for the database manager.

The CRR recovery server receives the exchange log names request, processes it, and returns an exchange log names reply to the database manager. This reply also contains a log status, which can have one of the following values:

warm
The CRR recovery server compares the TPN value that DB2 Server for VM sent in the exchange log names request with the TPN value stored in the CRR recovery server's log. If the values match, then the log status is warm.

cold
If the TPN value sent by DB2 Server for VM in the exchange log names request does not match the TPN value stored in the CRR recovery server's log, the log status is cold.
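
On both sides, the warm or cold decision described above reduces to comparing the partner's current TPN with the TPN recorded in the local log. A minimal sketch follows, assuming the TPN values are already available as strings. It does not call DMSGETRS, and the function name and sample TPN values are hypothetical.

```python
def log_status(partner_tpn_now, partner_tpn_in_log):
    """Return 'warm' if the partner's current TPN matches the logged TPN,
    otherwise 'cold'. A missing log entry (None) never matches."""
    return "warm" if partner_tpn_now == partner_tpn_in_log else "cold"


# Database manager side: TPN reported for the CRR recovery server
# versus the recovery server TPN saved in the database manager's log.
print(log_status("CRRTPN01", "CRRTPN01"))  # prints warm
# Recovery server side: TPN (RESID) sent by the database manager
# versus the TPN stored in the recovery server's log.
print(log_status("SQLDBA", "SQLOLD"))      # prints cold
```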

The database manager receives the exchange log names reply from the CRR recovery server. The resulting action for DB2 Server for VM is described in the following table:

Table 33. Actions by DB2 Server for VM on Exchange Log Names reply.
When DB2 Server for VM receives an Exchange Log Names reply from the VMCRR recovery server, the database manager needs to do certain actions. This table summarizes these required actions.
CRR Recovery Server's Log Status | Reply from CRR | DB2 Server for VM's log status | DB2 Server for VM's action
COLD or WARM | NORMAL | COLD | The database manager saves or updates the recovery server's locally known LU name, fully qualified LU name, TPN, and log name in its log and sends an explicit APPC confirmation to the recovery server.
COLD | NORMAL | WARM | The database manager checks the work unit records in its log that relate to the recovery server.
  • If no DRDA2 in-doubt logical units of work are found, DB2 Server for VM updates the recovery server's log name saved in the log and sends an explicit APPC confirmation to the recovery server.
  • If DRDA2 in-doubt logical units of work are found, DB2 Server for VM issues the following messages (the first one being equivalent to CMS message 3373E) on its operator console and does a deallocate (abend):
     ARI0177E CRR recovery server at TPN tpn has
              provided a new log name resulting from a cold 
              start. Some LUWID(s) cannot be automatically 
              resolved by resynchronization.
     ARI0176I The SYNCPNT parameter has been reset to N.
    

    The database manager's participation in sync points must be delayed until the error condition is resolved. The DB2 Server for VM operator should contact the recovery server operator to determine the reason for the status mismatch. The operator might have to manually force some units of work using the FORCE operator command.

WARM | NORMAL | WARM | The database manager compares the recovery server's log name sent in the reply with the name in DB2 Server for VM's log:
  • If the names match, DB2 Server for VM sends an explicit APPC confirmation to the recovery server.
  • If the log names do not match, DB2 Server for VM issues the following messages (the first one being equivalent to CMS message 3372E) on its operator console and does a deallocate (abend):
     ARI0178E An Exchange Log Name's Reply sent by CRR 
              recovery server at TPN tpn contained a 
              log name which does not match the current
              {database manager|CRR recovery server} 
              log name.
                Log name in Reply: log name
                Current Log name:  log name
     ARI0176I The SYNCPNT parameter has been reset to N.
    

    The database manager's participation in sync points must be delayed until the error condition is resolved. The DB2 Server for VM operator should contact the recovery server operator to determine the reason for the log name mismatch. The database may have been restored from an archive, which resulted in the log name mismatch, or the recovery server might be using the wrong log and should be restarted with the correct log. If the correct log cannot be supplied, both partners must be coldstarted. Note that the "RESET CRR LOGNAMES" operator command may be used to reset the CRR recovery server's LU name, TPN and log name and force a DB2 Server for VM log status of cold.

WARM | ABNORMAL | COLD or WARM | The database manager issues the following messages (the first one being equivalent to CMS message 3371E) on its operator console and does a deallocate (abend):
   ARI0179E An Exchange Log Name's Reply sent by CRR 
            recovery server at TPN tpn 
            contained an error status.
   ARI0176I The SYNCPNT parameter has been reset to N.

The SYNCPNT parameter value is set to N because the database manager's participation in sync points must be delayed until the error condition is resolved. The DB2 Server for VM operator should contact the recovery server operator to determine the reason for the error.

If the problem is a log name mismatch, one of the partners might be using the wrong log and should be restarted with the correct log. If the correct log cannot be supplied, both partners must be coldstarted. Note that the "RESET CRR LOGNAMES" operator command may be used to reset the CRR recovery server's LU name, TPN and log name and force a DB2 Server for VM log status of cold.
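
The decisions in Table 33 can be summarized as a small lookup function. The following Python sketch is illustrative only: the `Action` names and the `action_on_reply` signature are invented for this example, and combinations that the table does not list are rejected rather than guessed.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the outcomes described in Table 33.
    SAVE_AND_CONFIRM = "save recovery server names in log, send APPC confirmation"
    UPDATE_LOG_NAME_AND_CONFIRM = "update saved log name, send APPC confirmation"
    CONFIRM = "send APPC confirmation"
    ABEND_ARI0177E = "issue ARI0177E and ARI0176I, deallocate (abend)"
    ABEND_ARI0178E = "issue ARI0178E and ARI0176I, deallocate (abend)"
    ABEND_ARI0179E = "issue ARI0179E and ARI0176I, deallocate (abend)"


def action_on_reply(crr_status, reply, db_status,
                    indoubt_found=False, log_names_match=True):
    """Map one Exchange Log Names reply to the action listed in Table 33."""
    crr, rep, db = crr_status.upper(), reply.upper(), db_status.upper()
    if crr not in ("COLD", "WARM") or db not in ("COLD", "WARM"):
        raise ValueError("log status must be COLD or WARM")
    if rep == "ABNORMAL" and crr == "WARM":
        return Action.ABEND_ARI0179E
    if rep != "NORMAL":
        raise ValueError("combination not listed in Table 33")
    if db == "COLD":
        return Action.SAVE_AND_CONFIRM
    if crr == "COLD":
        # Database manager is warm, recovery server was cold started.
        if indoubt_found:
            return Action.ABEND_ARI0177E
        return Action.UPDATE_LOG_NAME_AND_CONFIRM
    # Both sides are warm: the saved and received log names must agree.
    return Action.CONFIRM if log_names_match else Action.ABEND_ARI0178E


print(action_on_reply("COLD", "NORMAL", "WARM", indoubt_found=True).name)
# prints ABEND_ARI0177E
```

In every ABEND case the SYNCPNT parameter is reset to N, so sync point participation stays disabled until the operator resolves the mismatch.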

Resynchronization Recovery

The CRR recovery server initiates the resynchronization recovery function to ensure consistent completion of the sync point by all registered resources for which data was logged. Using information stored in its log, the CRR recovery server determines which resource managers (for example, DB2 Server for VM) should be included in the recovery and allocates APPC conversations with them.

To allocate an APPC conversation with DB2 Server for VM, the CRR recovery server uses information that DB2 Server for VM provided when it registered with the SPM (using the CSL routine DMSREG).

The resynchronization recovery transaction between the CRR recovery server and the database manager consists of two functions:

During resynchronization recovery, DB2 Server for VM receives the Exchange Log Names and Compare States requests. First, the database manager processes the Exchange Log Names request as described in Table 34.

Table 34. Actions by DB2 Server for VM on Exchange Log Names request.
When the database manager receives an Exchange Log Names request from the VMCRR recovery server, it needs to do certain actions in order to formulate a reply. This table summarizes these required actions.
CRR Recovery Server's Log Status | DB2 Server for VM's log status | DB2 Server for VM's actions
WARM | COLD | The database manager holds the recovery server's log name from the request but does not update its own log or process the compare states request. It then sends an Exchange Log Names reply to the recovery server indicating cold log status and normal completion of the request. DB2 Server for VM waits for indication of a deallocate (abend) by the recovery server, then does a deallocate (normal).

Note that if the RESET CRR LOGNAMES command is issued, then the log status at the database will be COLD.

WARM | WARM | The database manager compares the recovery server's log name in the request with the name that is saved in its log. DB2 Server for VM also validates its own log name specified in the request.
  • If the log names match, DB2 Server for VM formulates (but does not yet send) an Exchange Log Names reply indicating normal completion of the request. It then processes the compare states request. After DB2 Server for VM completes the compare states processing, it sends both replies in the same buffer.
  • If the log names do not match, DB2 Server for VM issues the following message (equivalent to CMS message 3372E) on its operator console and sends an Exchange Log Names reply to the recovery server indicating abnormal completion of the request:
     ARI0178E An Exchange Log Name's Request sent by CRR 
              recovery server at TPN tpn contained
              a log name which does not match the current
              {database manager|CRR recovery server} 
              log name.
                Log name in Request: log name
                Current Log name: log name
    
    The database manager does not process the compare states request. It waits for indication of a deallocate (abend) by the recovery server, then does a deallocate (normal).

    The DB2 Server for VM operator should contact the recovery server operator to determine the reason for the log name mismatch. A couple of possible reasons are:

    • The database manager's NETID, LUNAME or TPN may have changed, which resulted in a different log name.
    • An archive from another DB2 Server for VM database may have been restored in this database.

    If this is the case then the database must be coldstarted. Note that the "RESET CRR LOGNAMES" operator command may be used to reset the CRR recovery server's LU name, TPN and log name and force a DB2 Server for VM log status of cold. If the recovery server is using the wrong log and cannot locate the correct log, DB2 Server for VM might have to manually force some units of work.

Note:

The database manager will determine its log status by comparing the CRR Recovery Server's LUNAME and TPN passed in the request with that which was recorded in its log. If there is any mismatch then the log status is deemed to be COLD.
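
The request handling summarized in Table 34, together with the log status rule in the note above, can be sketched as follows. This is an illustration under stated assumptions: the `Reply` tuple, the function names, and the sample LU names and TPNs are hypothetical, and the log status reported with an abnormal reply is assumed to be WARM because the table does not state it.

```python
from typing import NamedTuple, Optional


class Reply(NamedTuple):
    log_status: str               # log status reported in the reply
    completion: str               # NORMAL or ABNORMAL
    process_compare_states: bool  # whether the Compare States request is processed
    message: Optional[str]        # operator console message, if any


def database_log_status(req_luname, req_tpn, logged_luname, logged_tpn):
    """COLD on any mismatch between the request and the database manager's log."""
    same = (req_luname, req_tpn) == (logged_luname, logged_tpn)
    return "WARM" if same else "COLD"


def reply_to_request(crr_status, db_status, log_names_match=True):
    """Build the Exchange Log Names reply for the cases listed in Table 34."""
    if crr_status != "WARM":
        raise ValueError("Table 34 lists only a WARM recovery server log status")
    if db_status == "COLD":
        return Reply("COLD", "NORMAL", False, None)
    if log_names_match:
        return Reply("WARM", "NORMAL", True, None)
    # Assumption: the reply still reports WARM when the log names differ.
    return Reply("WARM", "ABNORMAL", False, "ARI0178E")


status = database_log_status("LUCRR", "CRRTPN01", "LUCRR", "CRRTPN02")
print(status)                            # prints COLD
print(reply_to_request("WARM", status))
```

Only the warm and matching case goes on to Compare States processing. In the other two cases the database manager waits for the recovery server's deallocate (abend) and then does a deallocate (normal).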

If the log name exchange was satisfactory, the database manager then processes the Compare States request as shown in Table 35.

Table 35. Actions by DB2 Server for VM on Compare States request
When the database manager receives a compare states request from the VMCRR recovery server, it needs to do certain actions depending on the state of the LUW at DB2 Server for VM and at the recovery server. This table summarizes these required actions.
Each entry gives the LUW state at DB2 Server for VM, followed by the action for each LUWID state sent by the CRR recovery server (Backout or Committed).

LUW state at DB2 Server for VM: LUWID Not found

  LUWID state sent by CRR recovery server: Backout
  Send normal completion reply indicating Backout state. DB2 Server for VM notifies operator with message:
  ARI0183E The Sync Point Manager has asked 
           to ROLLBACK 
           this LUW but the database manager 
           has no memory of it.

  LUWID state sent by CRR recovery server: Committed
  Send normal completion reply indicating Backout state. DB2 Server for VM notifies operator with message:
  ARI0183E The Sync Point Manager has asked 
           to COMMIT
           this LUW but the database manager 
           has no memory of it.

LUW state at DB2 Server for VM: Indoubt (Prepared)

  LUWID state sent by CRR recovery server: Backout
  Drive backout of resource and send normal completion reply indicating Backout state.

  DB2 Server for VM indicates backout to the operator with the message:

  ARI0230I FORCE ROLLBACK with disable
           scheduled for agent user id
           because of resynchronization
           recovery.

  LUWID state sent by CRR recovery server: Committed
  Drive commit of resource and send normal completion reply indicating Committed state.

  DB2 Server for VM indicates committed to the operator with the message:

  ARI0230I FORCE COMMIT with disable
           scheduled for agent user id
           because of resynchronization
           recovery.

LUW state at DB2 Server for VM: Heuristic Backout

  LUWID state sent by CRR recovery server: Backout
  Send normal completion reply indicating Heuristic Backout state.

  LUWID state sent by CRR recovery server: Committed
  Send normal completion reply indicating Heuristic Backout state. DB2 Server for VM notifies operator with message:
  ARI0184A The Sync Point Manager has asked 
           to COMMIT this
           LUW but the FORCE command was 
           previously used to ROLLBACK it.

  In this case, the LUW will still appear when the SHOW INDOUBT command is executed. The LUW must be cleared using the RESET INDOUBT command. In addition, manual intervention is necessary to ensure that the LUW is in a consistent state at all sites where the LUW has been distributed. This may require intervention at this database manager, or possibly at another database manager. Manual intervention could mean manually fixing the data or possibly restoring an archive.

LUW state at DB2 Server for VM: Heuristic Committed

  LUWID state sent by CRR recovery server: Backout
  Send normal completion reply indicating Heuristic Committed state. DB2 Server for VM notifies operator with message:
  ARI0184A The Sync Point Manager has asked 
           to ROLLBACK this
           LUW but the FORCE command was 
           previously used to COMMIT it.

  In this case, the LUW will still appear when the SHOW INDOUBT command is executed. The LUW must be cleared using the RESET INDOUBT command. In addition, manual intervention is necessary to ensure that the LUW is in a consistent state at all sites where the LUW has been distributed. This may require intervention at this database manager, or possibly at another database manager. Manual intervention could mean manually fixing the data or possibly restoring an archive.

  LUWID state sent by CRR recovery server: Committed
  Send normal completion reply indicating Heuristic Committed state.
Note:
  1. The state Syncpoint Pending is not possible at DB2 Server for VM servers. The server completes any sync point actions such as prepare to commit, commit or rollback before the CRR Recovery Server performs any sync point logging.
  2. The state Backout (Reset) is not possible at DB2 Server for VM servers. The servers complete rollback processing before the CRR Recovery Server performs any sync point logging for backout.
  3. The state Committed is not possible at DB2 Server for VM servers. The servers complete commit processing before the CRR Recovery Server performs any sync point logging for committed.
  4. The state LUWID not found is possible if (for example) an older DB2 Server for VM archive without an in-doubt LUW is restored and the CRR log name, TPN and LUNAME recorded by the restored database match the current CRR log name, TPN and LUNAME.
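
Because Table 35 is a two-dimensional lookup, it can be written down as a dictionary keyed by the LUW state at DB2 Server for VM and the LUWID state sent by the CRR recovery server. The Python sketch below is only an illustration. The state labels and the `compare_states` function are hypothetical names, and the third returned value marks the heuristic-damage cases in which the text calls for RESET INDOUBT and manual intervention.

```python
# (state at DB2 Server for VM, state sent by CRR) -> (state in reply, operator message)
COMPARE_STATES = {
    ("NOT_FOUND", "BACKOUT"): ("BACKOUT", "ARI0183E"),
    ("NOT_FOUND", "COMMITTED"): ("BACKOUT", "ARI0183E"),
    ("INDOUBT", "BACKOUT"): ("BACKOUT", "ARI0230I"),      # drives backout
    ("INDOUBT", "COMMITTED"): ("COMMITTED", "ARI0230I"),  # drives commit
    ("HEURISTIC_BACKOUT", "BACKOUT"): ("HEURISTIC_BACKOUT", None),
    ("HEURISTIC_BACKOUT", "COMMITTED"): ("HEURISTIC_BACKOUT", "ARI0184A"),
    ("HEURISTIC_COMMITTED", "BACKOUT"): ("HEURISTIC_COMMITTED", "ARI0184A"),
    ("HEURISTIC_COMMITTED", "COMMITTED"): ("HEURISTIC_COMMITTED", None),
}


def compare_states(db_state, crr_state):
    """Return (reply_state, message, needs_reset_indoubt) for one LUW."""
    reply_state, message = COMPARE_STATES[(db_state, crr_state)]
    # ARI0184A means a FORCE decision contradicted the sync point outcome.
    needs_reset_indoubt = message == "ARI0184A"
    return reply_state, message, needs_reset_indoubt


print(compare_states("INDOUBT", "COMMITTED"))
# prints ('COMMITTED', 'ARI0230I', False)
print(compare_states("HEURISTIC_BACKOUT", "COMMITTED"))
# prints ('HEURISTIC_BACKOUT', 'ARI0184A', True)
```

Every cell produces a normal completion reply, so the lookup only needs to record which state the reply indicates and which message, if any, the operator sees.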

Displaying Resynchronization Status using the SHOW CONNECT Command

When DB2 Server for VM is performing resynchronization initialization with the CRR recovery server, its status can be displayed with the SHOW CONNECT command. The following message is displayed:

  Recovery Agent is processing Resynchronization Initialization

If the database manager is in a communications wait, waiting for the CRR recovery server to reply to its exchange log names request, the following message is displayed:

  Recovery Agent is processing Resynchronization Initialization
  and is in a communications wait with the CRR Recovery Server

When the CRR recovery server is performing resynchronization recovery with DB2 Server for VM, its status can be displayed with the SHOW CONNECT command. The following message is displayed:

  Recovery Agent is processing Resynchronization Recovery

If the database manager is in a communications wait, waiting for the CRR recovery server to acknowledge its exchange log names and compare states replies, the following message is displayed:

  Recovery Agent is processing Resynchronization Recovery
  and is in a communications wait with the CRR Recovery Server

If the database manager is committing or rolling back the logical unit of work requested by the CRR recovery server, the following message is displayed:

  Recovery Agent is processing Resynchronization Recovery
  and is waiting for a <commit|rollback> to complete.

This information is available in tokenized format (see footnote 4).

Terminating Resynchronization using the FORCE Command

During resynchronization initialization or resynchronization recovery, the database manager could wait indefinitely for a response from the CRR recovery server. If this is the case, the following command may be used to terminate resynchronization initialization or resynchronization recovery processing:



>>-FORCE----+-RINIT-+------------------------------------------><
            '-RREC--'
 

If "FORCE RINIT" is entered, resynchronization initialization processing is terminated and the SYNCPNT parameter is changed from Y to N. If "FORCE RREC" is entered, resynchronization recovery is terminated and deallocate (abend) is performed to terminate the conversation with the CRR recovery server.

Notes:

  1. The operator must issue the SHOW ACTIVE, SHOW CONNECT or SHOW SYSTEM command prior to the FORCE RINIT/RREC command. Otherwise, the following message is issued:
     ARI0225E System operator must issue SHOW ACTIVE, SHOW CONNECT or
              SHOW SYSTEM command prior to FORCE command.
    

    and FORCE processing terminates.

  2. If the database is not performing resynchronization initialization when the FORCE RINIT command is entered, then the following message is displayed:
    ARI2040E FORCE RINIT may only be entered when Resynchronization Initialization is active.
    

    and FORCE processing terminates.

  3. If the database is not performing resynchronization recovery when the FORCE RREC command is entered, then the following message is displayed:
    ARI2040E FORCE RREC may only be entered when Resynchronization Recovery is active.
    

    and FORCE processing terminates.

  4. If the FORCE RINIT command was already issued, the following message is displayed:
    ARI2041E FORCE RINIT is already scheduled.
    

    and FORCE processing terminates. (Note that the scheduled FORCE command remains!)

  5. If the FORCE RREC command was already issued, the following message is displayed:
    ARI2041E FORCE RREC is already scheduled.
    

    and FORCE processing terminates. (Note that the scheduled FORCE command remains!)

  6. If extra parameters are entered after "FORCE RINIT/RREC", then the following message is displayed:
     ARI0229E Too many FORCE command input parameters
    

    and FORCE processing terminates.
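
The checks in the notes above, and the effects of FORCE RINIT and FORCE RREC, can be modelled with a small state object. This Python sketch is a simplified illustration: the class and function names are hypothetical, and the order in which the checks are applied is an assumption, since the notes do not say which message takes precedence when several conditions hold.

```python
class ResyncState:
    """Hypothetical model of the operator-visible resynchronization state."""

    def __init__(self, active=None, syncpnt="Y"):
        self.active = active      # "RINIT", "RREC", or None when idle
        self.syncpnt = syncpnt    # SYNCPNT parameter, "Y" or "N"
        self.show_issued = False  # SHOW ACTIVE/CONNECT/SYSTEM was issued
        self.scheduled = set()    # FORCE requests waiting to take effect


def force(state, command_line):
    """Return the message id that rejects the command, or None if scheduled."""
    tokens = command_line.upper().split()
    if len(tokens) < 2 or tokens[0] != "FORCE" or tokens[1] not in ("RINIT", "RREC"):
        raise ValueError("expected FORCE RINIT or FORCE RREC")
    target = tokens[1]
    if not state.show_issued:
        return "ARI0225E"  # a SHOW command must come first
    if len(tokens) > 2:
        return "ARI0229E"  # too many input parameters
    if state.active != target:
        return "ARI2040E"  # that resynchronization phase is not active
    if target in state.scheduled:
        return "ARI2041E"  # already scheduled; the earlier request remains
    state.scheduled.add(target)
    return None


def complete_force(state):
    """Apply scheduled requests: RINIT resets SYNCPNT to N, and RREC ends
    recovery (the conversation is deallocated with abend)."""
    for target in sorted(state.scheduled):
        if target == "RINIT":
            state.syncpnt = "N"
        state.active = None
    state.scheduled.clear()


state = ResyncState(active="RINIT")
print(force(state, "FORCE RINIT"))  # prints ARI0225E
state.show_issued = True
print(force(state, "FORCE RINIT"))  # prints None
complete_force(state)
print(state.syncpnt)                # prints N
```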


Footnotes:

2
The partner can tell that a Compare States GDS variable is not present because SPM's RECEIVE_AND_WAIT verb will complete with a WHAT_RECEIVED of SEND rather than DATA_COMPLETE.

3
Note - this is only done when the database is initializing. Once database initialization has completed, resynchronization initialization is not performed again until the database is brought down and then restarted.

4
For more information about tokenized format, see "Appendix A" of the Diagnosis Guide and Reference for IBM VM Systems manual.

