Resynchronization occurs if two-phase commit processing is interrupted by a resource failure. A resource failure may be caused by a node failure, a session failure, a program failure, or another problem in a protected resource manager. The failure may occur between a sync point manager and its local resource managers, or between a sync point manager and remote resource managers.
Resynchronization is conducted independently for each failed protected resource for which it is required. Resynchronization has the following purposes:
After an LU failure, it is possible that the partner that is responsible for resync is unable to establish the resync conversation because the failed LU has not been restarted. The responsible LU retries the resync at implementation-defined intervals.
In order to reduce the delay for resynchronization after an LU is restarted, the partner LU may signal to the resync initiator that it is available by sending an Exchange Log Names GDS variable that is not accompanied by a Compare States GDS variable. Once the responsible LU has received this signal that the failed LU is active, it can initiate resync, sending the Exchange Log Names and Compare States GDS variables.
Sending the Exchange Log Names GDS variable as a signal of LU availability need be done only once, no matter how many protected conversations require resynchronization between the two LUs. Also, if the LU that becomes available is responsible for initiating resync for some conversations, it need not send another Exchange Log Names GDS variable as a signal that the LU is available, since the partner SPM can infer that a partner is available from the other resyncs the partner SPM initiates.
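
The availability signal described above can be sketched as two small decisions: whether a restarted LU needs to send a bare Exchange Log Names variable, and which conversations the responsible LU can resynchronize once it knows the partner is active. This is a minimal illustration only. The function names and the `pending` dictionary are hypothetical and are not part of any LU 6.2 or DB2 interface.

```python
from typing import Dict, List


def should_send_availability_signal(already_signaled: bool,
                                    initiates_own_resync: bool) -> bool:
    """Decide whether a restarted LU sends a bare Exchange Log Names.

    The signal is needed only once per partner, and not at all if this LU
    initiates resync itself, because the partner can infer availability
    from those resyncs.
    """
    return not already_signaled and not initiates_own_resync


def conversations_to_resync(pending: Dict[str, List[str]],
                            partner: str) -> List[str]:
    """Return (and clear) every pending conversation for a partner LU.

    One availability signal covers all protected conversations that
    require resynchronization between the two LUs.
    """
    return pending.pop(partner, [])
```

In this sketch, a second signal from the same partner returns an empty list, which mirrors the rule that the signal need be sent only once.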
To notify the CRR recovery server of its readiness to accept resynchronization communications, a DB2 Server for VM server performs Resynchronization Initialization with the CRR recovery server when it starts up in multiple user mode. This involves sending an exchange log names request to the CRR recovery server before the first sync point is processed. The CRR recovery server sends an exchange log names reply. Then DB2 Server for VM and the recovery server save each other's log name, LU name and TPN to use in later validations. The current values for these can be determined using the database manager's SHOW CRR LOGNAMES command. For more information, see the DB2 Server for VSE & VM Operation manual.
As part of the exchange log names request, DB2 Server for VM sends a log name. The DB2 Server for VM log name is the concatenation of the following information (with blank characters stripped off):
The database manager also sends a log status in the exchange log names request. Log status can have one of the following values:
The database manager also sends a TPN value in the exchange log names request. The TPN value is the RESID for the database manager.
The CRR recovery server receives the exchange log names request, processes it, and returns an Exchange Log Names reply to the database manager. This reply also contains a log status, which can have one of the following values:
The database manager receives the exchange log names reply from the CRR
recovery server. The resulting action for DB2 Server for VM is
described in the following table:
Table 33. Actions by DB2 Server for VM on Exchange Log Names reply.
When DB2 Server for VM receives an Exchange Log Names reply from the VMCRR recovery server, the database manager must take certain actions. This table summarizes these required actions.

CRR Recovery Server's Log Status | Reply from CRR | DB2 Server for VM's log status | DB2 Server for VM's action |
---|---|---|---|
COLD or WARM | NORMAL | COLD | The database manager saves or updates the recovery server's locally known LU name, fully qualified LU name, TPN, and log name in its log and sends an explicit APPC confirmation to the recovery server. |
COLD | NORMAL | WARM | The database manager checks the work unit records in its log that relate to the recovery server. |
WARM | NORMAL | WARM | The database manager compares the recovery server's log name sent in the reply with the name in DB2 Server for VM's log: |
WARM | ABNORMAL | COLD or WARM | The database manager issues the following messages (the first one being equivalent to CMS message 3371E) on its operator console and does a deallocate (abend): ARI0179E An Exchange Log Name's Reply sent by CRR recovery server at TPN tpn contained an error status. ARI0176I The SYNCPNT parameter has been reset to N. The SYNCPNT parameter value is set to N because the database manager's participation in sync points must be delayed until the error condition is resolved. The DB2 Server for VM operator should contact the recovery server operator to determine the reason for the error. If the problem is a log name mismatch, one of the partners might be using the wrong log, and should be restarted with the correct log. If the correct log cannot be supplied, both partners must be cold started. Note that the "RESET CRR LOGNAMES" operator command may be used to reset the CRR recovery server's LU name, TPN, and log name and force a DB2 Server for VM log status of cold. |
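
The decision logic of Table 33 can be pictured as a lookup keyed on the three status values. The sketch below is illustrative only: the action labels and the function name `action_on_reply` are hypothetical shorthand for the table's prose, and combinations the table does not list are rejected rather than guessed.

```python
# Keys: (recovery server log status, reply status, DB2 Server for VM log status).
# Values: hypothetical labels summarizing the action column of Table 33.
ACTIONS = {
    ("COLD", "NORMAL", "COLD"): "save_partner_identity_and_confirm",
    ("WARM", "NORMAL", "COLD"): "save_partner_identity_and_confirm",
    ("COLD", "NORMAL", "WARM"): "check_work_unit_records",
    ("WARM", "NORMAL", "WARM"): "compare_log_names",
    ("WARM", "ABNORMAL", "COLD"): "issue_ARI0179E_ARI0176I_and_abend",
    ("WARM", "ABNORMAL", "WARM"): "issue_ARI0179E_ARI0176I_and_abend",
}


def action_on_reply(server_status: str, reply: str, own_status: str) -> str:
    """Look up the action for an Exchange Log Names reply."""
    key = (server_status, reply, own_status)
    if key not in ACTIONS:
        # The table does not describe this combination.
        raise ValueError(f"combination not covered by Table 33: {key}")
    return ACTIONS[key]
```

A lookup table is used here instead of nested conditionals so that each row of the table maps to exactly one entry.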
The CRR recovery server initiates the resynchronization recovery function to ensure consistent completion of the sync point by all registered resources for which data was logged. Using information stored in its log, the CRR recovery server determines which resource managers (for example, DB2 Server for VM) should be included in the recovery and allocates APPC conversations with them.
To allocate an APPC conversation with DB2 Server for VM, the CRR recovery server uses information that DB2 Server for VM provided when it registered with the SPM (using the CSL routine DMSREG).
The resynchronization recovery transaction between the CRR recovery server and the database manager consists of two functions:
Exchange Log Names: The CRR recovery server initiates this exchange with DB2 Server for VM to ensure that the data they saved from resynchronization initialization is still valid.
Compare States: The CRR recovery server initiates this exchange with DB2 Server for VM to compare the state of the CRR logical unit of work with the state of the database manager's logical unit of work.
During resynchronization recovery, DB2 Server for VM receives the Exchange
Log Names and Compare States requests. First, the database manager
processes the Exchange Log Names request as described in the following
table.
Table 34. Actions by DB2 Server for VM on Exchange Log Names request.
When the database manager receives an Exchange Log Names request from the VMCRR recovery server, it must take certain actions in order to formulate a reply. This table summarizes these required actions.

CRR Recovery Server's Log Status | DB2 Server for VM's log status | DB2 Server for VM's actions |
---|---|---|
WARM | COLD | The database manager holds the recovery server's log name from the request but does not update its own log or process the Compare States request. It then sends an Exchange Log Names reply to the recovery server indicating cold log status and normal completion of the request. DB2 Server for VM waits for indication of a deallocate (abend) from the recovery server, then does a deallocate (normal). Note that if the RESET CRR LOGNAMES command is issued, then the log status at the database will be COLD. |
WARM | WARM | The database manager compares the recovery server's log name in the request with the name that is saved in its log. DB2 Server for VM also validates its own log name specified in the request. |

The database manager determines its log status by comparing the CRR recovery server's LUNAME and TPN passed in the request with those recorded in its log. If there is any mismatch, the log status is deemed to be COLD.
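
The log status rule just described can be sketched as a single comparison. The `PartnerId` type and the `log_status` function are hypothetical names used for illustration. Treating a missing record (for example, after RESET CRR LOGNAMES) as a mismatch is an assumption, although it is consistent with the note that the log status is COLD after that command.

```python
from typing import NamedTuple, Optional


class PartnerId(NamedTuple):
    """Identity of the CRR recovery server as seen by the database manager."""
    luname: str
    tpn: str


def log_status(recorded: Optional[PartnerId], requested: PartnerId) -> str:
    """Return "WARM" only if both LUNAME and TPN match the logged values."""
    if recorded is None or recorded != requested:
        # Any mismatch (or, by assumption, no record at all) means COLD.
        return "COLD"
    return "WARM"
```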
If the log name exchange was satisfactory, the database manager then
processes the Compare States request as shown in Table 35.
Table 35. Actions by DB2 Server for VM on Compare States request
When the database manager receives a Compare States request from the VMCRR recovery server, it must take certain actions depending on the state of the LUW at DB2 Server for VM and at the recovery server. This table summarizes these required actions.

LUW state at DB2 Server for VM | LUWID state sent by CRR recovery server: Backout | LUWID state sent by CRR recovery server: Committed |
---|---|---|
LUWID Not found | Send normal completion reply indicating Backout state. DB2 Server for VM notifies operator with message: ARI0183E The Sync Point Manager has asked to ROLLBACK this LUW but the database manager has no memory of it. | Send normal completion reply indicating Backout state. DB2 Server for VM notifies operator with message: ARI0183E The Sync Point Manager has asked to COMMIT this LUW but the database manager has no memory of it. |
Indoubt (Prepared) | Drive backout of resource and send normal completion reply indicating Backout state. DB2 Server for VM indicates backout to the operator with the message: ARI0230I FORCE ROLLBACK with disable scheduled for agent user id because of resynchronization recovery. | Drive commit of resource and send normal completion reply indicating Committed state. DB2 Server for VM indicates committed to the operator with the message: ARI0230I FORCE COMMIT with disable scheduled for agent user id because of resynchronization recovery. |
Heuristic Backout | Send normal completion reply indicating Heuristic Backout state. | Send normal completion reply indicating Heuristic Backout state. DB2 Server for VM notifies operator with message: ARI0184A The Sync Point Manager has asked to COMMIT this LUW but the FORCE command was previously used to ROLLBACK it. In this case, the LUW will still appear when the SHOW INDOUBT command is executed. The LUW must be cleared using the RESET INDOUBT command. In addition, manual intervention is necessary to ensure that the LUW is in a consistent state at all sites where the LUW has been distributed. This may require intervention at this database manager, or possibly at another database manager. Manual intervention could mean manually fixing the data or possibly restoring an archive. |
Heuristic Committed | Send normal completion reply indicating Heuristic Committed state. DB2 Server for VM notifies operator with message: ARI0184A The Sync Point Manager has asked to ROLLBACK this LUW but the FORCE command was previously used to COMMIT it. In this case, the LUW will still appear when the SHOW INDOUBT command is executed. The LUW must be cleared using the RESET INDOUBT command. In addition, manual intervention is necessary to ensure that the LUW is in a consistent state at all sites where the LUW has been distributed. This may require intervention at this database manager, or possibly at another database manager. Manual intervention could mean manually fixing the data or possibly restoring an archive. | Send normal completion reply indicating Heuristic Committed state. |
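
The Compare States matrix in Table 35 can be sketched as a lookup from the pair (local LUW state, state sent by the recovery server) to the reply state, the action driven on the resource, and the operator message. The enum, tuple, and function names below are hypothetical. Only the reply states and message identifiers come from the table.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Local(Enum):
    NOT_FOUND = "LUWID not found"
    INDOUBT = "Indoubt (Prepared)"
    HEURISTIC_BACKOUT = "Heuristic Backout"
    HEURISTIC_COMMITTED = "Heuristic Committed"


class Sent(Enum):
    BACKOUT = "Backout"
    COMMITTED = "Committed"


class Outcome(NamedTuple):
    reply_state: str        # state reported in the normal completion reply
    drive: Optional[str]    # action driven on the resource, if any
    message: Optional[str]  # operator message identifier, if any


TABLE_35 = {
    (Local.NOT_FOUND, Sent.BACKOUT): Outcome("Backout", None, "ARI0183E"),
    (Local.NOT_FOUND, Sent.COMMITTED): Outcome("Backout", None, "ARI0183E"),
    (Local.INDOUBT, Sent.BACKOUT): Outcome("Backout", "backout", "ARI0230I"),
    (Local.INDOUBT, Sent.COMMITTED): Outcome("Committed", "commit", "ARI0230I"),
    (Local.HEURISTIC_BACKOUT, Sent.BACKOUT): Outcome("Heuristic Backout", None, None),
    (Local.HEURISTIC_BACKOUT, Sent.COMMITTED): Outcome("Heuristic Backout", None, "ARI0184A"),
    (Local.HEURISTIC_COMMITTED, Sent.BACKOUT): Outcome("Heuristic Committed", None, "ARI0184A"),
    (Local.HEURISTIC_COMMITTED, Sent.COMMITTED): Outcome("Heuristic Committed", None, None),
}


def on_compare_states(local: Local, sent: Sent) -> Outcome:
    """Return the reply state, driven action, and operator message."""
    return TABLE_35[(local, sent)]
```

In this sketch, ARI0184A appears only where a heuristic decision conflicts with the state the recovery server sent, which are the cases the table says require manual intervention.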
When DB2 Server for VM is performing resynchronization initialization with the CRR recovery server, its status can be displayed with the SHOW CONNECT command. The following message is displayed:
Recovery Agent is processing Resynchronization Initialization
If the database manager is in a communications wait, waiting for the CRR recovery server to reply to its exchange log names request, the following message is displayed:
Recovery Agent is processing Resynchronization Initialization and is in a communications wait with the CRR Recovery Server
When the CRR recovery server is performing resynchronization recovery with DB2 Server for VM, its status can be displayed with the SHOW CONNECT command. The following message is displayed:
Recovery Agent is processing Resynchronization Recovery
If the database manager is in a communications wait, waiting for the CRR recovery server to acknowledge its exchange log names and compare states replies, the following message is displayed:
Recovery Agent is processing Resynchronization Recovery and is in a communications wait with the CRR Recovery Server
If the database manager is committing or rolling back the logical unit of work requested by the CRR recovery server, the following message is displayed:
Recovery Agent is processing Resynchronization Recovery and is waiting for a <commit|rollback> to complete.
This information is available in tokenized format.
During resynchronization initialization or resynchronization recovery, the database manager could wait indefinitely for a response from the CRR recovery server. If this is the case, the following command may be used to terminate resynchronization initialization or resynchronization recovery processing:
    >>-FORCE----+-RINIT-+------------------------------------------><
                '-RREC--'
If "FORCE RINIT" is entered, resynchronization initialization processing is terminated and the SYNCPNT parameter is changed from Y to N. If "FORCE RREC" is entered, resynchronization recovery is terminated and deallocate (abend) is performed to terminate the conversation with the CRR recovery server.
Notes:
ARI0225E System operator must issue SHOW ACTIVE, SHOW CONNECT or SHOW SYSTEM command prior to FORCE command.
and FORCE processing terminates.
ARI2040E FORCE RINIT may only be entered when Resynchronization Initialization is active.
and FORCE processing terminates.
ARI2040E FORCE RREC may only be entered when Resynchronization Recovery is active.
and FORCE processing terminates.
ARI2041E FORCE RINIT is already scheduled.
and FORCE processing terminates. (Note that the scheduled FORCE command remains!)
ARI2041E FORCE RREC is already scheduled.
and FORCE processing terminates. (Note that the scheduled FORCE command remains!)
ARI0229E Too many FORCE command input parameters
and FORCE processing terminates.
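
The checks implied by the messages above can be sketched as a validation function that returns the identifier of the message that would be issued, or `None` if the command is accepted. This is an illustration under stated assumptions: the conditions are inferred from the message texts, the order in which the checks run is assumed, and the function and parameter names are hypothetical.

```python
from typing import List, Optional, Set


def check_force(params: List[str], show_issued: bool,
                active: Optional[str], scheduled: Set[str]) -> Optional[str]:
    """Validate FORCE RINIT or FORCE RREC.

    params      -- operands after FORCE, for example ["RINIT"]
    show_issued -- True if SHOW ACTIVE, SHOW CONNECT or SHOW SYSTEM was issued first
    active      -- "RINIT", "RREC", or None for the process currently running
    scheduled   -- FORCE targets already scheduled (updated on success)
    """
    if not show_issued:
        return "ARI0225E"
    if len(params) > 1:
        return "ARI0229E"
    if len(params) != 1 or params[0] not in ("RINIT", "RREC"):
        # Other forms of FORCE are outside the scope of this sketch.
        raise ValueError("sketch covers only FORCE RINIT and FORCE RREC")
    target = params[0]
    if active != target:
        return "ARI2040E"
    if target in scheduled:
        # The earlier scheduled FORCE command remains in effect.
        return "ARI2041E"
    scheduled.add(target)
    return None
```

For example, issuing the same accepted command twice returns `None` the first time and `"ARI2041E"` the second, while the first scheduled FORCE remains in place.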