Before you begin
To allow WebSphere Application Server for z/OS to restart on an alternate system, install the following prerequisites on every system (your original system as well as any systems intended for recovery) before you reconfigure the ARM policies to enable peer restart and recovery. Also make sure that all systems on which you might need to perform a restart are part of the same RRS log group.
Installing the prerequisite service updates on all of these systems does not hinder your currently running environment if you want to continue to restart only in place. However, if this service is not installed, the controller might not be able to move back to the home system. OTS will attempt to restart on the alternate system and fail. If any units of recovery (URs) are unresolved with RRS once this happens, the controller will not be allowed to restart on the home system until RRS is cancelled on the alternate system. For more information on OTS and RRS, see z/OS MVS Programming: Resource Recovery.
If you do not plan to use peer restart and recovery, you do not need to abide by these functional prerequisites. Your system will instead use the restart-in-place function.
The following products all support RRS. Individually, they also support peer restart and recovery, provided that the above prerequisites are all properly installed:
In addition to the preceding products, many JTA XAResource Managers can be used to assist in a WebSphere Application Server for z/OS peer restart and recovery. Consult your JTA XAResource Manager's documentation to determine if it supports restarting on an alternate system.
Note: When setting up the ARM policy for a sysplex, make sure that both systems have the same level of the Application Server installed. For example, you cannot use a V5.0 Application Server to perform peer restart and recovery for a V5.1 Application Server.
Prior to using peer restart and recovery:
Even though it is possible to perform peer restart and recovery across different service levels, minor differences in configuration data can prevent an application server from starting if the service level of the peer system is different from the service level of the failed system. When the post installer detects a service level difference, it sends a message to the operator asking if recovery should continue. You can set up your automation to provide a positive response to this message and allow recovery to continue under these conditions. However, if startup is attempted and fails, the transactions needing to be recovered are now associated with the peer system, and RRS must be stopped on that system before these transactions can be moved back to the failed system. Therefore it is recommended that you modify your ARM policy to turn off peer restart and recovery while you are adding WebSphere Application Server service on your sysplex to ensure that the systems performing peer restart and recovery are at the same service level as the failed system. (See Using RRS panels to resolve indoubt units of recovery for more information about peer restart and recovery messages.)
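The level rules described above can be summarized in a short sketch. This is only an illustration, not part of the product: the function names and the three outcome labels are hypothetical, and the sketch assumes that levels are written as dotted numbers such as 5.0.2, where the first two components identify the version and release and the rest identify the service level.

```python
from typing import Tuple


def parse_level(level: str, width: int = 3) -> Tuple[int, ...]:
    """Turn a level string such as 'V5.0' or '5.0.2' into a tuple of ints.

    Missing trailing components are treated as 0, so '5.0' equals '5.0.0'.
    (Hypothetical helper; the dotted format is an assumption.)
    """
    parts = [int(p) for p in level.strip().lstrip("Vv").split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def peer_restart_decision(failed_level: str, peer_level: str) -> str:
    """Classify a failed-system / peer-system pair.

    'block'  : different version or release (for example V5.0 and V5.1)
    'prompt' : same release, different service level; the operator (or
               automation) is asked whether recovery should continue
    'allow'  : identical levels
    """
    failed = parse_level(failed_level)
    peer = parse_level(peer_level)
    if failed[:2] != peer[:2]:
        return "block"
    if failed != peer:
        return "prompt"
    return "allow"


if __name__ == "__main__":
    for pair in [("V5.0", "V5.1"), ("5.0.1", "5.0.2"), ("5.0.2", "5.0.2")]:
        print(pair, peer_restart_decision(*pair))
```

A check of this kind could be run before re-enabling peer restart and recovery in the ARM policy after service has been applied, so that a "prompt" or "block" result is noticed before a failure occurs rather than during recovery.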
Note: Clients will see a performance impact if the systems are running at capacity. In an attempt to minimize the memory and CPU impact on the alternate system, the EJB and web containers are not restarted for servers running in peer-restart mode. This means that application servers that are in the state of being recovered will not be able to accept any inbound work.
Why and when to perform this task
Once you have the prerequisites installed, starting a server on a system for which it was not configured implicitly places it into peer restart and recovery mode. If you configured your XA Partner log to write to a non-shared HFS, or if you are using a JTA XAResource Manager, you need to perform the following steps before starting a server:
Steps for this task
Applicability of the following list: [Version 5.0.2]
Use the administrative console to set the WebSphere variable, TRANLOG_ROOT, to the directory of a shared HFS, to which all systems in the WebSphere Application Server cell can write.
In the administrative console, click Environment > Manage WebSphere Variables. Then click the TRANLOG_ROOT variable to bring up a new window in which you can specify the directory of the shared HFS.
If the connector used to access a JTA XAResource Manager is not stored in an HFS that is readable by all systems that might be used for recovery, when an application server restarts on an alternate system, it will either appear that there is no XA recovery work to do, or it will be impossible to load the classes necessary to communicate with the JTA XAResource Manager.
During a recovery, there will be instances when manual intervention is required to resolve InDoubt units. You will need to use RRS panels for this manual intervention.
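Before setting TRANLOG_ROOT, it can be useful to confirm that the candidate directory is usable from each system that might perform recovery. The following is a minimal sketch under stated assumptions: the function name is hypothetical, a Python interpreter is assumed to be available on each system, and the check only proves that the directory exists and is writable from the system where it runs. It cannot prove that the HFS is actually shared, so it would need to be run on every system in the cell and the results compared.

```python
import os
import tempfile


def check_tranlog_root(path: str) -> list:
    """Return a list of problems found with a candidate TRANLOG_ROOT directory.

    An empty list means the directory exists and a file could be created in
    it from this system. (Hypothetical helper, not a WebSphere API.)
    """
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if not os.path.isdir(path):
        problems.append("path is not an existing directory")
        return problems
    try:
        # Create and immediately remove a probe file to test write access.
        with tempfile.NamedTemporaryFile(dir=path, prefix=".tranlog_probe_"):
            pass
    except OSError as exc:
        problems.append(f"cannot create a file in the directory: {exc}")
    return problems


if __name__ == "__main__":
    import sys

    candidate = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    issues = check_tranlog_root(candidate)
    print("OK" if not issues else "\n".join(issues))
```

The same approach, with a read test instead of a write test, could be applied to the directory that holds the connector for a JTA XAResource Manager, since that directory only needs to be readable by the recovering systems.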