Setting up peer restart and recovery

Before you begin

To allow WebSphere Application Server for z/OS to restart on an alternate system, install the following prerequisites on every system (your original system as well as any systems intended for recovery) before you reconfigure the ARM policies to enable peer restart and recovery. Also make sure that all of the systems on which you might need to perform a restart are part of the same RRS log group.

Installing the prerequisite service updates on all of these systems does not affect your running environment if you want to continue to restart in place only. However, if this service is not installed, the controller might not be able to move back. OTS will attempt to restart on the alternate system and fail. If any units of recovery (URs) are unresolved with RRS when this happens, the controller is not allowed to restart on the home system until RRS is cancelled on the alternate system. For more information on OTS and RRS, see z/OS MVS Programming: Resource Recovery.

If you do not plan to use peer restart and recovery, you do not need to abide by these functional prerequisites. Your system will instead use the restart-in-place function.

The following products all support RRS. Individually, they also support peer restart and recovery, provided that all of the preceding prerequisites are properly installed:

In addition to the preceding products, many JTA XAResource Managers can be used to assist in a WebSphere Application Server for z/OS peer restart and recovery. Consult your JTA XAResource Manager's documentation to determine if it supports restarting on an alternate system.

Note: When setting up the ARM policy for a sysplex, make sure that both systems have the same level of the Application Server installed. For example, you cannot use a V5.0 Application Server to perform peer restart and recovery for a V5.1 Application Server.
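
The level check described in the note can be sketched in a few lines of Python. This is an illustration only, not a WebSphere Application Server tool: the function names are hypothetical, and it assumes that "same level" means the same version and release (the first two components of the version string), which the note does not state explicitly.

```python
def release_key(version, parts=2):
    """Reduce a version string such as 'V5.0.2' to its first `parts` numbers."""
    digits = version.strip().lstrip("Vv").split(".")
    numbers = [int(d) for d in digits[:parts]]
    # Pad short strings such as '5' so that '5' and '5.0' compare equal.
    numbers += [0] * (parts - len(numbers))
    return tuple(numbers)


def same_level(home_version, alternate_version):
    """Assumption: two systems are at the same level when version.release match."""
    return release_key(home_version) == release_key(alternate_version)


if __name__ == "__main__":
    # The V5.0 / V5.1 pairing is the mismatch named in the note.
    print(same_level("V5.0", "V5.1"))
```

If your site treats service levels as significant too, compare more components by passing a larger `parts` value to `release_key`.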

Prior to using peer restart and recovery:

Note: Clients will see a performance impact if the systems are running at capacity. In an attempt to minimize the memory and CPU impact on the alternate system, the EJB and web containers are not restarted for servers running in peer-restart mode. This means that application servers that are in the state of being recovered will not be able to accept any inbound work.

Why and when to perform this task

Once you have the prerequisites installed, starting a server on a system for which it was not configured implicitly places it into peer restart and recovery mode. If you configured your XA partner log to write to a non-shared HFS, or if you are using a JTA XAResource Manager, you need to perform the following steps before starting a server:

Steps for this task

  1. (Required only if you are using a non-shared HFS.) Enable non-shared HFS support.
    When using a non-shared HFS, the configuration settings must be replicated across the different systems in the sysplex. This is done automatically by the deployment manager and node agent. To enable this support, each node agent in your configuration must be set as a recovery node. This change is made in the administrative console:
    1. In the administrative console navigation, select System Administration > Node Agents.
    2. Select a node agent from the list.
    3. Under Additional Properties, select File Synchronization Service.
    4. Under Additional Properties, select Custom Properties.
    5. Select New.
    6. Enter recoveryNode for Name, and true for Value. The Description field can be left blank.
    7. Repeat steps 2-6 for each node agent in your configuration.
    8. Save your configuration.
  2. (Required only if you are using JTA XAResource Managers.) Make sure appropriate logs and classes are available on the alternate system.
    If you plan to use WebSphere Application Server peer restart and recovery, and your applications access JTA XAResource Managers, you must ensure that the appropriate log files and classes are available on the alternate system. If the log files are not available:

    Applicability of the following list: [Version 5.0.2]

    1. Point the WebSphere variable TRANLOG_ROOT to a shared HFS.
      The WebSphere variable TRANLOG_ROOT must point to a shared HFS, to which all systems in the WebSphere Application Server cell can write. The XA partner log is stored here, and the alternate system must be able to read and update this log.

      Use the administrative console to set the WebSphere variable, TRANLOG_ROOT, to the directory of a shared HFS, to which all systems in the WebSphere Application Server cell can write.

      In the administrative console, click Environment > Manage WebSphere Variables. Then click the TRANLOG_ROOT variable to bring up a new window in which you can specify the directory of the shared HFS.

    2. Store the driver (for example, a JDBC Driver, JMS Provider, or JCA Resource Adapter) for each JTA XAResource Manager in an HFS that is readable by all systems in the WebSphere Application Server cell.
      For example, if your connector is a JDBC driver for a database, the driver would likely be stored in a read-only HFS that is accessible by all systems in the sysplex. This allows the alternate system to read the saved classpath for the resource, and reconstruct it during a restart.

      If the connector used to access a JTA XAResource Manager is not stored in an HFS that is readable by all systems that might be used for recovery, then when an application server restarts on an alternate system, it will either appear that there is no XA recovery work to do, or it will be impossible to load the classes necessary to communicate with the JTA XAResource Manager.

  3. Resolve InDoubt units.

    During a recovery, there will be instances when manual intervention is required to resolve InDoubt units. You will need to use RRS panels for this manual intervention.
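
The conditions set up in steps 1 and 2 can be sanity-checked with a small pre-flight script run on each system that might be used for recovery. The following is a minimal sketch in Python, not part of WebSphere Application Server. The function names, the sample paths, and the node agent property mapping (assumed to be transcribed by hand from the administrative console) are all hypothetical and chosen only for illustration.

```python
import os


def check_tranlog_root(path):
    """Return a list of problems with the TRANLOG_ROOT directory on this system."""
    if not os.path.isdir(path):
        return ["TRANLOG_ROOT is not a directory here: " + path]
    # The alternate system must be able to read and update the XA partner log.
    if not os.access(path, os.R_OK | os.W_OK):
        return ["TRANLOG_ROOT is not readable and writable here: " + path]
    return []


def unreadable_entries(classpath_entries):
    """Return the driver classpath entries that this system cannot read."""
    return [entry for entry in classpath_entries if not os.access(entry, os.R_OK)]


def nodes_missing_recovery_flag(node_agents):
    """Return node agents whose custom properties lack recoveryNode=true.

    node_agents is a hypothetical mapping of node agent name to a dict of
    File Synchronization Service custom properties.
    """
    return sorted(
        name for name, props in node_agents.items()
        if props.get("recoveryNode") != "true"
    )


if __name__ == "__main__":
    # All values below are placeholders; substitute your own configuration.
    problems = check_tranlog_root("/shared/tranlog")
    problems += ["Cannot read driver: " + entry
                 for entry in unreadable_entries(["/shared/drivers/driver.jar"])]
    problems += ["recoveryNode not set for node agent: " + name
                 for name in nodes_missing_recovery_flag(
                     {"nodeA": {"recoveryNode": "true"}, "nodeB": {}})]
    for problem in problems:
        print(problem)
```

Running the same script on the home system and on each alternate system shows whether the file system paths resolve on all of them, which is the property the steps above depend on. It does not replace resolving InDoubt units through the RRS panels.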


Related concepts
Peer restart and recovery
Recoverable communication manager
Related tasks
Configuring application servers



Searchable topic ID:   tprruse
Last updated: Jun 21, 2007 9:56:50 PM CDT    WebSphere Application Server for z/OS, Version 5.0.2
http://publib.boulder.ibm.com/infocenter/wasinfo/index.jsp?topic=/com.ibm.websphere.zseries.doc/info/zseries/ae/tprr_use.html