WebSphere Enterprise Service Bus for z/OS, Version 6.2.0 Operating Systems: z/OS


Transactional properties and solution recovery

WebSphere® ESB is based on WebSphere Application Server and, as such, supports a transactional model for conducting business transactions.

WebSphere ESB builds on this transactional model, providing for loosely-coupled SOA applications and BPM applications.

Technically, this means two things:

  1. WebSphere ESB relies on databases and messaging systems to achieve transactional application execution patterns.
  2. Transactions are inherent in messaging systems and database systems.

    Transactions are ACID-compliant, meaning that they provide atomicity, consistency, isolation, and durability.

    WebSphere ESB uses databases and messaging systems to achieve a "loosely-coupled" pattern. WebSphere ESB updates a database and sends a message. Both the update to the database and the message are committed in the same transaction.
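The "update a database and send a message in the same transaction" pattern can be sketched as follows. This is a minimal illustration, not WebSphere ESB code: it assumes a single in-memory SQLite database in which a hypothetical `outbox` table stands in for the messaging system, so one local transaction covers both resources. In a real deployment the database and the messaging system are separate resource managers coordinated by the application server. The names `make_store`, `update_and_send`, and `snapshot` are invented for this example.

```python
import sqlite3


def make_store():
    """Create an in-memory database with a business table and a message table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    # Hypothetical stand-in for a message queue.
    conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)")
    conn.execute("INSERT INTO orders (id, status) VALUES (1, 'NEW')")
    conn.commit()
    return conn


def update_and_send(conn, order_id, fail_before_commit=False):
    """Update the order and enqueue a message as one unit of work."""
    try:
        conn.execute("UPDATE orders SET status = 'SHIPPED' WHERE id = ?", (order_id,))
        conn.execute("INSERT INTO outbox (body) VALUES (?)", (f"order {order_id} shipped",))
        if fail_before_commit:
            raise RuntimeError("simulated failure before commit")
        conn.commit()    # both changes become visible together
        return True
    except RuntimeError:
        conn.rollback()  # neither change survives
        return False


def snapshot(conn):
    """Return (order status, number of queued messages)."""
    status = conn.execute("SELECT status FROM orders WHERE id = 1").fetchone()[0]
    pending = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
    return status, pending
```

If the simulated failure occurs, the order keeps its old status and no message is queued; otherwise both changes appear together.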

    Another characteristic of a "loosely-coupled" pattern is to pull a message from a messaging system and update databases. If a failure occurs during this processing, the event goes back to the message queue as though it had not been read. WebSphere ESB has a retry mechanism: after 5 tries, the event goes to the Failed event manager. The phrase "loosely-coupled" refers to the fact that not all of the work has to happen in one big transaction.
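The read, fail, and retry behavior described above can be sketched as a small simulation. It is a conceptual model only, assuming an in-memory queue and a plain list in place of the Failed event manager; the function name `process_queue` and its parameters are hypothetical and are not part of any WebSphere API. The limit of 5 tries is taken from the text.

```python
from collections import deque

MAX_TRIES = 5  # the retry limit stated in the text


def process_queue(queue, handler, max_tries=MAX_TRIES):
    """Drain `queue`, retrying failed events; return (completed, failed_events)."""
    completed, failed_events = [], []
    attempts = {}  # event -> number of delivery attempts so far
    while queue:
        event = queue.popleft()              # read the message
        attempts[event] = attempts.get(event, 0) + 1
        try:
            handler(event)                   # for example, update databases
            completed.append(event)          # success: the message is consumed
        except Exception:
            if attempts[event] >= max_tries:
                failed_events.append(event)  # hand over for manual handling
            else:
                queue.appendleft(event)      # back on the queue, as if never read
    return completed, failed_events
```

An event whose handler always raises is attempted five times and then lands in `failed_events`; an event that fails transiently is simply redelivered until it succeeds.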

Avoiding lost data in the event of system failures

With proper tuning and configuration of the available resource managers, no data is lost if a given part of the system fails. Transactional integrity, including rollback and recovery mechanisms, is the key WebSphere capability that ensures data is not lost if failures occur.

In order for WebSphere rollback and recovery mechanisms to work, you need to set up the resource managers (database and messaging) properly. For example, lock time-outs in databases must be set properly, so that when a server recovers, it can complete either a commit or a rollback without encountering lock conditions.

WebSphere ESB adds capabilities that augment those of WebSphere Application Server, to provide a complete solution for recovering data from unexpected failures.

High-level description of enabling recovery features

The core recovery model for WebSphere ESB is based on units of work. The system can handle and recover from failures that occur while a single unit of work is being processed, and can continue to provide uninterrupted service. This type of recovery occurs through a series of retry mechanisms and error queues. Part of your application design should include the capability to differentiate system errors from application errors. System errors are passed back to the infrastructure supporting the calling component, where additional system-level recovery can be attempted or the error can be transformed into a more generic business exception. You can configure various retry mechanisms to run automatically. Additionally, WebSphere ESB provides a set of consoles and corresponding programming interfaces that enable human intervention where appropriate. Many of these capabilities can be used, and the failures that they deal with can be handled, while the server that contains the work continues processing new requests.
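One way to picture the distinction between system errors and application errors is the sketch below. It is an illustration under stated assumptions, not the WebSphere ESB programming model: the exception classes `BusinessError`, `SystemFault`, and `ServiceUnavailable`, the function `call_with_recovery`, and the default of three retries are all hypothetical names and values chosen for this example.

```python
class BusinessError(Exception):
    """An application error that the caller is expected to handle (hypothetical)."""


class SystemFault(Exception):
    """An infrastructure error, such as a lost connection (hypothetical)."""


class ServiceUnavailable(BusinessError):
    """Generic business exception that an unrecovered system fault becomes."""


def call_with_recovery(operation, max_retries=3):
    """Run `operation`, retrying system faults but never application errors."""
    last_fault = None
    for _ in range(max_retries):
        try:
            return operation()
        except BusinessError:
            raise                   # application errors go straight to the caller
        except SystemFault as fault:
            last_fault = fault      # system-level recovery: try again
    # Retries exhausted: transform the system fault into a business exception.
    raise ServiceUnavailable("operation failed after retries") from last_fault
```

Keeping the two error families in separate exception types lets the retry logic live in the infrastructure layer, while the calling component sees only business exceptions.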

Unavailable server - High-level description

If a failure causes one or more servers in a highly-available WebSphere cluster to become unavailable, additional recovery capabilities within the system are called upon as follows:
  1. Inbound work is routed away from the failing system

    This is done using underlying WebSphere Application Server workload management facilities, which can vary based on protocol, topology and configuration.

  2. Administrator initiates actions

    While the system as a whole remains active and available, the administrator can perform recovery operations.

    Administrator actions are aimed at performing basic triage and then restarting the failing server. This restart replays the transaction logs and should clean up most server-down situations.

    A complete recovery sometimes also requires the use of the error handling mechanisms provided by WebSphere ESB.
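The idea behind replaying a transaction log on restart can be sketched as follows. This is a deliberately simplified model and does not reflect the actual WebSphere log format or recovery protocol: it assumes a hypothetical log of `begin`, `write`, and `commit` records, applies the writes of committed transactions, and reports the transactions that never committed so that they can be rolled back.

```python
def replay(log):
    """Return (state, rolled_back) after replaying a list of log records."""
    pending = {}  # transaction id -> list of (key, value) writes not yet committed
    state = {}    # data as it should look after recovery
    for record in log:
        kind, tx = record[0], record[1]
        if kind == "begin":
            pending[tx] = []
        elif kind == "write":
            pending[tx].append((record[2], record[3]))
        elif kind == "commit":
            # Only committed transactions are allowed to change the state.
            for key, value in pending.pop(tx):
                state[key] = value
    # Whatever is still pending was interrupted by the failure.
    return state, sorted(pending)
```

A transaction that was interrupted before its `commit` record was written leaves no trace in the recovered state.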

Unavailable cluster - High-level description

If an entire server cluster becomes unavailable or unresponsive, then a more involved set of recovery actions is necessary. For example, if a shared resource such as a database becomes unavailable, then all servers in the cluster have the same difficulty completing their work.

Procedures that deal with shared resource recovery depend on which shared resource suffered the failure. You can apply various WebSphere techniques to minimize overall downtime and restart stalled work.

Catastrophic failure - High-level description

In catastrophic situations, entire machines can become unavailable or servers can be deemed not recoverable. In such cases, you can rely on the advanced features in WebSphere that allow the recovery of a failed server to be run on another server in the same cluster. This kind of recovery requires network-attached storage or some other mechanism for sharing logs. For more information about recovery of a failed server by another member of the same cluster, see Peer recovery.



Last updated: 21 June 2010


http://publib.boulder.ibm.com/infocenter/dmndhelp/v6r2mx/topic//com.ibm.websphere.wesb620.zseries.doc/doc/crec_trnsactional.html
Copyright IBM Corporation 2005, 2010. All Rights Reserved.