WebSphere Enterprise Service Bus, Version 6.2.0
Operating Systems: AIX, HP-UX, i5/OS, Linux, Solaris, Windows


Recovery from infrastructure failures

A long-running process spans multiple transactions. If one of these transactions fails because of an infrastructure failure, Business Flow Manager can recover from the failure automatically.

In a long-running process, the Business Flow Manager sends itself request messages that trigger follow-on navigation. For each incoming request message, a new transaction is started and the request message is passed to the Business Flow Manager for processing. Each transaction consists of the following actions:

Business Flow Manager uses queues, including the retention queue and the hold queue described in this topic, to cope with infrastructure failures.

When messages are processed successfully, it is inferred that the infrastructure is available. However, Business Flow Manager might fail to process a message in the following situations:

Cause: Unavailable infrastructure
Response: In normal processing mode, for a specified time, all messages are kept available until the infrastructure is operational again. This problem might be caused by a database failure, for example.

Cause: Damaged message
Response: After a specified number of retries, the message is put into the hold queue. From the hold queue, it can also be moved back to the input queue, to retry the transaction.

If the infrastructure is unavailable, and the retention queue is full, message processing is switched from normal processing to quiesce mode. In quiesce mode, the message processing is slowed down until the infrastructure is available again. When the infrastructure becomes available, message processing switches back to normal mode.

Normal message processing

During normal processing, a message is processed as follows:

Message processing in quiesce mode

In quiesce mode, processing a message is attempted periodically. Messages that fail to be processed are put back in the input queue, without incrementing either the delivery count or the retention queue traversal count. As soon as a message can be processed successfully, message processing is switched back to normal mode.
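
The quiesce-mode behavior described above can be modeled with a short Python sketch. This is a toy in-memory model, not Business Flow Manager code: the `Message` class, its two counter fields, and the `attempt_in_quiesce` function are hypothetical names introduced only for illustration, and appending the failed message to the end of the queue is an assumption about ordering that the text does not state.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical counters mirroring the two counts named in the text.
    body: str
    delivery_count: int = 0
    retention_traversals: int = 0


def attempt_in_quiesce(input_queue, process):
    """Try one message in quiesce mode and return the resulting mode name."""
    if not input_queue:
        # Nothing to try, so there is no evidence that the infrastructure is back.
        return "quiesce"
    message = input_queue.popleft()
    try:
        process(message)
    except Exception:
        # Failure: put the message back without touching either counter.
        input_queue.append(message)
        return "quiesce"
    # Success: the infrastructure is inferred to be available again.
    return "normal"
```

A caller would invoke `attempt_in_quiesce` periodically and resume normal processing once it returns `"normal"`.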

Retry limit

The retry limit defines the maximum number of times that a message can be transferred to the retention queue before it is put in the hold queue.

To be put in the retention queue, the processing of a message must fail three times.

For example, if the retry limit is 5, a message must go to the retention queue five times (that is, it must fail 3 * 5 = 15 times) before the last retry is started. If the last retry fails two more times, the message is put in the hold queue. This means that a message must fail (3 * RetryLimit) + 2 times before it is put in the hold queue; with a retry limit of 5, that is 17 times.
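
The arithmetic above can be checked with a few lines of Python. This sketch only encodes the formula stated in the text; the function and constant names are hypothetical and do not correspond to any Business Flow Manager API.

```python
# Failures needed before a message is transferred to the retention queue.
FAILURES_PER_RETENTION_TRANSFER = 3
# Further failures during the last retry before the message is held.
FAILURES_IN_LAST_RETRY = 2


def failures_before_hold(retry_limit: int) -> int:
    """Return (3 * RetryLimit) + 2, the failure count given in the text."""
    if retry_limit < 0:
        raise ValueError("retry_limit must not be negative")
    return FAILURES_PER_RETENTION_TRANSFER * retry_limit + FAILURES_IN_LAST_RETRY


print(failures_before_hold(5))  # prints 17
```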

In a performance-critical application running in a reliable infrastructure, the retry limit should be small: one or two, for example. If the retry limit is set to zero, a repeatedly failing message is retried three times and then it goes immediately into the hold queue.

This Business Flow Manager property is specified in the administrative console. Click Servers > Application servers > server_name, or Servers > Clusters > cluster_name if Business Process Choreographer is configured on a cluster. On the Configuration tab, under Business Integration, click Business Process Choreographer > Business Flow Manager.

Retention queue message limit

The retention queue message limit defines the maximum number of messages that can be in the retention queue. If the retention queue overflows, the system goes into quiesce mode. To make the system enter quiesce mode as soon as a message fails, set the value to zero. To make Business Flow Manager more tolerant of infrastructure failures, increase the value.

This property is specified in the administrative console. Click Servers > Application servers > server_name, or Servers > Clusters > cluster_name if Business Process Choreographer is configured on a cluster. On the Configuration tab, under Business Integration, click Business Process Choreographer > Business Flow Manager.
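
As a rough illustration of how the retention queue message limit relates to quiesce mode, the following Python sketch models a bounded retention queue. It is a toy model under stated assumptions, not the product's implementation: the `RetentionQueue` class and its members are hypothetical, and "overflow" is assumed to mean that a failed message arrives when the queue already holds the configured number of messages.

```python
from collections import deque


class RetentionQueue:
    """Toy model: a bounded retention queue that triggers quiesce mode."""

    def __init__(self, message_limit: int):
        if message_limit < 0:
            raise ValueError("message_limit must not be negative")
        self.message_limit = message_limit
        self.messages = deque()
        self.mode = "normal"

    def retain(self, message) -> bool:
        """Try to retain a failed message; return False on overflow."""
        if len(self.messages) >= self.message_limit:
            # Assumed overflow condition: the queue is already at its limit.
            self.mode = "quiesce"
            return False
        self.messages.append(message)
        return True
```

With a limit of zero, the first call to `retain` overflows and the model enters quiesce mode at once, which matches the behavior the text describes for that setting. A larger limit lets more failed messages accumulate before the switch.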

Replay messages

The administrator can move messages from the hold queue or the retention queue back to the internal queue by using the administrative console, administrative scripts, or the failed event manager.




Last updated: 21 June 2010


http://publib.boulder.ibm.com/infocenter/dmndhelp/v6r2mx/topic//com.ibm.websphere.wesb620.doc/doc/bpc/c5replay.html
Copyright IBM Corporation 2005, 2010. All Rights Reserved.