Failed message handling and quiesce mode

Business Process Choreographer provides a facility for handling temporary infrastructure failures.

This section describes how the business process container handles failed messages. This contrasts with the simpler mechanism used by the human task container, described in Failed message handling for human tasks.

Long-running processes consist of a sequence of transactions. The transactions are separated by Java Message Service (JMS) messages, which the server sends to a message-driven bean. This bean passes each incoming message to the process server for processing. Each transaction consists of the following actions:

The server might fail to process a message received by the message-driven bean for either of two reasons: the infrastructure is unavailable, or the message is damaged.

The responses to these causes are as follows:

Cause: Unavailable infrastructure
Response: The message-driven bean tries, for a specified time, to recover from that situation. It tries to keep all messages available until the server is again operational. This problem might be caused by a database failure, for example.

Cause: Damaged message
Response: After a specified number of retries, the message is put into the hold queue, where it can be manipulated or reviewed. From the hold queue, it can also be moved back to the input queue, to retry the transaction.

The implementation for messages for business processes is as follows:

When the message-driven bean operates in quiesce mode, it periodically tries to process a message. Messages that fail to be processed are put back in the input queue, without incrementing either the delivery count or the retention queue traversal count. As soon as a message can be processed successfully, the message-driven bean switches back into normal processing mode.

This facility consists of two numerical limits, two queues, quiesce mode, and the message retry behavior.

Retry limit

The retry limit defines the maximum number of times that a message can be transferred through the retention queue before being put on the hold queue.

A message is put on the retention queue after its processing has failed three times.

For example, if the retry limit is 5, a message must go through the retention queue five times (that is, it must fail 3 * 5 = 15 times) before the last retry loop is started. If processing fails two more times in the last retry loop, the message is put on the hold queue. This means that a message must fail (3 * RetryLimit) + 2 times before it is put on the hold queue; with a retry limit of 5, that is 17 failures.
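The arithmetic above can be checked with a short sketch. The function and constant names are hypothetical and are not part of Business Process Choreographer; only the values 3 and 2 and the formula come from the text.

```python
# Illustrative only: names are hypothetical, values are taken from the text.
FAILURES_PER_RETENTION_PASS = 3  # failures before a message goes to the retention queue
FINAL_LOOP_FAILURES = 2          # further failures in the last retry loop


def failures_before_hold(retry_limit: int) -> int:
    """Return how often a message must fail before it reaches the hold queue."""
    if retry_limit < 0:
        raise ValueError("retry_limit must not be negative")
    return FAILURES_PER_RETENTION_PASS * retry_limit + FINAL_LOOP_FAILURES


if __name__ == "__main__":
    for limit in (1, 2, 5):
        print(limit, failures_before_hold(limit))
```

A retry limit of one or two, as suggested for a reliable infrastructure, moves a persistently failing message to the hold queue after 5 or 8 failures respectively.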

In a performance-critical application running in a reliable infrastructure, the retry limit should be small: one or two, for example. This parameter can be found in the administrative console, on the Business Process Container configuration page.

Retention queue message limit

The retention queue message limit defines the maximum number of messages that can be on the retention queue. If the retention queue overflows, the system goes into quiesce mode. To make the system enter quiesce mode as soon as one message fails, set the value to zero. To make the business process container more tolerant of infrastructure failures, increase the value.
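As a minimal sketch of the overflow rule, assuming that the queue overflows when a failed message arrives while the queue already holds as many messages as the limit allows, the check can be written as follows. The function name and parameters are hypothetical, not a product API.

```python
# Illustrative only: a hypothetical helper, not part of the product.
def failure_triggers_quiesce(messages_on_retention_queue: int,
                             retention_queue_message_limit: int) -> bool:
    """Return True if one more failed message would overflow the retention queue.

    Assumption: the queue overflows when it is already full to the limit
    and another message fails.
    """
    return messages_on_retention_queue + 1 > retention_queue_message_limit


if __name__ == "__main__":
    # With a limit of zero, the first failed message already causes an overflow.
    print(failure_triggers_quiesce(0, 0))
    # With a larger limit, the system tolerates more failed messages.
    print(failure_triggers_quiesce(3, 10))
```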

This parameter can be found in the administrative console, on the Business Process Container page. (To locate this parameter, click Servers > Application Servers > server_name. Then, under the heading Business Process Container Settings, click Business Process Container. The field for this limit is under the heading General Properties.)

Retention queue

The retention queue holds failed messages that are replayed by moving them back to the business process container's internal work queue. A message is put on the retention queue if it fails three times. If the message fails (3 * RetryLimit) + 2 times, it is put on the hold queue. (For details of the retry limit, see Retry limit.) If the retention queue is full to the limit defined by the retention queue message limit and another message fails, the queue overflows, and the system goes into quiesce mode. The administrator can move the messages in this queue back to the internal queue by performing the task Querying and replaying failed messages.
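The routing of a repeatedly failing message can be pictured as a small state machine. The following sketch is an interpretation of the description, not the product's implementation: the class, the function, and the two counters are hypothetical, and resetting the delivery count on each pass through the retention queue is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FailedMessage:
    """Hypothetical bookkeeping for one message."""
    delivery_count: int = 0        # failures in the current retry loop
    retention_traversals: int = 0  # passes through the retention queue


def route_after_failure(msg: FailedMessage, retry_limit: int) -> str:
    """Record one failure and return the queue the message goes to next."""
    msg.delivery_count += 1
    if msg.retention_traversals < retry_limit:
        if msg.delivery_count >= 3:
            # Third failure in this loop: one pass through the retention queue.
            msg.delivery_count = 0  # assumption: the count restarts per loop
            msg.retention_traversals += 1
            return "retention"
        return "input"
    # Last retry loop: two more failures send the message to the hold queue.
    if msg.delivery_count >= 2:
        return "hold"
    return "input"


def count_failures_until_hold(retry_limit: int) -> int:
    """Fail a message repeatedly and count the failures until it is held."""
    msg = FailedMessage()
    failures = 0
    while True:
        failures += 1
        if route_after_failure(msg, retry_limit) == "hold":
            return failures


if __name__ == "__main__":
    print(count_failures_until_hold(5))
```

Under these assumptions, the simulated count matches the closed form (3 * RetryLimit) + 2.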

Hold queue

The hold queue contains messages that have failed (3 * RetryLimit) + 2 times. (For details of the retry limit, see Retry limit.) The administrator can move the messages in this queue back to the internal queue by performing the task Querying and replaying failed messages.

Replay messages

The administrator can move the messages from the hold or retention queues back to the internal queue. This can be done using the administrative console or using administrative commands.

Quiesce mode

Quiesce mode is entered when the retention queue overflows. When this happens, it is assumed that there is a serious, though possibly temporary, infrastructure failure. The purpose of quiesce mode is to prevent the system from consuming a lot of resources on messages that will probably fail anyway while the infrastructure failure persists. In quiesce mode, the system sleeps for two seconds before attempting to process the next message. As soon as a message is successfully processed, the system resumes normal message processing.
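The quiesce behavior can be sketched as a simple loop. This is an illustration under stated assumptions, not the product's code: the function, its parameters, and the use of an in-memory deque as the input queue are hypothetical, and a failed message is assumed to go to the back of the queue. Only the two-second sleep, the put-back on failure, and the switch to normal mode on the first success come from the text.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional

QUIESCE_SLEEP_SECONDS = 2.0  # from the text: sleep two seconds between attempts


def run_quiesce_mode(input_queue: Deque[str],
                     try_process: Callable[[str], bool],
                     sleep: Callable[[float], None] = time.sleep,
                     max_attempts: Optional[int] = None) -> bool:
    """Try one message per sleep interval until one succeeds.

    Returns True when a message was processed, meaning that normal
    processing can resume. Returns False if the queue ran empty or the
    optional attempt budget was used up first.
    """
    attempts = 0
    while input_queue and (max_attempts is None or attempts < max_attempts):
        sleep(QUIESCE_SLEEP_SECONDS)
        message = input_queue.popleft()
        attempts += 1
        if try_process(message):
            return True
        # Failure: put the message back; no retry counters are incremented.
        input_queue.append(message)
    return False


if __name__ == "__main__":
    outcomes = iter([False, True])  # first attempt fails, second succeeds
    queue = deque(["m1", "m2"])
    resumed = run_quiesce_mode(queue, lambda m: next(outcomes), sleep=lambda s: None)
    print(resumed, list(queue))
```

Passing the sleep function as a parameter keeps the sketch testable without real delays.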

Failed message handling for human tasks

The human task container has neither a retention queue nor retry limits. It has only a hold queue, on which failed messages are placed and from which they can be replayed.


Last updated: Tue Feb 21 17:19:17 2006

(c) Copyright IBM Corporation 2005.