WebSphere Application Server Network Deployment, Version 6.1.x Operating Systems: AIX, HP-UX, i5/OS, Linux, Solaris, Windows, z/OS

Service integration error types

Four types of error can occur in service integration: errors that a messaging engine can recover from while it is running, errors that can be resolved by an automatic restart of the messaging engine, errors that require a user to intervene, and errors that are not detectable by the messaging engine.

Errors that a messaging engine can recover from while it is running

The system corrects these recoverable errors automatically, without restarting or failing over the messaging engine. The system also adds an entry to the system error log that explains the error and suggests any user actions. The messaging engine continues to run and to honor the quality of service specified for the messages it is processing.

Errors that can be resolved by an automatic restart of the messaging engine (local errors)

A local error can be resolved by restarting the messaging engine, either on its current server or on an alternative server. For example, if a messaging engine cannot connect to its data store, the cause might be that the server on which it is running cannot create a connection, while another server in the same cluster still has access. In that case, the HAManager fails over the messaging engine and shuts down the server on which it was running. If the configured deployment does not have failover capability, for example, if there is only one server rather than a cluster, the server is shut down and the messaging engine is restarted only after the server is restarted.
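
The two outcomes described above can be sketched as a small decision function. This is only an illustrative model, not WebSphere code: the names `RecoveryPlan` and `plan_local_error_recovery` and the server names are hypothetical, and choosing the first alternative server is an assumption made for simplicity (the actual choice of target server is made by the HAManager).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RecoveryPlan:
    shut_down_server: str       # server that hosted the failing messaging engine
    restart_engine_on: str      # server on which the engine runs next
    needs_server_restart: bool  # True if the engine must wait for its own server


def plan_local_error_recovery(current_server: str,
                              cluster_members: Sequence[str]) -> RecoveryPlan:
    """Model the outcome of a local error.

    cluster_members lists every server that can host the engine,
    including current_server.
    """
    alternatives = [s for s in cluster_members if s != current_server]
    if alternatives:
        # Failover is possible: the engine moves and the original server is shut down.
        # Assumption: take the first alternative; real selection follows policy.
        return RecoveryPlan(current_server, alternatives[0], needs_server_restart=False)
    # No failover capability: the engine restarts only after its server restarts.
    return RecoveryPlan(current_server, current_server, needs_server_restart=True)
```

For a two-server cluster, `plan_local_error_recovery("server1", ["server1", "server2"])` moves the engine to `server2`; with a single server, the plan records that the engine must wait for `server1` to be restarted.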

Errors that require the user's intervention (global errors)

A global error cannot be fixed by restarting or failing over the messaging engine. For example, if a messaging engine's data store becomes corrupted, the messaging engine cannot run on a different server because it will encounter the same problem. If a messaging engine in this situation were to be failed over, the messaging engine would be continually failed over because it could not run in any server. This would cause unwanted disruption to the cluster as servers attempted to run the messaging engine and were shut down. To avoid such a situation, if a global error is encountered, the messaging engine logs an error, stops processing messages, and is not failed over. The messaging engine cannot be restarted until you have corrected the global error condition and restarted the server.
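
A toy simulation can make the reasoning above concrete. The sketch below is not WebSphere code; the function name, its parameters, and the server names are hypothetical. It assumes that a global error recurs on every server, that a local error affects only the first server, and it models a single pass through the cluster (in a real cluster, restarted servers could be tried again, so the disruption would continue).

```python
def servers_disrupted_by_failover(cluster, error_is_global, fail_over_on_global_error):
    """Return the servers shut down by failover attempts for one error.

    cluster is an ordered list of server names; the engine starts on the first.
    """
    if error_is_global and not fail_over_on_global_error:
        # Documented behaviour: log the error, stop processing, do not fail over.
        return []
    shut_down = []
    for index, server in enumerate(cluster):
        # Assumption: a global error fails everywhere, a local one only on server 0.
        engine_fails_here = error_is_global or index == 0
        if not engine_fails_here:
            break  # the engine runs successfully on this server
        shut_down.append(server)
    return shut_down
```

With `cluster = ["server1", "server2", "server3"]`, a local error disrupts only `server1`, whereas failing over on a global error would take down all three servers in turn. Not failing over on a global error leaves the other servers untouched.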

Errors that are not detectable by the messaging engine

Errors such as a spinning thread (a thread that is trapped in a tight loop and no longer performs useful work) or a deadlock (two threads that block each other) might be detectable only by explicit health monitoring. The HAManager provides such monitoring and periodically tests the health of the messaging engine. If the HAManager detects that the messaging engine cannot run properly, the HAManager shuts down the server that hosts the messaging engine. If the server is in a cluster, the messaging engine is restarted on an alternative server, if its policy allows. The node agent restarts the server that was shut down. If the server is not in a cluster, the server must be restarted before the messaging engine can restart on that server.
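
One common way to detect this kind of failure from outside a component is a heartbeat check. The sketch below illustrates that general technique only; the topic does not describe how the HAManager implements its health tests, and the class `HeartbeatMonitor`, its methods, and the timeout value are hypothetical.

```python
import time


class HeartbeatMonitor:
    """Detects a component that has stopped making progress."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock          # injectable so the check can be tested
        self._last_beat = clock()

    def beat(self):
        """Called by the monitored component each time it completes useful work."""
        self._last_beat = self._clock()

    def is_healthy(self):
        """Called periodically by the monitor, from outside the component."""
        return self._clock() - self._last_beat <= self._timeout
```

A spinning or deadlocked thread never reaches its next `beat()` call, so `is_healthy()` eventually returns `False` even though the component itself has reported no error. The monitor can then take an external action, such as the server shutdown described above.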

Related tasks
Injecting failures into a high availability system


Last updated: 26 February 2009
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.pmc.nd.multiplatform.doc/concepts/cjt0004_.html

Copyright IBM Corporation 2004, 2009. All Rights Reserved.