In service integration, exception conditions can be grouped into four kinds: conditions that do not require the messaging engine to restart, conditions that require an automatic restart of the messaging engine, conditions that are detected by explicit health monitoring and handled by the HAManager, and conditions that require user intervention.
A messaging engine can recover from local exceptions by restarting automatically, either on its current server or on an alternative server. For example, a messaging engine might be unable to connect to its data store because the server in which it runs cannot create a connection to the data store, even though another server in the same cluster can. In a high availability configuration (that is, one in which failover is enabled), the HAManager stops and disables the messaging engine on the current server and fails the messaging engine over to another server. The disabled messaging engine is automatically re-enabled after 30 seconds.
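The failover sequence for a local exception can be sketched as a small state model. This is a hypothetical illustration, not a WebSphere or HAManager API: the class name `HaEngineModel`, the server names, the seconds-based clock, and the rule of picking the first available server are all assumptions made for the example. Only the stop, disable, fail over, and 30-second re-enable steps come from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# From the text: a disabled messaging engine is re-enabled after 30 seconds.
REENABLE_DELAY_SECONDS = 30.0


@dataclass
class HaEngineModel:
    """Hypothetical model of one messaging engine in a cluster with failover enabled.

    Illustrative only; this is not a WebSphere or HAManager API.
    """

    servers: List[str]
    active_server: Optional[str]
    # server name -> time (seconds) at which the engine was disabled on that server
    disabled_at: Dict[str, float] = field(default_factory=dict)

    def is_enabled_on(self, server: str, now: float) -> bool:
        """True unless the engine was disabled on `server` less than 30 seconds ago."""
        since = self.disabled_at.get(server)
        return since is None or now - since >= REENABLE_DELAY_SECONDS

    def on_local_exception(self, now: float) -> Optional[str]:
        """Stop and disable the engine on its current server, then fail it over.

        Returns the server now running the engine, or None if no other server
        in the cluster can currently host it. Choosing the first eligible server
        is an assumption of this sketch, not documented HAManager behavior.
        """
        failed = self.active_server
        if failed is not None:
            self.disabled_at[failed] = now  # stop and disable on the current server
        for candidate in self.servers:
            if candidate != failed and self.is_enabled_on(candidate, now):
                self.active_server = candidate  # fail over to another server
                return candidate
        self.active_server = None
        return None
```

For example, with `servers=["server1", "server2"]`, a local exception on `server1` at time 0 moves the engine to `server2`, and `server1` becomes eligible to host the engine again from time 30 onward.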
A messaging engine cannot detect exceptions such as a spinning thread (a thread that is trapped in a loop and no longer performs useful work) or a deadlock (two threads blocking each other), but explicit health monitoring can. The HAManager provides such monitoring and periodically tests the health of the messaging engine. If the HAManager detects that a messaging engine that uses a data store cannot run properly, the HAManager stops and disables the messaging engine. If the messaging engine uses a file store, the HAManager instead shuts down the server that is hosting the messaging engine. If the server is in a cluster and the policy of the messaging engine allows failover, the HAManager restarts the messaging engine on an alternative server. If the messaging engine uses a data store, the disabled messaging engine is automatically re-enabled after 30 seconds.
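The rules for a failed health check depend on three inputs: the store type, whether the server is in a cluster, and whether the engine's policy allows failover. The hypothetical function below encodes those rules as a decision table so that each combination is explicit. The names `StoreType`, `HealthCheckResponse`, and `respond_to_failed_health_check` are invented for illustration and do not correspond to any WebSphere API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StoreType(Enum):
    DATA_STORE = "data store"
    FILE_STORE = "file store"


@dataclass(frozen=True)
class HealthCheckResponse:
    """Hypothetical summary of the HAManager's reaction to a failed health check."""

    disable_engine: bool      # stop and disable the messaging engine
    shut_down_server: bool    # shut down the server hosting the engine
    restart_elsewhere: bool   # restart the engine on an alternative server
    reenable_after_s: Optional[int]  # automatic re-enable delay, if any


def respond_to_failed_health_check(
    store: StoreType, in_cluster: bool, policy_allows_failover: bool
) -> HealthCheckResponse:
    """Encode the documented rules for an engine that fails a health check."""
    uses_data_store = store is StoreType.DATA_STORE
    return HealthCheckResponse(
        # Data store: the engine itself is stopped and disabled.
        disable_engine=uses_data_store,
        # File store: the whole hosting server is shut down instead.
        shut_down_server=not uses_data_store,
        # Failover needs both a cluster and a policy that permits it.
        restart_elsewhere=in_cluster and policy_allows_failover,
        # Only a data-store engine is re-enabled automatically after 30 seconds.
        reenable_after_s=30 if uses_data_store else None,
    )
```

Returning an immutable record rather than performing actions keeps the sketch focused on which reaction applies in each case, which is the part the text specifies.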
A messaging engine cannot recover from global exceptions by restarting or failing over. For example, if the data store for a messaging engine becomes corrupted, running the messaging engine on a different server does not resolve the problem, because the messaging engine encounters the same corrupted data store there. If a messaging engine in this situation were failed over, it would be failed over continually, because it could not run in any server. This would cause unwanted disruption to the cluster as servers attempted to run the messaging engine and were shut down. To avoid this situation, if a global exception occurs, the messaging engine logs an error, stops processing messages, and is not failed over. The messaging engine cannot be restarted until you correct the global exception condition and restart the server.
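The handling of a global exception can be sketched as a state model in which recovery requires two steps from outside the engine. In this hypothetical sketch, `EngineModel`, `correct_condition`, and `restart_server` are invented names that stand in for the administrator's actions; they are not WebSphere APIs. The model logs an error, stops processing, and never selects a failover target. A restart succeeds only after the fault has been corrected.

```python
import logging

log = logging.getLogger("messaging-engine-model")


class EngineModel:
    """Hypothetical model of a messaging engine hit by a global exception.

    Not a WebSphere API: names and methods are illustrative only.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.processing = True
        self.global_fault = False

    def on_global_exception(self, reason: str) -> None:
        # Log an error and stop processing. No failover is attempted, because
        # every server would hit the same fault (for example, a corrupted data store).
        log.error("%s: global exception, stopping without failover: %s", self.name, reason)
        self.processing = False
        self.global_fault = True

    def correct_condition(self) -> None:
        # Stands in for the administrator fixing the underlying problem.
        self.global_fault = False

    def restart_server(self) -> bool:
        # Restarting the server brings the engine back only once the fault is fixed.
        if self.global_fault:
            return False
        self.processing = True
        return True
```

Usage: after `engine.on_global_exception("data store corrupted")`, calling `engine.restart_server()` returns `False` until `engine.correct_condition()` has been called. This mirrors the requirement to fix the condition before restarting the server.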