WebSphere Extended Deployment, Version 6.0.x
Operating Systems: AIX, HP-UX, Linux, Solaris, Windows, z/OS


Health management and long-running work

Health management behaves differently for transactional (Web) work than for business grid application work, which is typically long-running.

Several situations exist where the typical health management behavior is different for long-running applications. For example, because a long-running application can run for hours or days, restarting servers is delayed until the pending long-running job on the dynamic cluster member completes. Certain health policy conditions that use data from the on demand router (ODR) or Web container do not apply because the long-running scheduler submits jobs directly to the execution environment.

When the health management controller manages transactional servers, it does not restart a server that is the only active server in the dynamic cluster. This rule does not apply to business grid servers. For a business grid server, the health management controller pursues a restart whenever the server can be restarted, regardless of the number of active server instances. The controller contacts the long-running scheduler about the restart, and the long-running scheduler decides whether the server is restarted.
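The restart rule above can be sketched as a small decision function. This is only an illustrative model, not product code: the names `ServerKind`, `restart_decision`, and `scheduler_approves` are hypothetical and do not correspond to any WebSphere Extended Deployment API.

```python
from enum import Enum


class ServerKind(Enum):
    # Hypothetical labels for the two kinds of servers discussed in the text.
    TRANSACTIONAL = "transactional"
    BUSINESS_GRID = "business grid"


def restart_decision(kind, active_members, scheduler_approves=False):
    """Return True if the server is restarted in this simplified model."""
    if kind is ServerKind.TRANSACTIONAL:
        # Rule from the text: the only active server in a dynamic
        # cluster is not restarted.
        return active_members > 1
    # Business grid servers: the number of active members is ignored.
    # The controller contacts the long-running scheduler, which decides.
    # (scheduler_approves is an assumed stand-in for that decision.)
    return scheduler_approves
```

In this model, `restart_decision(ServerKind.TRANSACTIONAL, 1)` is false because the server is the last active member, while a business grid server with one active member is restarted if the scheduler agrees.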

The long-running scheduler typically runs on a transactional server, so any existing health policy actions apply.

The following table describes how existing health policies are processed in the long-running execution environment.

Table 1. Processing of existing health policies in the long-running execution environment

Condition type: Age-based condition
Behavior: If no jobs are running on the business grid server, the server is restarted on the same node. If jobs are running, the health management controller contacts the long-running scheduler about the condition, but does not restart the server. The long-running scheduler quiesces the server by no longer dispatching jobs to it. The health management controller continues to indicate the age-based condition each time that the controller cycles. After the last job on the server completes, the long-running scheduler can restart the server on the same node the next time that the health management controller indicates the age-based condition.

Condition type: Memory condition: Excessive memory usage, and Memory condition: Memory leak
Behavior: Because memory conditions are considered more severe than age conditions, the long-running scheduler can restart the server on the same node without waiting for running jobs to complete. Any batch jobs that are interrupted by the restart are automatically restarted when the new server starts. Any active compute-intensive jobs fail.

Condition type: Excessive request timeout condition, Excessive response time condition, Storm drain condition, and Workload condition
Behavior: The sensors that are involved in these conditions are not engaged during business grid work processing. You can configure these policies so that they apply to business grid servers, for example, at the cell level. However, the policies are not active for a business grid server unless you are using a mixed Web and business grid configuration, in which both types of applications are deployed on a single dynamic cluster. In a mixed configuration, the health management controller displays warning messages if it encounters these conditions on long-running execution environment servers. Consider deploying your Web and business grid applications to different dynamic clusters.
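The behaviors in Table 1 can be modeled roughly as follows. This is a simplified sketch under stated assumptions: `GridServer`, `handle_condition`, the condition strings, and the returned action strings are all hypothetical names chosen for illustration, and the real interaction between the health management controller and the long-running scheduler is not exposed in this form.

```python
from dataclasses import dataclass

# Hypothetical condition identifiers, grouped as in Table 1.
AGE = "age"
MEMORY = {"excessive memory usage", "memory leak"}
ODR_SENSOR = {
    "excessive request timeout",
    "excessive response time",
    "storm drain",
    "workload",
}


@dataclass
class GridServer:
    batch_jobs: int = 0
    compute_jobs: int = 0
    quiesced: bool = False


def handle_condition(condition, server, mixed_cluster=False):
    """Return the action taken for one controller cycle (illustrative only)."""
    if condition == AGE:
        if server.batch_jobs + server.compute_jobs == 0:
            return "restart on same node"
        # Jobs are still running: the scheduler stops dispatching new
        # jobs to the server, and the restart waits for a later cycle.
        server.quiesced = True
        return "quiesce, no restart yet"
    if condition in MEMORY:
        # Memory conditions do not wait for jobs to drain. Interrupted
        # batch jobs restart with the new server; compute-intensive jobs fail.
        return "restart on same node"
    if condition in ODR_SENSOR:
        # Sensors are not engaged for business grid work; a warning is
        # shown only in a mixed Web and business grid configuration.
        return "warning" if mixed_cluster else "inactive"
    raise ValueError(f"unknown condition: {condition}")


# Age-based condition over two controller cycles.
server = GridServer(batch_jobs=1)
first = handle_condition(AGE, server)    # job still running
server.batch_jobs = 0                    # the last job completes
second = handle_condition(AGE, server)   # next controller cycle
```

The two-cycle usage at the end mirrors the age-based row: the first cycle only quiesces the server, and the restart happens on a later cycle after the last job completes.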



Related concepts
Health management
The long-running execution environment
Related tasks
Creating health policies
Configuring the health controller

Last updated: Oct 16, 2009 11:06:12 AM EDT
http://publib.boulder.ibm.com/infocenter/wxdinfo/v6r0/index.jsp?topic=/com.ibm.websphere.xd.doc/info/odoe_task/codhealthbgrid.html