Health management behaves differently for transactional (Web) work than for business grid application work, which tends to be long-running.
Several situations exist where the typical health management behavior is different for long-running applications. For example, because a long-running application can be running for hours or days, restarting servers is delayed until the pending business grid jobs on the dynamic cluster member complete. Certain health policy conditions that use data from the on demand router (ODR) or Web container do not apply because the business grid scheduler submits jobs directly to the execution environment.
When the health management controller is managing transactional servers, the controller does not perform a restart on a server when only one server is active in the dynamic cluster. However, this rule does not apply for business grid servers. The health management controller must always restart the server when the business grid allows a restart, regardless of the number of active server instances. The health management controller contacts the business grid scheduler about restarts. The business grid scheduler decides if the server should be restarted.
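
The difference between the two restart rules can be sketched as a small decision function. This is an illustrative model only, not product code: the names `Workload` and `may_restart`, and the `scheduler_allows` flag that stands in for the business grid scheduler's answer, are assumptions made for this example.

```python
from enum import Enum


class Workload(Enum):
    """Hypothetical labels for the two kinds of managed servers."""
    TRANSACTIONAL = "transactional"
    BUSINESS_GRID = "business_grid"


def may_restart(workload: Workload, active_members: int,
                scheduler_allows: bool = False) -> bool:
    """Return True if the controller would restart the server.

    scheduler_allows models the business grid scheduler's decision;
    it is ignored for transactional servers.
    """
    if workload is Workload.BUSINESS_GRID:
        # The scheduler decides; the number of active members is irrelevant.
        return scheduler_allows
    # Transactional servers: never restart the only active cluster member.
    return active_members > 1
```

For example, `may_restart(Workload.TRANSACTIONAL, 1)` is false because the server is the last active member, while `may_restart(Workload.BUSINESS_GRID, 1, scheduler_allows=True)` is true because only the scheduler's answer matters.
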
The business grid scheduler normally runs on a transactional server, so any existing health policy actions apply.
The following table describes how existing health policies are processed for long-running execution environment servers. For more information about these health policies, see Creating the health policy.
Condition type | Behavior |
---|---|
Age-based condition | If no jobs are running on the business grid server, the restart is performed on the same node. If jobs are running, the health management controller contacts the business grid scheduler about the condition, but does not restart the server. The business grid scheduler quiesces the server by no longer dispatching jobs to that server. The health management controller continues to indicate the age-based condition each time that it cycles. After the last job on the server completes, the business grid scheduler allows a restart on the same node the next time that the health management controller indicates the age-based condition. |
Memory condition: Excessive memory usage; Memory condition: Memory leak | Because memory conditions are considered more severe than age conditions, the business grid scheduler allows a restart on the same node. Any batch jobs that are interrupted because of the restart are automatically restarted when the new server starts. Any active compute-intensive jobs fail. |
Excessive request timeout condition; Excessive response time condition; Storm drain condition; Workload condition | The sensors that are involved in these conditions are not engaged during business grid work processing. You can configure these policies to apply to business grid servers, for example, on the cell level. The policies are not active for a business grid server unless you are using a mixed Web and business grid configuration, which means that both types of applications are deployed on a single dynamic cluster. If you have a mixed configuration, the health management controller displays warning messages if it encounters these conditions on long-running execution environment servers. Consider deploying your Web and business grid applications to different dynamic clusters. |
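
The per-condition behavior in the table can be modeled roughly as follows. This is a simplified sketch under stated assumptions, not the product's implementation: the condition identifiers, the `GridServer` class, and the return values `"restart"`, `"defer"`, and `"not_applicable"` are all hypothetical names chosen for this example, and the mixed Web and business grid case is left out.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the condition types in the table.
AGE = "age"
MEMORY_CONDITIONS = {"excessive_memory_usage", "memory_leak"}
ODR_CONDITIONS = {
    "excessive_request_timeout",
    "excessive_response_time",
    "storm_drain",
    "workload",
}


@dataclass
class GridServer:
    """Minimal stand-in for a long-running execution environment server."""
    running_jobs: int = 0
    quiesced: bool = False  # True once the scheduler stops dispatching jobs


def handle_condition(condition: str, server: GridServer) -> str:
    """Model the scheduler's answer when the controller reports a condition."""
    if condition == AGE:
        if server.running_jobs == 0:
            return "restart"        # idle server: restart on the same node
        server.quiesced = True      # stop dispatching new jobs to this server
        return "defer"              # controller reports again on its next cycle
    if condition in MEMORY_CONDITIONS:
        # More severe than age: restart even if jobs are still running.
        return "restart"
    if condition in ODR_CONDITIONS:
        # Sensors for these conditions are not engaged for business grid work.
        return "not_applicable"
    raise ValueError(f"unknown condition: {condition}")
```

In this model, a server with running jobs answers `"defer"` to an age-based condition and is marked as quiesced. After its job count drops to zero, the next age-based report returns `"restart"`, which mirrors the repeated indication described in the age-based row.
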