WebSphere Virtual Enterprise, Version 6.1.1
             Operating Systems: AIX, HP-UX, Linux, Solaris, Windows, z/OS


Health management

With the health monitoring and management subsystem, you can take a policy-driven approach to monitoring the application server environment and take action when specified criteria are met.

Health monitoring and management subsystem

The health management subsystem continuously monitors the state of servers and the work that is performed by the servers in your environment. The health management subsystem consists of two main elements: the health controller and health policies.

The health controller is the autonomic manager that controls the health monitoring and management subsystem, and acts on your health policies to ensure that certain conditions exist. The health controller is a distributed resource that is managed by the high availability manager and exists within all of the node agent and deployment manager processes. The health controller is active in one of these processes at a time. If the active process fails, the health controller can become active on another node agent or deployment manager process.

The health controller runs on a control cycle. The control cycle length defines the amount of time that passes between the checks that the health controller runs on the environment. After the control cycle ends, the health controller checks the environment and generates runtime tasks to resolve any breaches in the health conditions.
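
The control cycle can be pictured as a simple loop. The following Python sketch is illustrative only and is not the product implementation: the function name `run_control_cycles`, the `check_environment` callback, and the task strings are hypothetical names chosen for this example.

```python
import time
from typing import Callable, List


def run_control_cycles(
    check_environment: Callable[[], List[str]],
    cycle_length_seconds: float,
    max_cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Wait one control cycle, check the environment, and collect tasks.

    check_environment is a hypothetical callback that returns a list of
    breached health conditions (an empty list when everything is healthy).
    """
    runtime_tasks: List[str] = []
    for _ in range(max_cycles):
        # The control cycle length is the time that passes between checks.
        sleep(cycle_length_seconds)
        # After the cycle ends, check the environment and generate one
        # runtime task for each breach that was found.
        for breach in check_environment():
            runtime_tasks.append(f"resolve: {breach}")
    return runtime_tasks
```

Passing the `sleep` function as a parameter is a design choice for this sketch so that the loop can be exercised without waiting for real time to pass.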

You define the health policies, which include the health conditions that you want to monitor in your environment and the health actions to take if these conditions are not met.

You can disable or enable health management using the health controller, while still having multiple health policies defined on the system. You can also limit how often a server is restarted, or prohibit restarts during certain periods.
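
As a rough illustration of restart limits, the following Python sketch decides whether a restart is currently permitted. It is an assumption-laden example, not the health controller logic: the function `restart_allowed`, the minimum interval parameter, and the representation of prohibited periods as daily time windows that do not cross midnight are all hypothetical.

```python
from datetime import datetime, time, timedelta
from typing import List, Optional, Tuple


def restart_allowed(
    now: datetime,
    last_restart: Optional[datetime],
    min_interval: timedelta,
    prohibited_windows: List[Tuple[time, time]],
) -> bool:
    """Return True when a restart respects both kinds of limit."""
    # Frequency limit: do not restart again too soon after the last restart.
    if last_restart is not None and now - last_restart < min_interval:
        return False
    # Prohibited periods: simplified daily windows, start inclusive and
    # end exclusive, that are assumed not to wrap past midnight.
    for start, end in prohibited_windows:
        if start <= now.time() < end:
            return False
    return True
```

For example, with a prohibited window of 09:00 to 17:00 and a minimum interval of one hour, a restart request at 10:00 is refused, and a request at 20:00 is refused only if the server was already restarted after 19:00.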

The health management subsystem functions when WebSphere® Virtual Enterprise is in automatic or supervised operating mode. When the reaction mode on the policy is set to automatic, the health management system takes action when a health policy violation is detected. In supervised mode, the health management system creates a runtime task that proposes one or more reactions. The system administrator can approve or deny the proposed actions.
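
The difference between the two reaction modes can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the `RuntimeTask` class, the `react` function, and the mode strings are hypothetical and do not correspond to product APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RuntimeTask:
    """A hypothetical stand-in for a runtime task awaiting a decision."""
    violation: str
    proposed_actions: List[str]


def react(
    mode: str,
    violation: str,
    actions: List[str],
    execute: Callable[[str], None],
    pending_tasks: List[RuntimeTask],
) -> None:
    """Handle a detected health policy violation according to the mode."""
    if mode == "automatic":
        # Automatic mode: take the actions without asking anyone.
        for action in actions:
            execute(action)
    elif mode == "supervised":
        # Supervised mode: only propose the actions. An administrator
        # approves or denies the resulting task later.
        pending_tasks.append(RuntimeTask(violation, list(actions)))
    else:
        raise ValueError(f"unknown reaction mode: {mode}")
```

In supervised mode nothing is executed by `react` itself, which mirrors the approve-or-deny step that is described above.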

Health conditions

Health conditions define the variables that you want to monitor in your environment. Several categories of health policy conditions exist. You can choose from the following predefined health conditions:
Age-based condition
Specifies the age at which a server is restarted to clean out cached data and acquired memory.
Excessive response time condition
Tracks the amount of time that requests take to complete. If the time exceeds the defined response time threshold, the health actions run.
Attention: Requests that exceed the timeout value that is configured for the excessive request timeout condition are not counted towards this health condition. For example, if the default timeout value of 60 seconds is being used for the excessive request timeout condition, any request that exceeds 60 seconds does not activate the health actions that you have defined for the excessive response time condition. This restriction applies even if you do not have the excessive request timeout condition defined in your environment.
Excessive request timeout condition
Specifies a percentage of HTTP requests that can time out. When the percentage exceeds the defined value, the health actions run. The timeout value that is used depends on how your environment is configured. For more information about the excessive request timeout health condition, see Excessive request timeout health policy target timeout value.
Memory condition: excessive memory usage
Tracks the memory usage for a member. When the memory usage exceeds a percentage of the heap size for a specified time, health actions run to correct this situation.
Memory condition: memory leak
Tracks consistent downward trends in free memory that is available to a server in the Java heap. When the Java heap approaches its maximum configured size, you can perform either heap dumps or server restarts.
Storm drain detection
Tracks requests that have a significantly decreased response time, which can indicate that a faulty server is failing requests quickly and attracting more traffic. This policy relies on change point detection on given time series data.
Workload condition
Specifies a number of requests that are serviced before policy members restart to clean out memory and cache data.
For more information about these conditions, click the help icon on the Health policy settings panel in the administrative console.
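
The interaction between the excessive response time condition and the excessive request timeout condition, as described in the Attention note, can be illustrated with a small Python sketch. The function names are hypothetical, and the example makes a simplifying assumption that a request counts as timed out when its duration exceeds the timeout value.

```python
from typing import List, Sequence


def counted_response_times(
    durations_s: Sequence[float], timeout_s: float = 60.0
) -> List[float]:
    """Keep only the requests that count toward excessive response time.

    Requests that exceed the timeout value are excluded, even when no
    excessive request timeout condition is defined.
    """
    return [d for d in durations_s if d <= timeout_s]


def timeout_percentage(
    durations_s: Sequence[float], timeout_s: float = 60.0
) -> float:
    """Percentage of requests that exceeded the timeout value."""
    if not durations_s:
        return 0.0
    timed_out = sum(1 for d in durations_s if d > timeout_s)
    return 100.0 * timed_out / len(durations_s)
```

With request durations of 10, 50, 70, and 130 seconds and the 60 second timeout value, only the 10 and 50 second requests count toward the excessive response time condition, while the timeout percentage is 50.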

The predefined health policy conditions are designed to optimize the distribution of the needed data and to minimize the impact that monitoring and enforcing the health policy has on your overall environment.

You can also define custom conditions within your health policy. Use a custom condition when the predefined health conditions do not fit your needs. You define custom conditions as a subexpression that is tested against metrics in your environment. When you define a custom condition, consider the cost of collecting the data, analyzing the data, and if needed, enforcing the health policy. This cost can increase depending on the amount of traffic and number of servers in your network, so you must analyze the performance of your custom health conditions before you move them into production.
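
The idea of testing a subexpression against metrics can be sketched generically. The following Python example does not use the product's custom condition syntax. The tuple format, the metric name `cpu_utilization_pct`, and the function `condition_breached` are hypothetical and serve only to show the evaluation step.

```python
import operator
from typing import Callable, Dict, Tuple

# Comparison operators that this illustrative subexpression format allows.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def condition_breached(
    subexpression: Tuple[str, str, float], metrics: Dict[str, float]
) -> bool:
    """Test one (metric name, comparator, threshold) triple against metrics."""
    metric_name, comparator, threshold = subexpression
    return COMPARATORS[comparator](metrics[metric_name], threshold)
```

Each metric that a condition references must be collected from every targeted server, which is where the collection and analysis cost mentioned above comes from.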

Health actions

Health actions define the process to use when a health condition is not met. Depending on the conditions that you define, the actions can vary. The following table lists the health actions that are supported in various server environments:

Table 1. Predefined health action support for different server types
Health action | WebSphere Application Server or WebSphere Virtual Enterprise servers that run in the same cell as the health controller | Other middleware servers, including external WebSphere application servers, that run the middleware agent
Restart server | Supported | Supported
Take thread dumps | Supported for servers that are running on the IBM Software Development Kit | Not supported
Take Java virtual machine (JVM) heap dumps | Supported for servers that are running on the IBM Software Development Kit | Not supported
Put server into maintenance mode | Supported | Supported
Put server into maintenance mode and break HTTP and SIP request affinity to the server | Supported | Supported
Take server out of maintenance mode | Supported | Supported

You can also define a custom action. With a custom action, you define an executable file to run when the health condition is breached. You must define custom actions before you create the health policy that contains the custom actions.
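
Conceptually, a custom action amounts to launching an executable file when a breach occurs. The following Python sketch shows that general idea with the standard `subprocess` module. It is not how the product invokes custom actions, and the function name `run_custom_action` and the timeout parameter are assumptions made for this example.

```python
import subprocess
import sys
from typing import Sequence


def run_custom_action(command: Sequence[str], timeout_s: float = 60.0) -> int:
    """Run an executable with its arguments and return its exit code."""
    completed = subprocess.run(
        list(command),
        capture_output=True,   # keep the action's output out of the console
        text=True,
        timeout=timeout_s,     # do not let a hung action block forever
    )
    return completed.returncode


if __name__ == "__main__":
    # Stand-in executable: the current Python interpreter printing a message.
    print(run_custom_action([sys.executable, "-c", "print('collecting diagnostics')"]))
```

Returning the exit code lets the caller decide whether the action succeeded, which is a design choice of this sketch.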

Health policy targets

Health policy targets can be a single server, each of the servers in a cluster or dynamic cluster, the on demand router (ODR), or each of the servers in a cell. You can define multiple health policies to monitor the same set of servers.

If you are using a predefined policy, then the support of the policy varies depending on the server type. Other middleware servers do not support all of the policy types. The following table summarizes the health policy support, by server type:
Table 2. Health policy support for different server types
Predefined health policy | WebSphere Application Server or WebSphere Virtual Enterprise servers that run in the same cell as the health controller | Other middleware servers, including external WebSphere application servers, that run the middleware agent
Age-based policy | Supported | Supported
Workload policy | Supported | Supported
Memory leak detection | Supported | Not supported
Excessive memory usage | Supported | Supported for WebSphere Application Server Community Edition servers. Not supported for other middleware server types.
Excessive request timeout | Supported | Supported for other middleware servers to which the ODR routes requests.
Excessive response time | Supported | Supported
Storm drain detection | Supported | Supported
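
If you script checks against this support matrix, the table can be encoded as a lookup structure. The following Python sketch transcribes the rows of Table 2. The constant names and the `policy_support` function are hypothetical and exist only for this example.

```python
from typing import Dict

SAME_CELL = "same_cell"        # servers in the same cell as the health controller
OTHER = "other_middleware"     # other middleware servers with the middleware agent

POLICY_SUPPORT: Dict[str, Dict[str, str]] = {
    "Age-based policy": {SAME_CELL: "Supported", OTHER: "Supported"},
    "Workload policy": {SAME_CELL: "Supported", OTHER: "Supported"},
    "Memory leak detection": {SAME_CELL: "Supported", OTHER: "Not supported"},
    "Excessive memory usage": {
        SAME_CELL: "Supported",
        OTHER: "Supported for WebSphere Application Server Community Edition "
               "servers. Not supported for other middleware server types.",
    },
    "Excessive request timeout": {
        SAME_CELL: "Supported",
        OTHER: "Supported for other middleware servers to which the ODR "
               "routes requests.",
    },
    "Excessive response time": {SAME_CELL: "Supported", OTHER: "Supported"},
    "Storm drain detection": {SAME_CELL: "Supported", OTHER: "Supported"},
}


def policy_support(policy: str, server_type: str) -> str:
    """Return the support statement from Table 2 for a policy and server type."""
    return POLICY_SUPPORT[policy][server_type]
```

The function returns the full support statement rather than a Boolean value because two of the entries are conditional.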

Default health policies

Default health policies are a set of predefined, supervised mode, cell-level policies that are installed with WebSphere Virtual Enterprise. You can modify the default policies for your environment, or delete the default health policies. Because the default health policies monitor each server in supervised mode, you can use these policies to prevent health problems. In addition to the default policies, you can define policies with more detailed settings or automatic mode operation for particular servers or collections of servers. The following list shows the five default cell-wide health policies that are created during installation:
  • Default memory leak: Default standard detection level. The default memory leak health policy uses the performance advisor functionality, so the performance advisor is enabled when this policy is enabled. To disable the performance advisor, you must remove this health policy or narrow the membership of the health policy. To preserve the health policy for future use, consider keeping the default memory leak policy, but removing all of the members. To change the members, click Operational policies > Health policies > Default_Memory_Leak. You can edit the health policy memberships by adding and removing specific members from the policy.
  • Default excessive memory usage: Set to 95 percent of the JVM heap size for 15 minutes
  • Default excessive request timeout: Set for 5 percent of the requests timing out
  • Default excessive response time: Set to 120 seconds
  • Default storm drain: Default standard detection level
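
To make the default excessive memory usage setting concrete, the following Python sketch checks whether heap usage stayed above 95 percent for 15 minutes. It is illustrative only: the function `sustained_breach`, the sample format, and the assumption that usage must stay above the threshold without interruption are not taken from the product.

```python
from typing import Optional, Sequence, Tuple


def sustained_breach(
    samples: Sequence[Tuple[float, float]],
    threshold_pct: float = 95.0,
    duration_min: float = 15.0,
) -> bool:
    """Return True if usage stayed above the threshold for the duration.

    samples holds (minute, heap usage percent) pairs in time order.
    """
    breach_start: Optional[float] = None
    for minute, usage_pct in samples:
        if usage_pct > threshold_pct:
            if breach_start is None:
                breach_start = minute      # usage first crossed the threshold
            if minute - breach_start >= duration_min:
                return True                # above the threshold long enough
        else:
            breach_start = None            # usage recovered, so start over
    return False
```

With samples taken every 5 minutes, four consecutive readings above 95 percent span 15 minutes and count as a breach in this sketch, while a single reading at or below the threshold resets the timer.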

To view the recommendations that are made by the default health policies and to take actions on these recommendations, click System administration > Task management > Runtime tasks.




Related concepts
Excessive request timeout health policy target timeout value
Related tasks
Configuring health management
Creating health policies
Setting maintenance mode
Creating health policy custom actions
Managing runtime tasks

Last updated: Oct 30, 2009 1:33:44 PM EDT
http://publib.boulder.ibm.com/infocenter/wxdinfo/v6r1m1/index.jsp?topic=/com.ibm.websphere.ops.doc/info/odoe_task/codhealth.html