With the health monitoring and management subsystem, you
can take a policy-driven approach to monitoring the application server
environment and take action when specified criteria are met.
Health monitoring and management subsystem
The
health management subsystem continuously monitors the state of servers
and the work that is performed by the servers in your environment.
The health management subsystem consists of two main elements: the health
controller and health policies.
The health
controller is the autonomic manager that controls the health monitoring
and management subsystem, and acts on your health policies to ensure
certain conditions exist. The health controller is a distributed resource
that is managed by the high availability manager and exists within
all of the node agent and deployment manager processes. The health
controller is active in one of these processes. If the active process
fails, the health controller can become active on another node agent
or deployment manager process.
The health controller runs on
a control cycle. The control cycle length defines the
amount of time that passes between the checks that the health controller
runs on the environment. After the control cycle ends, the health
controller checks the environment and generates runtime tasks to resolve
any breaches in the health conditions.
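The control cycle described above can be pictured as a periodic check that turns breached conditions into runtime tasks. The following Python sketch is only an illustration of that idea and is not the product's implementation; the names `HealthPolicy` and `run_control_cycle`, and the shape of the metrics, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HealthPolicy:
    # Hypothetical model of a policy: a condition check plus an action label.
    name: str
    breached: Callable[[Dict[str, float]], bool]
    action: str


def run_control_cycle(policies: List[HealthPolicy],
                      metrics_by_server: Dict[str, Dict[str, float]]) -> List[str]:
    """Run one check of the environment and return one task per breach.

    A real controller would wait for the control cycle length between
    checks; this sketch shows a single pass only.
    """
    tasks = []
    for server, metrics in metrics_by_server.items():
        for policy in policies:
            if policy.breached(metrics):
                tasks.append(f"{policy.action}: {server} ({policy.name})")
    return tasks


policies = [HealthPolicy("excessive memory", lambda m: m["heap_pct"] > 95, "restart server")]
metrics = {"server1": {"heap_pct": 97.0}, "server2": {"heap_pct": 40.0}}
tasks = run_control_cycle(policies, metrics)
```

In this example only `server1` breaches the condition, so a single task is generated for it.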
You define the health
policies, which include the health conditions that you
want to monitor in your environment and the health actions to
take if these conditions are not met.
You can enable or disable
health management through the health controller without removing the
health policies that are defined on the system. You can also limit
how frequently a server restarts, or prohibit restarts
during certain periods.
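As a rough illustration of restart limits, the following Python sketch checks a minimum interval between restarts and a set of prohibited time windows. The function name `restart_allowed` and its parameters are hypothetical, and the sketch assumes that prohibited windows do not cross midnight; it does not reflect how the health controller stores or evaluates these settings.

```python
from datetime import datetime, time, timedelta
from typing import List, Optional, Tuple


def restart_allowed(now: datetime,
                    last_restart: Optional[datetime],
                    min_interval: timedelta,
                    prohibited_windows: List[Tuple[time, time]]) -> bool:
    """Return True if a restart may proceed at `now`."""
    # Frequency limit: refuse if the previous restart was too recent.
    if last_restart is not None and now - last_restart < min_interval:
        return False
    # Prohibited periods: refuse if `now` falls inside any window.
    # Assumption: each window starts and ends on the same day.
    for start, end in prohibited_windows:
        if start <= now.time() < end:
            return False
    return True
```

For example, with a one-hour minimum interval, a restart requested 30 minutes after the previous one is refused, as is any restart requested inside a prohibited window.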
The health management subsystem functions
when WebSphere® Extended Deployment is in automatic
or supervised operating mode. When the reaction mode on the policy
is set to automatic, the health management system takes action when
a health policy violation is detected. In supervised mode, the health
management system creates a runtime task that proposes one or more
reactions. The system administrator can approve or deny the proposed
actions.
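The difference between the two reaction modes can be sketched as follows. This Python example is a simplified illustration and not the product's API; `ReactionMode`, `react`, and `approve` are hypothetical names.

```python
from enum import Enum
from typing import Callable, List


class ReactionMode(Enum):
    AUTOMATIC = "automatic"
    SUPERVISED = "supervised"


def react(mode: ReactionMode, action: str,
          run_action: Callable[[str], None],
          pending_tasks: List[str]) -> None:
    """Handle a detected policy violation according to the reaction mode."""
    if mode is ReactionMode.AUTOMATIC:
        # Automatic mode: take the action immediately.
        run_action(action)
    else:
        # Supervised mode: only propose the action as a runtime task.
        pending_tasks.append(action)


def approve(index: int, run_action: Callable[[str], None],
            pending_tasks: List[str]) -> None:
    """Administrator approves a proposed task, which then runs."""
    run_action(pending_tasks.pop(index))
```

Denying a proposed action would correspond to removing the task from the pending list without running it.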
Health conditions
Health conditions define
the variables that you want to monitor in your environment. Several
categories of health policy conditions exist. You can choose from
the following predefined health conditions:
- Age-based condition
- Specifies the age at which a server is restarted
to clean out cached and memory-acquired data.
- Excessive response time condition
- Tracks the amount of time that requests take to complete. If the
time exceeds the defined response time threshold, the health actions
run.
Attention: Requests that exceed the timeout value
that is configured for the excessive request timeout condition are
not counted towards this health condition. For example, if the default
timeout value of 60 seconds is being used for the excessive request
timeout condition, any request that exceeds 60 seconds does not activate
the health actions that you have defined for the excessive response
time condition. This restriction applies even if you do not have the
excessive request timeout condition defined in your environment.
- Excessive request timeout condition
- Specifies a percentage of HTTP requests that can time out. When
the percentage exceeds the defined value, the health actions run.
- Memory condition: excessive memory usage
- Tracks the memory usage for a member. When the memory usage exceeds
a percentage of the heap size for a specified time, health actions
run to correct this situation.
- Memory condition: memory leak
- Tracks consistent downward trends in free memory that is available
to a server in the Java heap. When the Java heap approaches its maximum
configured size, you can perform either heap dumps or server restarts.
- Storm drain detection
- Tracks requests that have a significantly increased response time.
This condition relies on change point detection applied to time series
data.
- Workload condition
- Specifies a number of requests that are serviced before policy
members restart to clean out memory and cache data.
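The interaction between the excessive response time condition and the request timeout value can be illustrated with a small Python sketch. It assumes, for illustration only, that the response time is evaluated as a simple average; the source does not state which statistic is used. The function names are hypothetical, and the 60-second default comes from the example in the Attention note.

```python
from typing import List, Optional


def average_response_time(durations_s: List[float],
                          timeout_s: float = 60.0) -> Optional[float]:
    """Average over requests that did not exceed the timeout.

    Requests longer than `timeout_s` are excluded, mirroring the note that
    timed-out requests do not count towards the response time condition.
    Returns None if no request qualifies.
    """
    counted = [d for d in durations_s if d <= timeout_s]
    return sum(counted) / len(counted) if counted else None


def timeout_percentage(durations_s: List[float],
                       timeout_s: float = 60.0) -> float:
    """Percentage of requests that exceeded the timeout."""
    if not durations_s:
        return 0.0
    timed_out = sum(1 for d in durations_s if d > timeout_s)
    return 100.0 * timed_out / len(durations_s)


durations = [10.0, 20.0, 90.0, 30.0]
avg = average_response_time(durations)   # the 90-second request is ignored
pct = timeout_percentage(durations)      # one of four requests timed out
```

In this example the 90-second request affects only the timeout percentage, not the average response time.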
For more information about these conditions, click the help icon on the
Health policy settings panel in the administrative console.
These predefined health
policy conditions are designed to optimize the distribution
of the needed data and to minimize the impact that monitoring and enforcing
the health policy has on your overall environment.
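As a further illustration of the predefined conditions, the excessive memory usage condition requires that usage stay above a percentage of the heap size for a specified time. The following Python sketch shows one way to express "above a threshold for a sustained period"; the function name and the sample format are hypothetical, and the product's actual sampling method is not described in this topic.

```python
from typing import List, Tuple


def sustained_breach(samples: List[Tuple[float, float]],
                     threshold_pct: float,
                     duration_s: float) -> bool:
    """Return True if heap usage stays above the threshold long enough.

    `samples` is a time-ordered list of (timestamp_seconds, heap_used_pct).
    Any sample at or below the threshold resets the measurement.
    """
    breach_start = None
    for timestamp, used_pct in samples:
        if used_pct > threshold_pct:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= duration_s:
                return True
        else:
            breach_start = None
    return False
```

With a threshold of 95 percent and a duration of 900 seconds (15 minutes), a single dip below the threshold restarts the measurement, so short spikes do not trigger health actions.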
Health actions
Health actions define the
process to use when a health condition is not met. The actions can vary
depending on the conditions that you define, and the supported actions
depend on the server environment. The following list defines the health actions
that can be run in your environment:
- Restart server
- Take thread dumps (Supported for servers that are running on the
IBM Software Development Kit)
- Take Java virtual machine (JVM) heap dumps (Supported for servers
that are running on the IBM Software Development Kit)
Health policy targets
Health policy targets
can be a single server, each of the servers in a cluster or dynamic
cluster, or
each of the servers in a cell. You can define multiple health policies
to monitor the same set of servers.
Default health policies
[Version 6.0.1 and later]
Default health policies are a set of predefined,
supervised mode, cell-level policies that are installed with
WebSphere Extended Deployment. You can modify the
default policies for your environment, or delete the default health
policies. Because the default health policies monitor each server
in supervised mode, you can use these policies to prevent health problems.
In addition to the default policies, you can define policies with more
detailed settings or automatic mode operation for particular servers or
collections of servers. The following list shows the five default
cell-wide health policies that are created during installation:
- Default memory leak: Default standard detection level.
The default memory leak health policy uses the performance advisor
functionality, so the performance advisor is enabled when this policy
is enabled. To disable the performance advisor, you must remove this
health policy or narrow the membership of the health policy. To preserve
the health policy for future use, consider keeping the default memory
leak policy, but removing all of the members. To change the members,
click Operational policies > Health policies > Default_Memory_Leak.
You can edit the health policy memberships by adding and removing
specific members from the policy.
- Default excessive memory usage: Set to 95 percent of the
JVM heap size for 15 minutes
- Default excessive request timeout: Set for 5 percent of
the requests timing out
- Default excessive response time: Set to 120 seconds
- Default storm drain: Default standard detection level
To view the recommendations that are made by the default
health policies and to take actions on these recommendations, click System
administration > Task management > Runtime tasks.
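For reference, the five default policies and their thresholds can be summarized as data. The following Python sketch is only a restatement of the list above and not a product configuration format. Apart from `Default_Memory_Leak`, which appears in the console path, the policy names and all field names are hypothetical labels.

```python
# Settings shared by every default policy, as described in this topic.
COMMON = {"reaction_mode": "supervised", "scope": "cell"}

DEFAULT_HEALTH_POLICIES = [
    {"name": "Default_Memory_Leak", "condition": "memory leak",
     "detection_level": "standard", **COMMON},
    {"name": "default excessive memory usage", "condition": "excessive memory usage",
     "heap_pct": 95, "duration_minutes": 15, **COMMON},
    {"name": "default excessive request timeout", "condition": "excessive request timeout",
     "timeout_pct": 5, **COMMON},
    {"name": "default excessive response time", "condition": "excessive response time",
     "response_time_s": 120, **COMMON},
    {"name": "default storm drain", "condition": "storm drain",
     "detection_level": "standard", **COMMON},
]

# Example query: which default policies propose actions rather than run them?
supervised = [p["name"] for p in DEFAULT_HEALTH_POLICIES
              if p["reaction_mode"] == "supervised"]
```

Because every default policy is in supervised mode, all five appear in the `supervised` list, which matches the statement that their recommendations surface as runtime tasks for approval.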