APAR status
Closed as program error.
Error description
When an application server runs out of memory and becomes
unresponsive, the node agent can take a long time to detect
it. Meanwhile, the node agent and deployment manager (dmgr) are
slowed down by the unresponsive application server.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: WebSphere Application Server Version 5.x     *
*                 users who have unresponsive application      *
*                 servers in a network deployment              *
*                 environment.                                 *
****************************************************************
* PROBLEM DESCRIPTION: When there is an unresponsive           *
*                      application server in a network         *
*                      deployment environment, it is possible  *
*                      for the node agent to take a long time  *
*                      to detect it is unresponsive and        *
*                      perform appropriate recovery actions.   *
*                                                              *
*                      Meanwhile, new JMX requests sent to     *
*                      the unresponsive application server     *
*                      will hang. As a result, the             *
*                      administrative console will appear to   *
*                      be hung, and so will the JMX commands   *
*                      run from wsadmin.                       *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
The node agent periodically pings its application servers via JMX
to check their health. When an application server is
unresponsive, it may not be able to reply to the node agent
ping, and the node agent hangs until the JMX call times out.
Because of this, it can take the node agent a long time to
detect unresponsive application servers. The time it takes
also depends on the JMX request timeout setting for the node
agent.
The node agent needs to be able to detect unresponsive application
servers within a bounded amount of time and take appropriate
recovery actions. This also addresses the negative effects
an unresponsive application server has in a network deployment
environment, such as a hung administrative console and hung
wsadmin commands.
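The requirement above amounts to bounding each health ping with a timeout, so that a server that never replies is reported as unresponsive instead of blocking the caller. The following is a minimal Python sketch of that general technique only. It is not the node agent's implementation, and the names ping_with_timeout, healthy_server and hung_server are hypothetical.

```python
import threading
import time


def ping_with_timeout(ping, timeout_seconds):
    """Return True if ping() completes within timeout_seconds, else False."""
    result = {"ok": False}

    def worker():
        try:
            ping()
            result["ok"] = True
        except Exception:
            # A ping that raises is treated the same as no reply.
            result["ok"] = False

    # Daemon thread: a ping that never returns cannot keep the process alive.
    thread = threading.Thread(target=worker)
    thread.daemon = True
    thread.start()
    thread.join(timeout_seconds)

    # If the worker is still running, the ping is stuck: report unresponsive.
    return (not thread.is_alive()) and result["ok"]


def healthy_server():
    """Hypothetical ping target that replies immediately."""
    return None


def hung_server():
    """Hypothetical ping target that does not reply in time."""
    time.sleep(2)
```

With this shape, the caller waits at most timeout_seconds per server, regardless of how long the underlying call would otherwise block.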
Problem conclusion
Updated the node agent so that it detects an unresponsive
application server within the Ping timeout period rather than
waiting indefinitely.
The Ping timeout property for an application server can be set
from the administrative console:
Servers --> Application Servers --> (my server) --> Process
Definition --> Monitoring Policy, then update the Ping timeout
property. The default is 300 seconds. Note that the node agent
must be restarted for the change to take effect.
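The same setting can probably also be changed by script. The sketch below is written to run under wsadmin with Jython and assumes that the Ping timeout field is stored as the pingTimeout attribute of the server's MonitoringPolicy configuration object; verify the attribute name on your release before relying on it. The helper name set_ping_timeout and the node and server names are hypothetical. The AdminConfig object is passed in as a parameter so the function can also be exercised outside wsadmin.

```python
def set_ping_timeout(admin_config, node_name, server_name, seconds):
    """Set pingTimeout on a server's MonitoringPolicy and save the change.

    admin_config is expected to be the wsadmin AdminConfig object.
    """
    # Resolve the server by its containment path.
    server_id = admin_config.getid(
        "/Node:%s/Server:%s/" % (node_name, server_name))
    if not server_id:
        raise ValueError("server not found: %s/%s" % (node_name, server_name))

    # AdminConfig.list returns newline-separated configuration IDs.
    policies = admin_config.list("MonitoringPolicy", server_id).splitlines()
    if not policies:
        raise ValueError("no MonitoringPolicy found for %s" % server_name)
    policy_id = policies[0]

    # Assumption: the console field "Ping timeout" maps to "pingTimeout".
    admin_config.modify(policy_id, [["pingTimeout", str(seconds)]])
    admin_config.save()
    return policy_id
```

From a wsadmin Jython session this would be called as set_ping_timeout(AdminConfig, "myNode", "server1", 600). As with the console change, the node agent still needs to be restarted afterwards.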
Depending upon application server usage, it may be necessary to
increase this value. For instance, if the application server
is normally under heavy load and not able to respond to node
agent ping before timeout, you may want to increase this value
to allow more time for the application server to respond.
If the automatic restart property (also on the Monitoring Policy
page) is set to true, the node agent will attempt to stop the
application server and restart it. It is possible, however,
that the application server may not be able to respond to the
stop request from the node agent. When this happens, user
intervention is needed to recover the unresponsive application
server.
The fix for this APAR is currently targeted for inclusion in
fix packs 5.0.2.12 and 5.1.1.6. Please refer to the Recommended
Updates page for delivery dates:
http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
Temporary fix
Comments
APAR information
APAR number | PK02205
Reported component name | WAS NETWRK DEPL
Reported component ID | 5630A3601
Reported release | 10S
Status | CLOSED PER
PE | NoPE
HIPER | NoHIPER
Special Attention | NoSpecatt
Submitted date | 2005-03-08
Closed date | 2005-05-31
Last modified date | 2005-10-13
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Modules/Macros
Publications Referenced
Fix information
Fixed component name | WAS NETWRK DEPL
Fixed component ID | 5630A3601
Applicable component levels
R003 PSY | UP
R00A PSY | UP
R00H PSY | UP
R00I PSY | UP
R00P PSY | UP
R00S PSY | UP
R00W PSY | UP
R103 PSY | UP
R10A PSY | UP
R10H PSY | UP
R10I PSY | UP
R10P PSY | UP
R10S PSY | UP
R10W PSY | UP