PK16645: NODEAGENT NOT RE-STARTING AN UNRESPONSIVE APPLICATION SERVER AFTER IT STOPS IT EVEN THOUGH AUTO-RESTART IS ENABLED

 Fixes are available

5.0.2.16: WebSphere Application Server 5.0.2 Cumulative Fix 16 for AIX
5.0.2.16: WebSphere Application Server 5.0.2 Cumulative Fix 16 for HP-UX
5.0.2.16: WebSphere Application Server 5.0.2 Cumulative Fix 16 for Linux
5.0.2.16: WebSphere Application Server 5.0.2 Cumulative Fix 16 for Solaris
5.0.2.16: WebSphere Application Server 5.0.2 Cumulative Fix 16 for Windows
5.0.2.17: WebSphere Application Server 5.0.2 Cumulative Fix 17 for AIX
5.0.2.17: WebSphere Application Server 5.0.2 Cumulative Fix 17 for HP-UX
5.0.2.17: WebSphere Application Server 5.0.2 Cumulative Fix 17 for Linux
5.0.2.17: WebSphere Application Server 5.0.2 Cumulative Fix 17 for Solaris
5.0.2.17: WebSphere Application Server 5.0.2 Cumulative Fix 17 for Windows
5.0.2.18: WebSphere Application Server 5.0.2 Cumulative Fix 18 for AIX
5.0.2.18: WebSphere Application Server 5.0.2 Cumulative Fix 18 for HP-UX
5.0.2.18: WebSphere Application Server 5.0.2 Cumulative Fix 18 for Linux
5.0.2.18: WebSphere Application Server 5.0.2 Cumulative Fix 18 for Solaris
5.0.2.18: WebSphere Application Server 5.0.2 Cumulative Fix 18 for Windows
5.1.1.10: WebSphere Application Server V5.1.1 Cumulative Fix 10 for AIX
5.1.1.10: WebSphere Application Server V5.1.1 Cumulative Fix 10 for HP-UX
5.1.1.10: WebSphere Application Server V5.1.1 Cumulative Fix 10 for Linux
5.1.1.10: WebSphere Application Server V5.1.1 Cumulative Fix 10 for Solaris
5.1.1.10: WebSphere Application Server V5.1.1 Cumulative Fix 10 for Windows



APAR status
Closed as program error.

Error description
After the NodeAgent stops an unresponsive application server, it
does not always restart it, even if auto-restart is enabled.

When the NodeAgent determines that the application server is
unresponsive, you see a message similar to the following:
PidWaiter W ADML0063W: Cannot contact server "appserver". Force to stop this server if it is still running.

When the NodeAgent tries to restart the unresponsive server, you
would expect to see a message similar to the following. The
problem occurs when this message does not appear:
PidWaiter A ADML0064I: Restarting unreachable server "appserver".
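To check a NodeAgent log for this symptom, you can look for each ADML0063W message that is never followed by an ADML0064I message for the same server. The Java sketch below is a hypothetical diagnostic helper, not an IBM tool: the class and method names are illustrative, and it assumes that each message occupies a single log line and that the server name is the first double-quoted token after the message ID, as in the messages above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical diagnostic helper (not part of WebSphere): reports servers
 * that the NodeAgent force-stopped (ADML0063W) but never restarted (ADML0064I).
 * Assumes one message per log line and the server name in double quotes.
 */
final class MissedRestartScanner {

    private MissedRestartScanner() {}

    /** Returns the first double-quoted token at or after {@code from}, or null. */
    static String quotedServerName(String line, int from) {
        int start = line.indexOf('"', from);
        if (start < 0) {
            return null;
        }
        int end = line.indexOf('"', start + 1);
        return end < 0 ? null : line.substring(start + 1, end);
    }

    /** Server names with an ADML0063W message and no later ADML0064I message. */
    static List<String> findMissedRestarts(List<String> logLines) {
        // A LinkedHashSet keeps the output in the order the force-stops were logged.
        Set<String> pending = new LinkedHashSet<>();
        for (String line : logLines) {
            int stopAt = line.indexOf("ADML0063W");
            if (stopAt >= 0) {
                // Force-stop seen: a restart message is now expected for this server.
                String server = quotedServerName(line, stopAt);
                if (server != null) {
                    pending.add(server);
                }
                continue;
            }
            int restartAt = line.indexOf("ADML0064I");
            if (restartAt >= 0) {
                // Restart seen: the earlier force-stop was handled correctly.
                String server = quotedServerName(line, restartAt);
                if (server != null) {
                    pending.remove(server);
                }
            }
        }
        return new ArrayList<>(pending);
    }
}
```

For example, given the two messages above followed by a third ADML0063W message for a server named server2 (a made-up name), findMissedRestarts returns a list containing only server2, which is the server affected by this problem.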
Local fix

Problem summary
****************************************************************
* USERS AFFECTED: WebSphere Application Server version         *
*                 5.1.1 and 5.0.2 users of a Network           *
*                 Deployment environment who have              *
*                 auto-restart set to true.                    *
****************************************************************
* PROBLEM DESCRIPTION: The NodeAgent stops an unresponsive     *
*                      application server but does not restart *
*                      it, even though auto-restart is set to  *
*                      true.                                   *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
The NodeAgent stops the unresponsive application server, as shown
in the following trace excerpt:

:09:15:417 GMT+08:00] 33d43fbb PidWaiter     > handleNotification
                                 j2ee.state.stopping
3:09:40:907 GMT+08:00] 536fbfbb PidWaiter     d Pid 348348: Broke out of secondary wait
[12/7/05 23:09:40:907 GMT+08:00] 536fbfbb PidWaiter     > finishRunProcessing
                                 348348: true
[12/7/05 23:09:40:908 GMT+08:00] 536fbfbb PidWaiter     d Pid 348348: Process stopped normally, so remove process from child list

However, after stopping the server, the NodeAgent does not restart it.

The problem is that when the NodeAgent stops the server, it
receives the j2ee.state.stopping notification from that server.
The NodeAgent then treats the stop as a normal shutdown and does
not restart the server.
Problem conclusion
We have made code changes so that, if the NodeAgent stops an
unresponsive application server, it restarts the server even if
it receives the j2ee.state.stopping notification.
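The difference between the old and the corrected behavior can be modeled as a small restart decision. This is a hypothetical Java sketch, not WebSphere source code: the class, method, and parameter names are illustrative, and it assumes that a process exit without a j2ee.state.stopping notification is treated as a failure that auto-restart handles.

```java
/**
 * Hypothetical model of the restart decision described in this APAR.
 * RestartPolicy and its methods are illustrative names, not WebSphere APIs.
 */
final class RestartPolicy {

    private RestartPolicy() {}

    /** Behavior described in the problem summary (before the fix). */
    static boolean shouldRestartBeforeFix(boolean autoRestart,
                                          boolean sawStoppingNotification,
                                          boolean forcedStopByNodeAgent) {
        // forcedStopByNodeAgent is ignored on purpose: any j2ee.state.stopping
        // notification was taken as a normal shutdown, even when the NodeAgent
        // itself had stopped the unresponsive server.
        return autoRestart && !sawStoppingNotification;
    }

    /** Behavior described in the problem conclusion (after the fix). */
    static boolean shouldRestartAfterFix(boolean autoRestart,
                                         boolean sawStoppingNotification,
                                         boolean forcedStopByNodeAgent) {
        if (!autoRestart) {
            return false;
        }
        // A stop forced by the NodeAgent leads to a restart, even if the
        // server emitted j2ee.state.stopping while going down.
        if (forcedStopByNodeAgent) {
            return true;
        }
        // Otherwise a stopping notification still means an intentional shutdown.
        return !sawStoppingNotification;
    }
}
```

With auto-restart enabled, a forced stop that produces j2ee.state.stopping yields false before the fix and true after it, while an administrator-initiated stop (no forced stop, notification seen) yields false in both versions.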

The fix for this APAR is currently targeted for
inclusion in cumulative fixes 5.0.2.16 and 5.1.1.10.
Please refer to the Recommended Updates page for delivery
information:

http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
Temporary fix

Comments

APAR information
APAR number PK16645
Reported component name WAS NETWRK DEPL
Reported component ID 5630A3601
Reported release 00A
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Special Attention NoSpecatt
Submitted date 2005-12-13
Closed date 2006-01-23
Last modified date 2006-03-29

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros
ADMIN JMX        

Publications Referenced

Fix information
Fixed component name WAS NETWRK DEPL
Fixed component ID 5630A3601

Applicable component levels
R003 PSY    UP
R00A PSY    UP
R00H PSY    UP
R00I PSY    UP
R00P PSY    UP
R00S PSY    UP
R00W PSY    UP
R103 PSY    UP
R10A PSY    UP
R10H PSY    UP
R10I PSY    UP
R10P PSY    UP
R10S PSY    UP
R10W PSY    UP


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > General
Operating system(s):
Software version: 00A
Software edition:
Reference #: PK16645
IBM Group: Software Group
Modified date: Mar 29, 2006