PQ80564: Adminserver not reconnecting to running application servers when adminserver is killed and then recycled by nanny process.

 A fix is available

PQ87712; 4.0.5-4.0.7: Application Server processes become orphans after restart



APAR status
Closed as program error.

Error description
The customer has disableAutoStart for the application servers
set to true in the admin.config file, and NodeStartupState set
to LastState for the application server. When they issue a
kill -9 on the adminserver, the nanny restarts the adminserver,
but the adminserver does not reconnect to the running appserver
processes; instead, they are left as orphan processes.  In the
console, the appserver appears to be stopped, but you cannot
start the appserver, and a stop or force stop does not kill the
process.  The process does not go away until you issue a kill
on its process id.  If the customer sets disableAutoStart to
false, the adminserver is able to reconnect to a running
appserver after it is killed and recycled by the nanny.
However, the customer wants the appservers to be neither
running nor in an orphan state when they bring up the
adminserver manually, or after a catastrophic failure on the
node.  This problem was recreated on the HP-UX lab machines
here in Raleigh, and the Level 3 SM team was also able to
recreate it on Solaris on WAS 4.0.6.  On Windows and AIX,
however, this works properly.  Level 3 reports that the code
that needs to be changed is native code, and they are aware
that this problem needs to be addressed.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: WebSphere Application Server users that      *
*                 have the AdminServer relaunched by Nanny.    *
****************************************************************
* PROBLEM DESCRIPTION: Relaunched AdminServer failing to       *
*                      reconnect to running AppServers         *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
The Nanny relaunches the AdminServer if it comes down.  It has
been noticed that under certain circumstances the relaunched
AdminServer cannot reconnect to running AppServers.  This fix
addresses that problem.
Problem conclusion
When the Node startup state is stopped, the AdminServer
reconnect code was not checking its state explicitly.
The code now checks this state and attempts reconnection.
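The change described above can be illustrated with a small
sketch. This is not the product code (the actual fix is in
native code); it is one possible reading of the conclusion,
and every class, enum, and method name below is hypothetical.
It assumes that, before the fix, an appserver whose startup
state was stopped was skipped by the reconnect logic even if
its process was still alive, and that the fix adds an explicit
check for that state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; none of these names come from the WebSphere code base.
class ReconnectSketch {
    enum NodeStartupState { RUNNING, STOPPED }

    static final class AppServer {
        final String name;
        final NodeStartupState startupState;
        final boolean processAlive; // true if the OS process is still running

        AppServer(String name, NodeStartupState startupState, boolean processAlive) {
            this.name = name;
            this.startupState = startupState;
            this.processAlive = processAlive;
        }
    }

    // Assumed pre-fix behavior: only servers recorded as running are
    // considered, so a live process with a stopped startup state is skipped.
    static boolean reconnectsBeforeFix(AppServer s) {
        return s.startupState == NodeStartupState.RUNNING && s.processAlive;
    }

    // Assumed post-fix behavior: the stopped state is checked explicitly and
    // reconnection is attempted whenever the process is still alive.
    static boolean reconnectsAfterFix(AppServer s) {
        switch (s.startupState) {
            case RUNNING:
            case STOPPED: // the case that now gets an explicit check
                return s.processAlive;
            default:
                return false;
        }
    }

    // A live process that the AdminServer does not reconnect to is an orphan.
    static List<String> orphans(List<AppServer> servers, boolean afterFix) {
        List<String> result = new ArrayList<>();
        for (AppServer s : servers) {
            boolean reconnected = afterFix ? reconnectsAfterFix(s) : reconnectsBeforeFix(s);
            if (s.processAlive && !reconnected) {
                result.add(s.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<AppServer> servers = new ArrayList<>();
        servers.add(new AppServer("appserver1", NodeStartupState.STOPPED, true));
        servers.add(new AppServer("appserver2", NodeStartupState.RUNNING, true));
        System.out.println("Orphans before fix: " + orphans(servers, false));
        System.out.println("Orphans after fix:  " + orphans(servers, true));
    }
}
```

In this model, an appserver is an orphan when its process is
alive but the relaunched AdminServer does not reconnect to it,
which matches the symptom in the error description: the console
shows the appserver as stopped while its process id persists.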
Temporary fix
ZE Fix Error 
PQ87712 04/04/19
The test fix was placed in 
pq99999 for download.
Has been tried and verified by PMR opener.
Comments
The fix was sent to the PMR openers.  They are satisfied
with the fix after their testing.
APAR information
APAR number PQ80564
Reported component name WEBPSHERE AE HP
Reported component ID 5630A2203
Reported release 400
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Submitted date 2003-11-06
Closed date 2003-12-09
Last modified date 2004-04-20

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros
Admin          

SRLS

Fix information

Applicable component levels
R400 PSY    UP


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > General
Operating system(s):
Software version: 400
Software edition:
Reference #: PQ80564
IBM Group: Software Group
Modified date: Apr 20, 2004