PQ79091: ADMINSERVER WILL NOT RESTART APPSERVER AFTER ISSUING A SIGTERM

 A fix is available

4.0.7: WebSphere Application Server Version 4.0 Fix Pack 7



APAR status
Closed as program error.

Error description
Customer is seeing various problems associated with the WebSphere
admin server, including appserver failures that require a complete
restart of WebSphere when app servers receive a SIGTERM.
Problems are intermittent, and there appears to be an interaction
between the WebSphere native code library
(libWsProcessManagement.so) and a native code library in use by
the customer's app servers. This may be related to multithreading
problems and/or timing windows in WebSphere native code.
.
Symptom:
The adminserver does not restart the appserver when both of the
following are true:
.
1) The jvm of the appserver receives a SIGTERM (kill -15),
and
2) a native library is loaded into the jvm, and that native
library has a signal handler for SIGTERM that causes an exit of 0
as part of its signal handling.
If the native library is not loaded and the jvm receives a
SIGTERM, the adminserver recycles the appserver successfully.
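.
The difference between the two cases can be observed from the
parent process. The following is a minimal sketch in Python, not
WebSphere code: the child script, the on_sigterm handler and the
--handler flag are hypothetical stand-ins for an app server jvm
with and without the customer's native library. It assumes a
POSIX system. Whether the changed exit status is what prevented
the restart is not stated in this APAR.

```python
import signal
import subprocess
import sys

# Hypothetical child: stands in for an app server jvm. With
# "--handler" it installs a SIGTERM handler that exits with 0.
CHILD_SOURCE = r"""
import os, signal, sys, time

def on_sigterm(signum, frame):
    # Swallow SIGTERM and report a normal exit status of 0.
    os._exit(0)

if "--handler" in sys.argv:
    signal.signal(signal.SIGTERM, on_sigterm)
print("ready", flush=True)
time.sleep(30)
"""

def run_and_terminate(install_handler):
    """Start the child, send SIGTERM, and return its exit status."""
    args = [sys.executable, "-c", CHILD_SOURCE]
    if install_handler:
        args.append("--handler")
    child = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)
    child.stdout.readline()  # wait until the child is set up
    child.send_signal(signal.SIGTERM)
    status = child.wait(timeout=10)
    child.stdout.close()
    return status
```

With the handler installed the parent sees an exit status of 0.
Without it the parent sees a negative status, which is how
Python's subprocess module reports death by signal.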
Local fix
Provided the customer with a modified version of the
libWsProcessManagement.so file that issues a SIGKILL instead of
a SIGTERM.
Problem summary
****************************************************************
* USERS AFFECTED: All users of WebSphere Application Server.   *
****************************************************************
* PROBLEM DESCRIPTION: WebSphere admin server fails to         *
*                      terminate and restart a failing app     *
*                      server.  App server process remains.    *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
When an app server is failing (i.e., fails to send a ping to
the admin server), the admin server responds by sending a
SIGTERM to the app server process.  In the problem reported
here, a native code library in use by the app server was
intercepting the SIGTERM and not passing it on, so the signal
never led to termination of the app server.
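.
The effect of an intercepted SIGTERM can be reproduced outside
WebSphere. The sketch below is illustrative only and assumes a
POSIX system: the child script is a hypothetical stand-in for an
app server whose native library catches SIGTERM and does nothing
further, and survives_sigterm and grace_seconds are names
invented for this example.

```python
import signal
import subprocess
import sys

# Hypothetical child: catches SIGTERM and does not pass it on.
CHILD_SOURCE = r"""
import signal, time
signal.signal(signal.SIGTERM, lambda signum, frame: None)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""

def survives_sigterm(grace_seconds=1.0):
    """Return True if the child is still running after SIGTERM."""
    child = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE],
                             stdout=subprocess.PIPE, text=True)
    try:
        child.stdout.readline()  # wait until the handler is installed
        child.send_signal(signal.SIGTERM)
        try:
            child.wait(timeout=grace_seconds)
            return False  # child exited within the grace period
        except subprocess.TimeoutExpired:
            return True   # child process remains
    finally:
        child.kill()      # clean up the test process
        child.wait()
        child.stdout.close()
```

In this sketch the child outlives the SIGTERM, which matches the
"App server process remains" symptom described above.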
Problem conclusion
Changed the admin server code to send SIGKILL (which cannot be
caught or ignored) rather than SIGTERM.
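.
As an illustration of why SIGKILL behaves differently, the sketch
below (Python on a POSIX system, not the admin server code) starts
a hypothetical child that catches SIGTERM and then terminates it
with SIGKILL. The child script and the function name
kill_stubborn_child are assumptions made for this example.

```python
import signal
import subprocess
import sys

# Hypothetical child: catches SIGTERM, so only SIGKILL can stop it.
CHILD_SOURCE = r"""
import signal, time
signal.signal(signal.SIGTERM, lambda signum, frame: None)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""

def kill_stubborn_child():
    """Send SIGKILL to the child and return its exit status."""
    child = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE],
                             stdout=subprocess.PIPE, text=True)
    child.stdout.readline()  # wait until the handler is installed
    child.send_signal(signal.SIGKILL)
    status = child.wait(timeout=10)
    child.stdout.close()
    return status
```

The returned status is the negated SIGKILL signal number, meaning
the child was terminated by the signal and its SIGTERM handler
played no part.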
Temporary fix
PQ70417 has been submitted on pq99999.raleigh.ibm.com
Comments
APAR information
APAR number PQ79091
Reported component name WAS ADVANCED SU
Reported component ID 5630A2200
Reported release 400
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Submitted date 2003-10-01
Closed date 2003-10-01
Last modified date 2003-10-01

APAR is sysrouted FROM one or more of the following:
PQ70417

APAR is sysrouted TO one or more of the following:

Modules/Macros
AdminSVR          

SRLS

Fix information
Fixed component name WAS ADVANCED SU
Fixed component ID 5630A2200

Applicable component levels
R400 PSY    UP


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > General
Operating system(s):
Software version: 400
Software edition:
Reference #: PQ79091
IBM Group: Software Group
Modified date: Oct 1, 2003