PQ70391: PID OF APP SERVER IS NOT GOING AWAY WHEN APP SERVER DOES NOT FULLY INITIALIZE, LEADING TO LARGE NUMBER OF GHOST PIDS AND SYSTEM CRASH

 Fixes are available

4.0.6: WebSphere Application Server Version 4.0 Fix Pack 6
PQ78892; 4.0.7: Admin server does not recognize restarted Application Server process



APAR status
Closed as program error.

Error description
The PID of the app server does not go away after the app server
fails to come up fully. This leads to a large number of PIDs for
the failing app server and causes the operating system to crash.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All users of WebSphere Application           *
*                 Server 4.0                                   *
****************************************************************
* PROBLEM DESCRIPTION: When an application server fails        *
*                      during initialization, the admin        *
*                      server may restart it continuously.     *
*                      The user will see large numbers of      *
*                      app servers in the output of "ps".      *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
The distinguishing characteristic of this problem is a large
number of instances of the same app server appearing in the
system process table. These are NOT zombie or defunct
processes; they are real processes using real resources. The
only way to stop this behavior is to stop the admin
server.
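One way to confirm the symptom is to count how many live processes belong to the same app server. The sketch below is a modern diagnostic, not part of the fix: it uses `ProcessHandle` (Java 9+, which did not exist in the JDKs of the WebSphere 4.0 era) and assumes that the app server's name appears on its process command line, which has not been verified for WebSphere 4.0. The server name `srvA` and the class name are hypothetical placeholders. On some platforms `commandLine()` is empty for processes owned by other users, so run it as the same user as the admin server.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical diagnostic: counts processes whose command line mentions a server name. */
public class DuplicateServerCheck {

    /** Counts command lines that contain the given server name. */
    static long countInstances(List<String> commandLines, String serverName) {
        return commandLines.stream().filter(cmd -> cmd.contains(serverName)).count();
    }

    /** Command lines of all visible processes, excluding this JVM itself. */
    static List<String> liveCommandLines() {
        long self = ProcessHandle.current().pid(); // our own args contain the name, so skip ourselves
        return ProcessHandle.allProcesses()
                .filter(ph -> ph.pid() != self)
                .map(ph -> ph.info().commandLine()) // Optional<String>, may be empty
                .flatMap(Optional::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // ASSUMPTION: "srvA" is a placeholder; pass the real server name as the first argument.
        String serverName = args.length > 0 ? args[0] : "srvA";
        long n = countInstances(liveCommandLines(), serverName);
        System.out.println(serverName + ": " + n + " process(es)");
    }
}
```

A count that keeps climbing between runs, rather than staying at one, matches the behavior described above.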
Problem conclusion
There was a subtle problem in the way the WebSphere admin
server handled restart logic for an app server, which could
lead to truly exponential growth in the number of servers
started. We added logic to track whether a server start is
already in progress. If another request arrives to start a
server that has already been started, we now recognize this
and terminate the thread that would otherwise have gone on to
spawn more and more server instances.
This does not mean that you cannot restart a server, only
that we will not automatically start one if a copy of that
server is already running.
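The APAR does not say what triggered the duplicate restart requests, so the following is only a toy model of why unchecked restarts grow exponentially and why collapsing duplicate requests caps the growth. ASSUMPTION (not from the APAR): each start attempt fails during initialization and produces two new restart requests. Without a guard, every request launches a server, giving 1 + 2 + 4 + ... = 2^n - 1 launches after n rounds; with a guard, at most one start per server is in progress, giving n launches.

```java
/**
 * Hypothetical model of the restart storm, for illustration only.
 * ASSUMPTION (not stated in the APAR): each start attempt fails during
 * initialization and generates two new restart requests.
 */
public class RestartStormModel {

    /** Total server processes launched after the given number of failure rounds. */
    static long spawned(int rounds, boolean guarded) {
        long total = 0;
        long requests = 1; // the initial start request
        for (int round = 0; round < rounds; round++) {
            // With the guard, concurrent requests for the same server collapse into one start.
            long starts = guarded ? Math.min(requests, 1) : requests;
            total += starts;
            // Each start fails and produces two restart requests for the next round.
            requests = starts * 2;
        }
        return total;
    }

    public static void main(String[] args) {
        int rounds = 10;
        System.out.println("unguarded: " + spawned(rounds, false)); // 1023 (2^10 - 1)
        System.out.println("guarded:   " + spawned(rounds, true));  // 10
    }
}
```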
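The fix described above amounts to a "start in progress" guard: the first request for a server wins, duplicates are dropped, and once the server has stopped a new start is allowed again. The sketch below shows that idea with an atomic concurrent set; the class and method names (`ServerStartGuard`, `tryBeginStart`, `markStopped`) are hypothetical and this is not the actual admin server code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal sketch of a "start in progress" guard. Names are hypothetical;
 * this is not the WebSphere admin server implementation.
 */
public class ServerStartGuard {
    // Names of servers that are currently starting or running.
    private final Set<String> active = ConcurrentHashMap.newKeySet();

    /**
     * Returns true if the caller may start the server, false if a start is
     * already in progress or the server is already running.
     */
    public boolean tryBeginStart(String serverName) {
        return active.add(serverName); // atomic: only one concurrent caller sees true
    }

    /** Call when the server process has exited, so a later restart is allowed. */
    public void markStopped(String serverName) {
        active.remove(serverName);
    }

    public static void main(String[] args) {
        ServerStartGuard guard = new ServerStartGuard();
        System.out.println(guard.tryBeginStart("server1")); // true: first request starts it
        System.out.println(guard.tryBeginStart("server1")); // false: duplicate request is dropped
        guard.markStopped("server1");
        System.out.println(guard.tryBeginStart("server1")); // true: restart after stop is allowed
    }
}
```

A thread that receives `false` from `tryBeginStart` would simply end, which corresponds to terminating the thread that would otherwise have launched another instance.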
Temporary fix
PQ70391_Test.jar and readme.txt submitted to:

pq99999.raleigh.ibm.com
Comments
APAR information
APAR number PQ70391
Reported component name WEBSPHERE AE SO
Reported component ID 5630A2202
Reported release 400
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Submitted date 2003-01-29
Closed date 2003-04-03
Last modified date 2003-04-03

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros
AdminSVR          

SRLS

Fix information

Applicable component levels
R400 PSY    UP


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > General
Operating system(s):
Software version: 400
Software edition:
Reference #: PQ70391
IBM Group: Software Group
Modified date: Apr 3, 2003