PQ59006: WHEN AN APP SERVER FAILS, FOR WHATEVER REASON, THE ADMIN SERVER DETECTS THIS, BUT ONLY WILL TRY A RESTART ONCE

 A fix is available

System Management Component Cumulative Fix for 4.0.2/4.0.3/4.0.4/4.0.5



APAR status
Closed as program error.

Error description
When an app server fails, for whatever reason, the admin server
detects the failure and attempts to restart it once. If that
attempt fails (in this case, because the repository DB was
down), there are no further automatic restart attempts.
This PMR is a request to add logic to the admin server so that
it keeps trying to restart failed app servers.
Local fix
Restart the app server manually.
Problem summary
****************************************************************
* USERS AFFECTED: WebSphere Application Server users trying    *
*                 to restart the application server.           *
****************************************************************
* PROBLEM DESCRIPTION: Under some circumstances, the admin     *
*                      server may not restart a failed         *
*                      application server.                     *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
If the Admin Server detects that an Application Server has gone
down, it tries to start it again. If this attempt fails (for
example, if the repository database is down), no more
attempts are made to restart the Application Server.
Problem conclusion
This fix makes the Admin Server retry starting the
Application Server a specified number of times. The default
is 5 retries at 60-second intervals. This can be
changed by adding the following lines to admin.config (each
property must be on a single line with no embedded
whitespace):

com.ibm.ejs.sm.adminServer.nannyThread.maxRetries=<num_retries>

com.ibm.ejs.sm.adminServer.nannyThread.waitTime=<secs_between_retries>

If <num_retries> is set to a negative number, the Admin Server
retries the app server start indefinitely.
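The retry behavior described above can be illustrated with a minimal Java sketch. It is not the Admin Server's actual code: the class NannyRetrySketch, the method restartWithRetries, and the startAttempt callback are hypothetical names introduced only for illustration. Only the two property keys and the defaults (5 and 60) come from this APAR. The sketch also assumes that maxRetries counts the total number of restart attempts, which the APAR text does not state explicitly.

```java
import java.util.Properties;
import java.util.function.BooleanSupplier;

public class NannyRetrySketch {
    // Property keys as documented in this APAR.
    static final String MAX_RETRIES_KEY =
        "com.ibm.ejs.sm.adminServer.nannyThread.maxRetries";
    static final String WAIT_TIME_KEY =
        "com.ibm.ejs.sm.adminServer.nannyThread.waitTime";

    /**
     * Hypothetical retry loop. startAttempt stands in for whatever the
     * Admin Server does to start an app server; it returns true on success.
     * Returns the number of attempts used, or -1 if all attempts failed
     * or the waiting thread was interrupted.
     */
    static int restartWithRetries(Properties config, BooleanSupplier startAttempt) {
        // Defaults from the APAR: 5 retries, 60 seconds apart.
        int maxRetries = Integer.parseInt(config.getProperty(MAX_RETRIES_KEY, "5"));
        long waitSeconds = Long.parseLong(config.getProperty(WAIT_TIME_KEY, "60"));

        int attempts = 0;
        // A negative maxRetries means "retry forever".
        while (maxRetries < 0 || attempts < maxRetries) {
            if (attempts > 0) {
                // Wait only between attempts, not before the first one.
                try {
                    Thread.sleep(waitSeconds * 1000L);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
            attempts++;
            if (startAttempt.getAsBoolean()) {
                return attempts;
            }
        }
        return -1;
    }
}
```

In this sketch, a configuration with maxRetries=-1 keeps calling startAttempt until it succeeds, which matches the "retry forever" behavior described for negative values.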
Temporary fix
PQ59006_eFix_AEServer.jar and README.txt are available on:

wasdoc0.raleigh.ibm.com/apars/PQ59006/4.0.2
Comments
APAR information
APAR number PQ59006
Reported component name WEBSPHERE AE NT
Reported component ID 5630A2201
Reported release 400
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Submitted date 2002-03-13
Closed date 2002-04-09
Last modified date 2002-04-09

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros
ADMINSVR          

SRLS

Fix information
Fixed component name WEBSPHERE AE NT
Fixed component ID 5630A2201

Applicable component levels
R400 PSY    UP


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > General
Operating system(s):
Software version: 400
Software edition:
Reference #: PQ59006
IBM Group: Software Group
Modified date: Apr 9, 2002