PQ57190: ENHANCEMENT TO HARDWARE FAILOVER DETECTION

APAR status
Closed as program error.

Error description
The HTTP Transport is loaded under a parent HTTP process which
spawns multiple threads.  Each HTTP Server thread calls the
HTTP Transport to determine whether a request is intended for
WebSphere.  Once that determination is made, the HTTP Transport
selects an AppServer to service the request.  If the selected
AppServer is unavailable due to a hardware failure, the HTTP
request is directed to another AppServer and the unresponsive
AppServer is flagged as unavailable.  After a Retry interval
has expired, the HTTP Transport attempts to connect to the
failed AppServer again.
.
The detection of a failed AppServer by a single thread has
minimal impact on performance.  However, when multiple threads go
through the same process of discovering a failed AppServer node,
the performance impact can be significant.  This APAR ensures
that only one HTTP Transport thread attempts to reconnect to a
previously failed AppServer, minimizing the performance impact of
hardware failures.
Local fix
No workaround exists.
Problem summary
****************************************************************
* USERS AFFECTED: WebSphere Application Server version 4.0.0,  *
*                 4.0.1, and 4.0.2 users who use the webserver *
*                 plugins.                                     *
****************************************************************
* PROBLEM DESCRIPTION: All webserver threads get stuck trying  *
*                      to see if a backend app server has come *
*                      back up.  As a result no new work       *
*                      coming in can be handled.               *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
The plugin did not limit rediscovery of a downed clone
to just one of the webserver threads.  If the clone could
not be contacted to determine whether the port was up or down,
all threads could end up stuck waiting for the connect to
time out.
Problem conclusion
The plugin now uses only one of the webserver threads to see
if a downed clone has come back up.  As a result, the other
threads are free to handle incoming requests and use only the
app servers that are known to be up.
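The behavior described above can be illustrated with a minimal
sketch.  This is not the plugin's source code; the Backend class,
the claim() and route() functions, and the in-order selection are
hypothetical names and simplifications chosen for illustration.
The sketch assumes that each clone carries a "probing" flag guarded
by a lock, so that once the retry interval has expired exactly one
thread is allowed to attempt the reconnect while all other threads
skip the clone.

```python
import threading


class Backend:
    """Hypothetical record for one app server (clone)."""

    def __init__(self, name, retry_interval=60.0):
        self.name = name
        self.retry_interval = retry_interval  # seconds before a re-probe
        self.down_since = None                # None means "believed up"
        self.probing = False                  # True while one thread re-probes
        self.lock = threading.Lock()

    def mark_down(self, now):
        with self.lock:
            self.down_since = now
            self.probing = False

    def mark_up(self):
        with self.lock:
            self.down_since = None
            self.probing = False

    def claim(self, now):
        """Return 'up', 'probe' (caller is the single prober), or 'skip'."""
        with self.lock:
            if self.down_since is None:
                return "up"
            if self.probing or now - self.down_since < self.retry_interval:
                return "skip"
            self.probing = True  # only the first thread to get here probes
            return "probe"


def route(backends, connect, now):
    """Pick a backend for one request.

    connect(backend) is a caller-supplied function (an assumption of this
    sketch) that returns True if the connection succeeds.
    """
    for backend in backends:
        state = backend.claim(now)
        if state == "skip":
            continue  # down and either not due for retry or already probed
        if connect(backend):
            if state == "probe":
                backend.mark_up()
            return backend
        backend.mark_down(now)  # failed: restart the retry interval
    return None
```

In this sketch a thread that receives "skip" moves on to the next
clone immediately, so only the single probing thread can be held up
by a connect timeout.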
Temporary fix
Comments
APAR information
APAR number PQ57190
Reported component name WEBSPHERE AE SO
Reported component ID 5630A2202
Reported release 400
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Submitted date 2002-01-28
Closed date 2002-02-20
Last modified date 2002-02-20

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros
PLUGIN          

Fix information
Fixed component name WEBSPHERE AE SO
Fixed component ID 5630A2202

Applicable component levels
R400 PSY    UP


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > General
Operating system(s):
Software version: 400
Software edition:
Reference #: PQ57190
IBM Group: Software Group
Modified date: Feb 20, 2002