PQ70257: IN MULTI-NODE, CLIENTS HANG MUCH LONGER THAN NORMAL TCP/IP TIMEOUT IF UNEXPECTEDLY UNABLE TO COMMUNICATE TO REMOTE SERVER.

 Fixes are available

4.0.6: WebSphere Application Server Version 4.0 Fix Pack 6
PQ84906; V4.0.5-V4.0.7: Application remove operation fails with large app.
System Management Component Cumulative Fix for 4.0.2/4.0.3/4.0.4/4.0.5



APAR status
Closed as program error.

Error description
Description:
In a multinode environment, if the adminserver or an application
server process on a remote node unexpectedly becomes
non-responsive (offline, hung, etc.), browsing the topology in
the console, running a WSCP "list" command, or attempting an
XMLconfig export can cause clients to hang for much longer than
the normal operating system TCP/IP timeout.
.
By default, the TCP/IP timeout is about 1-2 minutes on Windows
and about 10 minutes on Solaris and AIX.
Local fix

Problem summary
****************************************************************
* USERS AFFECTED: WebSphere Application Server 4.0 users       *
*                 who have applied a fix that included         *
*                 or supersedes PQ62333 (that is, any          *
*                 SM cumulative fix prior to the 02-07-03      *
*                 version).                                    *
****************************************************************
* PROBLEM DESCRIPTION: In a multinode environment, with        *
*                      applications installed on server        *
*                      groups, if the adminserver or an        *
*                      application server process on a remote  *
*                      node unexpectedly becomes               *
*                      non-responsive (offline, hung, etc.),   *
*                      browsing the topology in the console,   *
*                      running a WSCP "list" command, or       *
*                      attempting an XMLconfig export can      *
*                      cause clients to hang for much longer   *
*                      than the normal operating system        *
*                      TCP/IP timeout.                         *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
Multinode failover scenarios fail:
1) If one of the nodes becomes unreachable, a client that is
already running (such as the console) hangs for longer than the
normal TCP/IP timeout and sometimes does not recover from the
hang.
2) If one of the nodes becomes unreachable, starting a new
client hangs (for example, a new console comes up with no
topology even after the "console ready..." message).
Problem conclusion
This fix ensures that the following multinode failover scenarios
pass:
1) If one of the nodes becomes unreachable, a client that is
already running (such as the console) recovers from the hang in
a reasonable time (not much longer than the normal TCP/IP
timeout defined by the OS).
2) If one of the nodes becomes unreachable, starting a new
client succeeds AFTER the healthy node is able to resolve the
hang caused by the downed node (for example, a new console
comes up with no topology). The hang time is determined by the
transaction timeout (default 600 seconds).
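The general idea of bounding how long a client waits on an unresponsive remote process can be sketched as follows. This is an illustrative sketch only, not the code shipped in this fix: the class name BoundedCallSketch, the helper callWithTimeout, and the "unavailable" fallback value are all hypothetical, and the sleeping task merely stands in for a call to a node that has stopped responding.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCallSketch {

    /**
     * Hypothetical helper: runs a blocking call on a worker thread and
     * returns the fallback value if no result arrives within timeoutMillis.
     */
    static <T> T callWithTimeout(Callable<T> call, long timeoutMillis, T fallback)
            throws Exception {
        // Daemon worker thread so a stuck call cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck call
            return fallback;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A responsive "node" answers immediately.
        String fast = callWithTimeout(() -> "topology", 1000, "unavailable");
        // An unresponsive "node" is simulated with a long sleep.
        String slow = callWithTimeout(() -> {
            Thread.sleep(5000);
            return "topology";
        }, 200, "unavailable");
        System.out.println(fast + " / " + slow); // prints "topology / unavailable"
    }
}
```

With an explicit bound like this, the caller's worst-case wait is the timeout it chose rather than whatever the underlying transport allows.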

This fix also redefines the "running" current state of a module
installed on a ServerGroup:
1. In a multinode server group environment, the module's current
state is running if **ANY** of the clones that the module is
installed on is running.
(The previous definition required that *ALL* clones be running.)
This updated definition is correct because, as long as one of
the clones is running, workload management should find a
clone to service the request.
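The difference between the previous and the updated definition of "running" can be illustrated with a small sketch. The names below (ModuleStateSketch, CloneState, isRunningOld, isRunningNew) are hypothetical and are not part of any WebSphere API; the sketch only models the ALL-versus-ANY rule described above.

```java
import java.util.Arrays;
import java.util.List;

public class ModuleStateSketch {

    /** Hypothetical clone states, for illustration only. */
    enum CloneState { RUNNING, STOPPED, UNREACHABLE }

    /** Previous definition: running only if ALL clones are running. */
    static boolean isRunningOld(List<CloneState> clones) {
        return !clones.isEmpty()
                && clones.stream().allMatch(s -> s == CloneState.RUNNING);
    }

    /** Updated definition: running if ANY clone is running. */
    static boolean isRunningNew(List<CloneState> clones) {
        return clones.stream().anyMatch(s -> s == CloneState.RUNNING);
    }

    public static void main(String[] args) {
        // One clone is up, the other is on a node that became unreachable.
        List<CloneState> clones =
                Arrays.asList(CloneState.RUNNING, CloneState.UNREACHABLE);
        System.out.println("old=" + isRunningOld(clones)
                + " new=" + isRunningNew(clones)); // prints "old=false new=true"
    }
}
```

With one clone running and one unreachable, the old rule reports the module as not running even though a request could still be serviced, which is the case the updated rule corrects.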
Temporary fix

Comments

APAR information
APAR number PQ70257
Reported component name WEBSPHERE AE AI
Reported component ID 5630A2200
Reported release 400
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Submitted date 2003-01-24
Closed date 2003-02-18
Last modified date 2003-02-18

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros
ADMINSVR          

SRLS

Fix information
Fixed component name WEBSPHERE AE AI
Fixed component ID 5630A2200

Applicable component levels
R400 PSY    UP


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > General
Operating system(s):
Software version: 400
Software edition:
Reference #: PQ70257
IBM Group: Software Group
Modified date: Feb 18, 2003