PQ70257: IN MULTI-NODE, CLIENTS HANG MUCH LONGER THAN NORMAL TCP/IP TIMEOUT IF UNEXPECTEDLY UNABLE TO COMMUNICATE TO REMOTE SERVER.
APAR status
Closed as program error.

Error description
In a multinode environment, if the adminserver or an application server process on a remote node unexpectedly becomes non-responsive (offline, hung, etc.), browsing the topology using the console, executing WSCP "list", or attempting an XMLconfig export can cause clients to hang for much longer than the normal operating system TCP/IP timeout.

By default, the TCP/IP timeout is normally about 1-2 minutes on Windows and about 10 minutes on Solaris and AIX.

Local fix

Problem summary
USERS AFFECTED: WebSphere Application Server 4.0 users who have applied a fix that included or supersedes PQ62333 (that is, any SM cumulative fix prior to the 02-07-03 version).

PROBLEM DESCRIPTION: In a multinode environment, with applications installed on server groups, if the adminserver or an application server process on a remote node unexpectedly becomes non-responsive (offline, hung, etc.), browsing the topology using the console, executing WSCP "list", or attempting an XMLconfig export can cause clients to hang for much longer than the normal operating system TCP/IP timeout.

RECOMMENDATION:

Multinode failover scenarios fail:
1) If one of the nodes becomes unreachable, a current client (such as the console) that is already running hangs for longer than the normal TCP/IP timeout and sometimes does not recover from the hang.
2) If one of the nodes becomes unreachable, starting a new client hangs (for example, a new console comes up with no topology even after the "console ready..." message).

Problem conclusion
This fix ensures that the following multinode failover scenarios pass:
1) If one of the nodes becomes unreachable, a current client (such as the console) that is already running recovers from the hang in a reasonable time (not much more than the normal TCP/IP timeout defined by the OS).
2) If one of the nodes becomes unreachable, starting a new client succeeds AFTER the healthy node is able to resolve the hang caused by the downed node (for example, starting a new console comes up with no topology). The hang time is determined by the transaction timeout (default 600 seconds).

This fix also redefines the "running" current state of a Module installed on a ServerGroup:
1. In a multinode server group environment, the module's current state is running if **ANY** of the clones on which the module is installed is running. (The previous definition required that *ALL* clones be running.) The updated definition is correct because, as long as one of the clones is running, workload management should find a clone to service the request.

Temporary fix

Comments
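The change in the "running" definition for a module on a server group can be illustrated with a short sketch. This is not WebSphere code or any WebSphere API: `CloneState`, `is_running_before_fix`, and `is_running_after_fix` are hypothetical names chosen for illustration, and treating a module with no clones as not running is an assumption the APAR text does not address.

```python
from enum import Enum


class CloneState(Enum):
    """Hypothetical clone states, for illustration only."""
    RUNNING = "running"
    STOPPED = "stopped"


def is_running_before_fix(clone_states):
    """Previous definition: the module is running only if ALL clones are running."""
    states = list(clone_states)
    # Assumption: a module with no clones is not running.
    # Without this guard, all() over an empty list would return True.
    return bool(states) and all(s is CloneState.RUNNING for s in states)


def is_running_after_fix(clone_states):
    """Updated definition: the module is running if ANY clone is running."""
    return any(s is CloneState.RUNNING for s in clone_states)


# One clone's node is down while the other is still serving requests.
mixed = [CloneState.RUNNING, CloneState.STOPPED]
print(is_running_before_fix(mixed))  # old rule reports the module as not running
print(is_running_after_fix(mixed))   # new rule reports the module as running
```

With one clone stopped and one running, the two rules disagree, which is the case the APAR describes: workload management can still route requests to the surviving clone, so the module is reported as running.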
APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros
SRLS
Document Information
Product categories: Software > Application Servers >
Distributed Application & Web Servers > WebSphere Application
Server > General
Operating system(s):
Software version: 400
Software edition:
Reference #: PQ70257
IBM Group: Software Group
Modified date: Feb 18, 2003
(C) Copyright IBM Corporation 2000, 2006. All Rights Reserved.