PQ67826: NETWORK IS UNREACHABLE BECAUSE THERE IS NO SO_KEEPALIVE OPTION FOR THE ORB

 Fixes are available

4.0.6: WebSphere Application Server Version 4.0 Fix Pack 6
PQ67826; 4.0.4: so_keepalive option added to the ORB



APAR status
Closed as program error.

Error description
Currently, the ORB team has provided a test patch to the customer
that enables the socket's SO_KEEPALIVE feature so that a network
failure can be detected when the user detaches a network cable.
With or without SO_KEEPALIVE enabled on the socket, the ORB
application is unable to receive any IOException back from the
socket object used in AIX JDK 1.3.1 (for WAS 4.0.4).
Local fix
Code change
Problem summary
****************************************************************
* USERS AFFECTED: All WebSphere Application Server users of    *
*                 WLM client for quicker failover.             *
****************************************************************
* PROBLEM DESCRIPTION: WebSphere client (WLMed or non-WLMed)   *
*                      is either slow or unable to detect a    *
*                      broken network when a peer connection   *
*                      is unreachable.                         *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
When using two AIX machines, one running the client and the other
running the server, if the user unplugs the network cable on the
client box, the client application can hang for a long time. In a
typical failure scenario, the user has a WLMed client and multiple
backend EJB servers. When one of the EJB servers is unreachable,
the WLMed client hangs for a long time before switching to a
working clone. To fix this problem, apply this efix, which enables
SO_KEEPALIVE on the socket object. In addition to applying this
fix, use the AIX 'no' command to set the tcp_keepidle option in the
TCP/IP layer so that the socket timeout can take effect, and change
the TCP/IP timeout to a smaller value.
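On the client side, the fix comes down to enabling SO_KEEPALIVE on the socket. The following is a minimal Java sketch, not the ORB's actual code, written for a modern JDK (it uses try-with-resources). The class and method names (`KeepAliveSketch`, `openWithKeepAlive`) are hypothetical, and a local `ServerSocket` stands in for the remote EJB server. `Socket.setKeepAlive(true)` only turns the option on. In this sketch, how long the connection may sit idle before keepalive probes are sent is left to the operating system (on AIX, `tcp_keepidle`), which is why the TCP/IP tuning described above is also needed.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveSketch {

    // Hypothetical helper: opens a client socket with SO_KEEPALIVE enabled.
    // The keepalive probe timing (idle time, probe interval) is NOT set here;
    // it comes from the operating system, e.g. tcp_keepidle on AIX.
    public static Socket openWithKeepAlive(InetAddress host, int port, int readTimeoutMs)
            throws IOException {
        Socket socket = new Socket(host, port);
        socket.setKeepAlive(true);          // enable SO_KEEPALIVE on this socket
        socket.setSoTimeout(readTimeoutMs); // optionally bound blocking reads as well
        return socket;
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        // A local listener stands in for a remote EJB server in this sketch;
        // the OS completes the TCP handshake even before accept() is called.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = openWithKeepAlive(loopback, server.getLocalPort(), 30_000)) {
            System.out.println("keepalive=" + client.getKeepAlive()
                    + " soTimeout=" + client.getSoTimeout());
        }
    }
}
```

Keepalive alone only detects a dead peer after the OS-level idle and probe intervals expire, so lowering those values is what actually shortens the hang.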

This efix is applicable to all platforms, although the problem was
originally reported by an AIX user.
Problem conclusion
When a remote process is unreachable, WAS might hang for a long
period of time and be unable to switch to a new WLM server. This
efix adds a setKeepAlive call for the sockets used for RMI-IIOP
connections, which allows the socket to throw an IOException back
to the ORB component; the ORB, in turn, throws an
org.omg.CORBA.COMM_FAILURE exception back to its caller (i.e., WLM).
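The exception path described above can be pictured with another hedged Java sketch. All names are hypothetical: `Transport` stands in for an RMI-IIOP connection, `orbInvoke` for the ORB translating a socket `IOException` into a communication failure, and `invokeWithFailover` for WLM moving on to the next clone. `CommFailure` is a stand-in for `org.omg.CORBA.COMM_FAILURE`, because the `java.corba` module that provided it was removed from the JDK in Java 11. The sketch needs Java 9 or later for `List.of`.

```java
import java.io.IOException;
import java.util.List;

public class FailoverSketch {

    // Hypothetical stand-in for org.omg.CORBA.COMM_FAILURE.
    public static class CommFailure extends RuntimeException {
        public CommFailure(String message, Throwable cause) { super(message, cause); }
    }

    // Hypothetical transport call; with keepalive enabled, a dead peer
    // eventually surfaces here as an IOException instead of a silent hang.
    public interface Transport {
        String invoke(String request) throws IOException;
    }

    // ORB-like layer: translate the socket-level IOException into a
    // communication failure for the caller, keeping the original cause.
    public static String orbInvoke(Transport t, String request) {
        try {
            return t.invoke(request);
        } catch (IOException e) {
            throw new CommFailure("connection to server lost", e);
        }
    }

    // WLM-like layer: on a communication failure, try the next clone.
    public static String invokeWithFailover(List<Transport> clones, String request) {
        CommFailure last = null;
        for (Transport clone : clones) {
            try {
                return orbInvoke(clone, request);
            } catch (CommFailure f) {
                last = f; // this clone is unreachable; move on
            }
        }
        throw last != null ? last : new CommFailure("no clones configured", null);
    }

    public static void main(String[] args) {
        Transport dead = req -> { throw new IOException("simulated keepalive timeout"); };
        Transport alive = req -> "handled " + req;
        System.out.println(invokeWithFailover(List.of(dead, alive), "ping"));
    }
}
```

Wrapping the IOException as the cause keeps the original socket error available for logging, while the WLM-like caller only has to handle a single exception type to decide when to fail over.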
Temporary fix
Comments
APAR information
APAR number PQ67826
Reported component name WEBSPHERE AE AI
Reported component ID 5630A2200
Reported release 400
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Submitted date 2002-11-01
Closed date 2002-11-14
Last modified date 2002-11-14

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros
ORB          

SRLS

Fix information
Fixed component name WEBSPHERE AE AI
Fixed component ID 5630A2200

Applicable component levels
R400 PSY    UP


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > General
Operating system(s):
Software version: 400
Software edition:
Reference #: PQ67826
IBM Group: Software Group
Modified date: Nov 14, 2002