PK04318: KILLING NODEAGENT & SERVERS SIMULTANEOUSLY: WLM DOES NOT FAIL BACK PROPERLY.

APAR status
Closed as program error.

Error description
The following steps describe the error scenario:
Machine 1:  J2EE client application
Machine 2:  Appserver 1
Machine 3:  Appserver 2
  Appservers 1 and 2 are clustered.

1) Bring up all systems in the cluster with a clean start.
2) Start the client in a view that requires RMI calls to server
EJBs.
3) Allow the client to quiesce.
4) Kill the Java processes on machine 1.
5) Perform an action on the client that requires RMI calls to
server EJBs.  This should succeed.
6) Reboot machine 1 and restart the nodeagent and servers on it.
7) Kill the Java processes on machine 2.
8) Repeat the action in step 5.  This fails, and the client must
be restarted to continue.

The exception which might result is the following:

org.omg.CORBA.TRANSIENT: java.net.SocketException: Operation timed out:
connect:could be due to invalid address:host=wctnd006.notesdev.ibm.com,port=9900
vmcid: IBM  minor code: E02  completed: No


The problem occurs because the epoch on the cluster descriptions
is not updated when a node agent and its server are restarted
after being killed.
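
The effect of an unchanged epoch can be illustrated with a minimal
sketch. It assumes that a client keeps a cached cluster description and
replaces it only when an incoming description carries a newer epoch; the
class names, fields, and comparison rule below are hypothetical and are
not taken from the WLM code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterDescription:
    """Hypothetical stand-in for a WLM cluster description."""
    epoch: int       # version stamp of this description (assumed integer)
    members: tuple   # servers the description lists as available


class ClientClusterCache:
    """Hypothetical client-side cache that treats the epoch as a freshness marker."""

    def __init__(self, initial):
        self.current = initial

    def offer(self, incoming):
        """Adopt `incoming` only if its epoch is newer; return True if adopted."""
        if incoming.epoch > self.current.epoch:
            self.current = incoming
            return True
        return False


def stale_epoch_demo():
    # The client has learned that only server2 is up (server1 was killed).
    cache = ClientClusterCache(ClusterDescription(epoch=2, members=("server2",)))

    # server1 is restarted, but the description it publishes reuses epoch 2,
    # so the client sees nothing newer and keeps its old view.
    restarted = ClusterDescription(epoch=2, members=("server1", "server2"))
    adopted = cache.offer(restarted)
    return adopted, cache.current.members
```

Under these assumptions the client still lists only server2 after
server1 returns, so killing server2 leaves it with no member it believes
is reachable, which matches the failure in step 8.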

Local fix

Problem summary
****************************************************************
* USERS AFFECTED: WebSphere Application Server users of WLM,   *
*                 WorkloadManagement, or Clustering            *
****************************************************************
* PROBLEM DESCRIPTION: When killing nodeagents and servers     *
*                      SIMULTANEOUSLY, WLM does not recover    *
*                      and fail back to the servers when       *
*                      they are restarted                      *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
When nodeagents and servers are killed simultaneously, WLM does
not recover and fail back to the servers when they are
restarted. The problem occurs because the epoch on the cluster
descriptions is not updated when a node agent and its server
are restarted after being killed.
Problem conclusion
Fixed this by ensuring the epoch is changed.
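
The APAR does not say how the epoch is changed. One possible scheme,
shown here purely as an assumption, is to derive each epoch from a clock
so that a process that lost its in-memory state when it was killed still
publishes a larger value after restart. `EpochSource` and its `clock`
parameter are hypothetical names.

```python
class EpochSource:
    """Hypothetical epoch generator that never repeats a value within a process."""

    def __init__(self, clock):
        self._clock = clock   # callable returning an increasing integer timestamp
        self._last = 0

    def next_epoch(self):
        # Take the clock value, but step past the last epoch if the clock
        # has not advanced since the previous call.
        self._last = max(self._last + 1, self._clock())
        return self._last


def restart_demo():
    ticks = iter([1000, 1000, 2000])   # fake clock readings for the demo

    def clock():
        return next(ticks)

    before_kill = EpochSource(clock)
    first = before_kill.next_epoch()
    second = before_kill.next_epoch()   # clock unchanged, so the epoch is bumped by one

    after_restart = EpochSource(clock)  # in-memory state lost, clock has moved on
    third = after_restart.next_epoch()
    return first, second, third
```

In this sketch the epoch issued after the restart is larger than both
earlier values, so a client that compares epochs would accept the new
cluster description.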

The fix for this APAR is currently targeted for inclusion in
fixpack WBI 5.1.1.2, and is an iFix only for 5.0.2.X PME.

Temporary fix

Comments

APAR information
APAR number PK04318
Reported component name WAS ENTERPRISE
Reported component ID 5630A3700
Reported release 00A
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Special Attention NoSpecatt
Submitted date 2005-04-15
Closed date 2005-04-29
Last modified date 2005-04-29

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros
WLM          

Publications Referenced

Fix information
Fixed component name WAS ENTERPRISE
Fixed component ID 5630A3700

Applicable component levels
R003 PSY    UP
R00A PSY    UP
R00H PSY    UP
R00I PSY    UP
R00S PSY    UP
R00W PSY    UP


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > Enterprise Edition (EE)
Software version: 00A
Reference #: PK04318
IBM Group: Software Group
Modified date: Apr 29, 2005