PQ87224: In a multi-node environment the Deployment Manager only discovers one node agent.

 Fixes are available

5.1.1.17: WebSphere Application Server V5.1.1 Cumulative Fix 17 for AIX
5.1.1.17: WebSphere Application Server V5.1.1 Cumulative Fix 17 for HP-UX
5.1.1.19: WebSphere Application Server V5.1.1 Cumulative Fix 19 for Linux
5.1.1.16: WebSphere Application Server V5.1.1 Cumulative Fix 16 for AIX
5.1.1.18: WebSphere Application Server V5.1.1 Cumulative Fix 18 for AIX
5.1.1.18: WebSphere Application Server V5.1.1 Cumulative Fix 18 for HP-UX
5.1.1.18: WebSphere Application Server V5.1.1 Cumulative Fix 18 for Solaris
5.1.1.18: WebSphere Application Server V5.1.1 Cumulative Fix 18 for Windows
5.1.1.18: WebSphere Application Server V5.1.1 Cumulative Fix 18 for Linux
5.1.1.17: WebSphere Application Server V5.1.1 Cumulative Fix 17 for Linux
5.1.1.17: WebSphere Application Server V5.1.1 Cumulative Fix 17 for Solaris
5.1.1.17: WebSphere Application Server V5.1.1 Cumulative Fix 17 for Windows
5.0.2.7: WebSphere Application Server Express 5.0.2 Cumulative Fix 7
5.0.2.17: WebSphere Application Server 5.0.2 Cumulative Fix 17 for Solaris
5.0.2.17: WebSphere Application Server 5.0.2 Cumulative Fix 17 for Windows
5.0.2.14: WebSphere Application Server 5.0.2 Cumulative Fix 14 for Solaris
5.1.1.10: WebSphere Application Server V5.1.1 Cumulative Fix 10 for Windows
5.1.1.10: WebSphere Application Server V5.1.1 Cumulative Fix 10 for AIX
5.1.1.8: WebSphere Application Server 5.1.1 Cumulative Fix 8 for AIX
5.0.2.14: WebSphere Application Server 5.0.2 Cumulative Fix 14 for Linux
5.1.1.19: WebSphere Application Server V5.1.1 Cumulative Fix 19 for AIX
5.1.1.8: WebSphere Application Server 5.1.1 Cumulative Fix 8 for Windows
5.1.1.19: WebSphere Application Server V5.1.1 Cumulative Fix 19 for Windows
5.1.1.8: WebSphere Application Server 5.1.1 Cumulative Fix 8 for HP-UX
5.1.1.9: WebSphere Application Server V5.1.1 Cumulative Fix 9 for Solaris
5.1.1.8: WebSphere Application Server 5.1.1 Cumulative Fix 8 for Solaris
5.0.2.15: WebSphere Application Server 5.0.2 Cumulative Fix 15 for Windows
5.0.2.15: WebSphere Application Server 5.0.2 Cumulative Fix 15 for Solaris
5.0.2.15: WebSphere Application Server 5.0.2 Cumulative Fix 15 for AIX
5.1.1.9: WebSphere Application Server V5.1.1 Cumulative Fix 9 for AIX
5.0.2.15: WebSphere Application Server 5.0.2 Cumulative Fix 15 for Linux
5.0.2.12: WebSphere Application Server 5.0.2 Cumulative Fix 12
5.1.1.6: WebSphere Application Server Version 5.1.1 Cumulative Fix 6
5.1.1.7: WebSphere Application Server Version 5.1.1 Cumulative Fix 7
5.0.2.14: WebSphere Application Server 5.0.2 Cumulative Fix 14 for HP-UX
5.0.2.14: WebSphere Application Server 5.0.2 Cumulative Fix 14 for AIX
5.1.1.4: WebSphere Application Server Version 5.1.1 Cumulative Fix 4
5.1.1.9: WebSphere Application Server V5.1.1 Cumulative Fix 9 for Windows
5.0.2.17: WebSphere Application Server 5.0.2 Cumulative Fix 17 for HP-UX
5.0.2.17: WebSphere Application Server 5.0.2 Cumulative Fix 17 for AIX
5.1.1.11: WebSphere Application Server V5.1.1 Cumulative Fix 11 for AIX
5.0.2.17: WebSphere Application Server 5.0.2 Cumulative Fix 17 for Linux
5.1.1.10: WebSphere Application Server V5.1.1 Cumulative Fix 10 for HP-UX
5.1.1.10: WebSphere Application Server V5.1.1 Cumulative Fix 10 for Linux
5.1.1.9: WebSphere Application Server V5.1.1 Cumulative Fix 9 for HP-UX
5.1.1.9: WebSphere Application Server V5.1.1 Cumulative Fix 9 for Linux
5.0.2.16: WebSphere Application Server 5.0.2 Cumulative Fix 16 for HP-UX
5.1.1.12: WebSphere Application Server V5.1.1 Cumulative Fix 12 for Windows
5.0.2.16: WebSphere Application Server 5.0.2 Cumulative Fix 16 for Solaris
5.0.2.8: WebSphere Application Server V5.0.2 Cumulative Fix 8
5.0.2.16: WebSphere Application Server 5.0.2 Cumulative Fix 16 for Windows
5.0.2.16: WebSphere Application Server 5.0.2 Cumulative Fix 16 for AIX
5.1.1.11: WebSphere Application Server V5.1.1 Cumulative Fix 11 for Windows
5.1.1.16: WebSphere Application Server V5.1.1 Cumulative Fix 16 for Solaris
5.0.2.18: WebSphere Application Server 5.0.2 Cumulative Fix 18 for Solaris
5.1.1.11: WebSphere Application Server V5.1.1 Cumulative Fix 11 for Linux
5.0.2.18: WebSphere Application Server 5.0.2 Cumulative Fix 18 for Windows
5.0.2.18: WebSphere Application Server 5.0.2 Cumulative Fix 18 for HP-UX
5.0.2.18: WebSphere Application Server 5.0.2 Cumulative Fix 18 for AIX
5.1.1.16: WebSphere Application Server V5.1.1 Cumulative Fix 16 for Windows
5.1.1.14: WebSphere Application Server V5.1.1 Cumulative Fix 14 for Solaris
5.1.1.1: WebSphere Application Server Express 5.1.1 Cumulative Fix 1
5.0.2.14: WebSphere Application Server 5.0.2 Cumulative Fix 14 for Windows
5.1.1.12: WebSphere Application Server V5.1.1 Cumulative Fix 12 for AIX
5.1.1.12: WebSphere Application Server V5.1.1 Cumulative Fix 12 for Linux
5.1.1.12: WebSphere Application Server V5.1.1 Cumulative Fix 12 for HP-UX
5.1.1.12: WebSphere Application Server V5.1.1 Cumulative Fix 12 for Solaris
5.1.1.11: WebSphere Application Server V5.1.1 Cumulative Fix 11 for Solaris
5.1.1.13: WebSphere Application Server V5.1.1 Cumulative Fix 13 for AIX
5.1.1.13: WebSphere Application Server V5.1.1 Cumulative Fix 13 for Windows
5.0.2.13: WebSphere Application Server 5.0.2 Cumulative Fix 13
5.1.1.13: WebSphere Application Server V5.1.1 Cumulative Fix 13 for HP-UX
5.1.1.15: WebSphere Application Server V5.1.1 Cumulative Fix 15 for Solaris
5.1.1.13: WebSphere Application Server V5.1.1 Cumulative Fix 13 for Solaris
5.1.1.13: WebSphere Application Server V5.1.1 Cumulative Fix 13 for Linux
5.1.1.14: WebSphere Application Server V5.1.1 Cumulative Fix 14 for AIX
5.1.1.14: WebSphere Application Server V5.1.1 Cumulative Fix 14 for Linux
5.1.1.14: WebSphere Application Server V5.1.1 Cumulative Fix 14 for Windows
5.1.1.15: WebSphere Application Server V5.1.1 Cumulative Fix 15 for Windows
5.0.2.18: WebSphere Application Server 5.0.2 Cumulative Fix 18 for Linux
5.1.1.11: WebSphere Application Server V5.1.1 Cumulative Fix 11 for HP-UX
5.1.1.14: WebSphere Application Server V5.1.1 Cumulative Fix 14 for HP-UX
5.1.1.8: WebSphere Application Server 5.1.1 Cumulative Fix 8 for Linux
5.0.2.15: WebSphere Application Server 5.0.2 Cumulative Fix 15 for HP-UX
5.0.2.16: WebSphere Application Server 5.0.2 Cumulative Fix 16 for Linux
5.1.1.10: WebSphere Application Server V5.1.1 Cumulative Fix 10 for Solaris
5.1.1.15: WebSphere Application Server V5.1.1 Cumulative Fix 15 for AIX
5.1.1.15: WebSphere Application Server V5.1.1 Cumulative Fix 15 for HP-UX
5.1.1.16: WebSphere Application Server V5.1.1 Cumulative Fix 16 for HP-UX
5.1.1.16: WebSphere Application Server V5.1.1 Cumulative Fix 16 for Linux
5.1.1.15: WebSphere Application Server V5.1.1 Cumulative Fix 15 for Linux
5.1.1.1: WebSphere Application Server Version 5.1.1 Cumulative Fix 1
5.1.1.19: WebSphere Application Server V5.1.1 Cumulative Fix 19 for HP-UX



APAR status
Closed as program error.

Error description
The Deployment Manager (dmgr) only discovers one Node Agent when
it's in a multi-node environment.  The problem is
intermittent and will usually disappear if the discovery
related trace is enabled. It also is not specific to any
particular Node Agent and can be seen to discover Node Agents in
a random fashion.

Topology Example:
Machine 1(AIX 5.1)->  DevNode1 (Running Cell Manager and Node
                                Agent 1)
Machine 2(AIX 5.1)->  DevNode2 (Running Node Agent 2)

L2 analysis:

-- The trace file shows the following error:
[2/24/04 16:38:15:272 CST] 129ccaea  d UOW=
source=com.ibm.ws.management.discovery.DiscoveryAdapter org=IBM
prod=WebSphere component=Application Server event --
unrecognized response

It seems the dmgr is not able to recognize the response from a
Node Agent, which prevented the dmgr to discover the node

-- The cause of this "unrecognized response" points to the
queryId in the jxta:DiscoveryResponse message, as the
Step_2_FindTheSource folder/readme.txt suggests:

In the failing case, traces had the same query IDs:
[2/24/04 20:53:01:761 CST] 497c6891 DiscoveryServ d End-2-end
messaging:  queryId: 1
[2/24/04 20:53:01:761 CST] 2c706891 DiscoveryServ d End-2-end
messaging:  queryId: 1
In the successful case, traces had different query IDs:
[2/24/04 20:17:14:625 CST] 7f51e4a0 DiscoveryServ d End-2-end
messaging:  queryId: 1
[2/24/04 20:17:14:625 CST] 5cf1a4a0 DiscoveryServ d End-2-end
messaging:  queryId: 2

For some reason, the queryId was not being incremented in the
faling case.

-- The class that is responsible for incrementing the query Id
is:

   com.ibm.ws.management.discovery.DiscoveryService the method
   is: sendQuery()

The following line of code in sendQuery() may be causing the
problem:

   long queryId = querySerialNumber++;

Note querySerialNumber is a static private long type.  In the
traces, we see two threads invoking sendQuery() method.  We
probably are seeing a non-threadsafe implementation here with
the static querySerialNumber field.

This also explains why turning on traces the problem does not
occur. Since both thread writes to the same log file, they need
to wait their turns to write to the log and are foced to run in
a more synchronzied fashion.
Local fix Problem summary
****************************************************************
* USERS AFFECTED: All WebSphere Application Server users who   *
*                 are using a multi-node system.               *
****************************************************************
* PROBLEM DESCRIPTION: Dmgr start up process is not able to    *
*                      recognize a node agent in a two node    *
*                      environment                             *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
Dmgr startup process is not able to recognize a node agent in
a two node environment.  The problem disappears if discovery
related traces are turned on.  This problem appears
intermittently and not on one specific node.
Problem conclusion
A non-thread safe querySerialNumber was being used which
caused problems when there are two threads invoking the
sendQuery() method.  We are now using a thread safe
implementation of this value.
Temporary fix
Sent to customer.
Comments
APAR information
APAR number PQ87224
Reported component name WAS BASE 5.0
Reported component ID 5630A3600
Reported release 00A
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Special Attention NoSpecatt
Submitted date 2004-04-06
Closed date 2004-06-28
Last modified date 2004-07-08

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros
ADMINJMX          

Publications Referenced

Fix information

Applicable component levels
R003 PSY    UP
R00A PSY    UP
R00H PSY    UP
R00I PSY    UP
R00P PSY    UP
R00S PSY    UP
R00W PSY    UP
R103 PSY    UP
R10A PSY    UP
R10H PSY    UP
R10I PSY    UP
R10P PSY    UP
R10S PSY    UP
R10W PSY    UP


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > General
Operating system(s):
Software version: 00A
Software edition:
Reference #: PQ87224
IBM Group: Software Group
Modified date: Jul 8, 2004