APAR status
Closed as program error.
Error description
The trace shows that the in-process server has been marked
unusable, but getNextClone keeps returning this server. Because
the server is not available, a COMM_FAILURE is raised and the
request is retried. This cycle repeats indefinitely, so the
server takes a long time to stop.
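The selection behavior one would expect here can be sketched as follows. This is a minimal illustration, not the WLM implementation: the Clone and CloneSelector classes and the nextUsable method are hypothetical names, and only getNextClone appears in the trace. The point is that a selector which skips members marked unusable, and reports when none remain, lets the caller fail fast instead of retrying the same dead server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model of a cluster member; not a WebSphere class.
final class Clone {
    final String name;
    boolean usable = true;

    Clone(String name) {
        this.name = name;
    }
}

// Hypothetical round-robin selector that never returns an unusable clone.
final class CloneSelector {
    private final List<Clone> clones = new ArrayList<>();
    private int cursor = 0;

    void add(Clone clone) {
        clones.add(clone);
    }

    // Scans at most one full rotation. An empty result means no usable
    // target remains, so the caller can stop instead of retrying.
    Optional<Clone> nextUsable() {
        for (int i = 0; i < clones.size(); i++) {
            Clone candidate = clones.get(cursor);
            cursor = (cursor + 1) % clones.size();
            if (candidate.usable) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

With one member marked unusable, nextUsable keeps returning the other member; once every member is unusable it returns an empty Optional.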
During quiesce, the server allows outbound requests and inbound
replies. In this particular instance, however, the outbound
unbind request goes out to the network card and loops back into
the same server. Although it is a same-ORB to same-ORB request,
it appears as an incoming remote request, and incoming remote
requests are blocked during quiesce. As a result, when the
workload controller shuts down each regulator, it gets stuck on
the unbind of the RMIConnector and enters what is essentially
an infinite request retry loop in the WLM code. At this fix
pack level the WLM code does not yet employ a "retry limit", as
it does in later fix packs, so on each COMM_FAILURE (as seen in
the original trace) WLM sends the request again. Because the
code is stuck in this loop, the quiesce sequence is locked out
and can never exit (the default time allowed for quiesce to
finish its work is 3 minutes). This eventually leads to a
stack overflow (out of memory), and the server then stops.
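The "retry limit" mentioned above can be illustrated with a small sketch. It is an assumption-laden example, not the code shipped in later fix packs: BoundedRetry, invoke, and the CommFailure exception are hypothetical names (CommFailure stands in for the CORBA COMM_FAILURE exception), and the limit value is chosen by the caller.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a CORBA COMM_FAILURE; not the org.omg class.
final class CommFailure extends RuntimeException {
    CommFailure(String message) {
        super(message);
    }
}

// Hypothetical helper that retries a request a bounded number of times.
final class BoundedRetry {
    // Invokes the request up to maxAttempts times. If every attempt
    // fails with CommFailure, the last failure is rethrown so the
    // caller (for example, a shutdown sequence) can move on.
    static <T> T invoke(Supplier<T> request, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        CommFailure last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.get();
            } catch (CommFailure e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

Using a loop with an explicit bound, rather than re-issuing the request from the failure handler, also keeps the call stack flat, so repeated failures cannot grow the stack.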
Local fix
No workaround is available. A test fix was provided and it
resolved the problem.
Problem summary
****************************************************************
* USERS AFFECTED: WebSphere Application Server users of WLM, *
* WorkloadManagement, or Clustering *
****************************************************************
* PROBLEM DESCRIPTION: Infinite loop encountered on server *
* shutdown *
****************************************************************
* RECOMMENDATION: *
****************************************************************
Infinite loop encountered on server shutdown
Problem conclusion
Code was added that allows pseudo-remote IIOP outbound
traffic to re-enter the originating server.
The fix for this APAR is currently targeted for inclusion in
fix packs 5.0.2.7 and 5.1.1.3. Please refer to the Recommended
Updates page for delivery dates:
http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg27004980
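The idea behind the fix, admitting looped-back traffic from the same ORB while still blocking genuine remote requests during quiesce, can be sketched as follows. The APAR does not describe how the actual fix identifies such traffic, so everything here is an assumption: QuiesceGate, beginQuiesce, admit, and the notion of comparing an ORB identifier are hypothetical and serve only to illustrate the rule.

```java
// Hypothetical admission check for inbound requests; not WebSphere code.
final class QuiesceGate {
    private final String localOrbId;
    private volatile boolean quiescing = false;

    QuiesceGate(String localOrbId) {
        this.localOrbId = localOrbId;
    }

    void beginQuiesce() {
        quiescing = true;
    }

    // arrivedOverNetwork: the request came in through a network listener.
    // originOrbId: identity of the sending ORB (assumed to be available).
    boolean admit(boolean arrivedOverNetwork, String originOrbId) {
        if (!quiescing) {
            return true; // normal operation: admit everything
        }
        if (!arrivedOverNetwork) {
            return true; // in-process request: never left the server
        }
        // Pseudo-remote request: it looks remote but was sent by this
        // same ORB and looped back, so it may re-enter during quiesce.
        return localOrbId.equals(originOrbId);
    }
}
```

Under this rule, an unbind request that leaves through the network card and loops back is admitted during quiesce, while a request from a different ORB is still rejected.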
Temporary fix
NA
Comments
APAR information
APAR number | PQ98102
Reported component name | WAS BASE 5.0
Reported component ID | 5630A3600
Reported release | 00A
Status | CLOSED PER
PE | NoPE
HIPER | NoHIPER
Special Attention | NoSpecatt
Submitted date | 2004-12-08
Closed date | 2005-01-10
Last modified date | 2005-01-10
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Modules/Macros
Publications Referenced
Applicable component levels
R003 PSY | UP
R00A PSY | UP
R00H PSY | UP
R00I PSY | UP
R00P PSY | UP
R00S PSY | UP
R00W PSY | UP
R103 PSY | UP
R10A PSY | UP
R10H PSY | UP
R10I PSY | UP
R10P PSY | UP
R10S PSY | UP
R10W PSY | UP