PQ85546: After a timeout in the admin console, the DMgr servant came down correctly, but the new one hung during initialization.

 A fix is available

APAR status
Closed as program error.

Error description
Due to a timeout in the admin console, the Deployment Manager
servant went down and a new one was started, but it never
completed initialization.  The server had full Java tracing on
(*=all=enabled).  A console dump of the address space showed the
following.  In the servant, the following TCB was active, going
outbound to the controller.  This was also the last TCB in the
servant's job output that was doing any work.
 .
Traceback for this TCB:
BBOOSOUT
ORB_Request::comm_outbound_request()
CORBA::Request::invoke()
ORBEJSBridge::invoke_request(JNIEnv_*,bboojorb*,char*,unsigned
ORBEJSBridge::build_and_invoke_request(JNIEnv_*,bboojorb*,char*,
Java_com_ibm_ws390_orb_ClientDelegate_jorbInvokeRequest
com/ibm/ws390/orb/ClientDelegate.jorbInvokeRequest(I[BIZI)[B
com/ibm/ws390/orb/ClientDelegate.jorbInvokeRequest
com/ibm/ws390/orb/ClientDelegate.invoke
org/omg/CORBA/portable/ObjectImpl._invoke
com/ibm/ws/management/_ControlAdminService_Stub.activateProxyMBe
com/ibm/ws/management/MBeanFactoryImpl.completeActivateMBeans
 .
After running JFormat against the dump, the following deadlock
was shown in the controller, preventing the servant from
continuing initialization.  It is between the 'normal'
AlarmManager code and the AlarmManager tracing code (running
because full Java tracing, *=all=enabled, was on):
 .
 Thread 0x7c2bf8 "Alarm : 3"
    is waiting to be notified for:
      (0x22e1e6e0) "com/ibm/ws/util/BinaryHeap"
       which is owned by:
    Thread 0x7dd968 "Alarm Manager"
      which is waiting to be notified for:
        (0x20aafd30) "com/ibm/ejs/ras/Ws390TraceEventGenerator"
         which is owned by:
      Thread 0x7c2bf8 "Alarm : 3"
 .
Traceback for TCB: 7c2bf8
CEEOPCW
pthread_cond_wait
condWait
sysMonitorWait
lkMonitorEnter
com/ibm/ejs/util/am/AlarmManager.cancel
com/ibm/ejs/util/am/_Alarm.cancel
com/ibm/ws/management/event/AbstractPushRemoteSender.cancelAlarm
com/ibm/ws/management/event/AbstractPushRemoteSender.flushNotifi
com/ibm/ws/management/event/AbstractPushRemoteSender.handleNotif
com/ibm/ws/management/event/WsNotifBroadcaster.emitNotifications
com/ibm/ws/management/event/WsNotifBroadcaster.handleNotificatio
com/ibm/ws/management/event/WsNotifDelegator.handleNotification
 .
Traceback for TCB:
CEEOPCW
pthread_cond_wait
condWait
sysMonitorWait
lkMonitorEnter
com/ibm/ejs/ras/Ws390TraceEventGenerator.fireTraceEvent(Lcom/ibm
  /ejs/ras/TraceEvent;)V
com/ibm/ejs/ras/Ws390TraceEventGenerator.fireTraceEvent(ILcom/ib
  m/ejs/ras/TraceComponent;Ljava/lang/String;Ljava/lang/String;L
com/ibm/ejs/ras/Tr.entry(Lcom/ibm/ejs/ras/TraceComponent;Ljava/l
  ang/String;Ljava/lang/Object;)V
com/ibm/ws/util/BinaryHeap.heapify
com/ibm/ws/util/BinaryHeap.deleteMin
com/ibm/ejs/util/am/RBAlarmManagerThread.run
Local fix
Be more specific about which tracing you want to enable.
The *=all=enabled trace specification should be avoided.
---
In addition, the deadlock was not reproducible in the lab until
the LPAR was given multiple CPUs. Running with only ONE CPU in
the LPAR reduces the chance of hitting this deadlock.
Problem summary
****************************************************************
* USERS AFFECTED: All users of WebSphere Application Server    *
*                 V5.0 for z/OS                                *
****************************************************************
* PROBLEM DESCRIPTION: When attempting to save a master        *
*                      configuration via the Administrative    *
*                      Console,  a timeout condition was       *
*                      detected. When trace support is         *
*                      enabled to help determine the cause     *
*                      of the timeout, a deadlock condition    *
*                      was encountered.                        *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
As part of a master configuration save operation with trace
enabled, an audit trace that was cut triggered the invocation
of a RAS MBean. Because the trace support is listener based,
those entities wishing to be notified of a trace event are
signalled. Trace event processing is controlled by method-level
synchronization. Once invoked, the RAS MBean signalled its
listeners, one of which, when notified, attempted to cancel an
Alarm object with the AlarmManager. Alarm object processing is
controlled by object-level synchronization. At the same time,
the AlarmManager attempted to cut a trace during its normal
processing and ran afoul of the trace synchronization. Each
thread therefore held the lock that the other needed, and the
deadlock resulted.
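The lock-order inversion described above can be illustrated with a minimal sketch. It is not the WebSphere code: the class name, the two ReentrantLock fields, and the helper methods are hypothetical stand-ins for the trace monitor and the alarm heap monitor. Timed tryLock calls are used instead of synchronized blocks so that the example reports the inversion and terminates rather than hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversionSketch {
    // Hypothetical stand-ins for the two monitors seen in the dump.
    static final ReentrantLock traceLock = new ReentrantLock(); // trace event generator
    static final ReentrantLock alarmLock = new ReentrantLock(); // alarm heap

    /** Runs two threads that take the locks in opposite order; returns how many were blocked. */
    static int countBlockedThreads() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // Listener path: holds the trace lock, then wants the alarm lock (cancel).
        Thread listener = new Thread(
            () -> acquire(traceLock, alarmLock, bothHoldFirst, bothTried, blocked), "Alarm : 3");
        // Alarm manager path: holds the alarm lock, then wants the trace lock (Tr.entry).
        Thread manager = new Thread(
            () -> acquire(alarmLock, traceLock, bothHoldFirst, bothTried, blocked), "Alarm Manager");

        listener.start();
        manager.start();
        try {
            listener.join();
            manager.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    static void acquire(ReentrantLock first, ReentrantLock second,
                        CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                        AtomicInteger blocked) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With plain monitors this attempt would wait forever.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            // Keep the first lock until the other thread has finished its attempt.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads blocked on second lock: " + countBlockedThreads());
    }
}
```

Because each thread keeps its first lock until the other has finished its attempt, both attempts time out. With untimed monitor entry, as in the tracebacks, neither thread would ever proceed.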
Problem conclusion
Removed the method-level synchronization from within the trace
support.
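One way such a change can look is sketched below. The APAR states only that the method-level synchronization was removed; the class name, the Consumer-based listener type, and the use of CopyOnWriteArrayList to keep the listener list safe without a monitor are assumptions made for illustration, not the actual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class UnsynchronizedTraceDispatchSketch {
    // Assumption: a copy-on-write list lets dispatch iterate without any lock.
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    public void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Deliberately NOT synchronized: no monitor is held while listeners run,
    // so a listener may take other locks without creating a lock-order cycle
    // through this object.
    public void fireTraceEvent(String event) {
        for (Consumer<String> listener : listeners) {
            listener.accept(event);
        }
    }

    /** Fires one event and records whether the generator's monitor was held during the callback. */
    static List<String> demo() {
        UnsynchronizedTraceDispatchSketch generator = new UnsynchronizedTraceDispatchSketch();
        List<String> seen = new ArrayList<>();
        generator.addListener(e -> seen.add(e + " holdsLock=" + Thread.holdsLock(generator)));
        generator.fireTraceEvent("entry");
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [entry holdsLock=false]
    }
}
```

The listener observes that the calling thread does not hold the generator's monitor, which is the property that breaks the cycle shown in the dump.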

APAR PQ85546 is associated with SERVICE LEVEL W502014 of
WebSphere Application Server V5.0 for z/OS.
Temporary fix
Comments
APAR information
APAR number PQ85546
Reported component name WEBSPHERE FOR Z
Reported component ID 5655I3500
Reported release 500
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Special Attention NoSpecatt
Submitted date 2004-03-04
Closed date 2004-08-05
Last modified date 2004-09-03

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:
PQ89467

Modules/Macros
BBOUBINF          

Publications Referenced

Fix information
Fixed component name WEBSPHERE FOR Z
Fixed component ID 5655I3500

Applicable component levels
R500 PSY UQ91441    UP04/08/23 P F408



Document Information


Current web document: swg1PQ85546.html
Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server for z/OS
Software version: 500
Reference #: PQ85546
IBM Group: Software Group
Modified date: Sep 3, 2004