PQ85546: After a timeout in the admin console, the DMgr servant came down correctly, but the new one hung during initialization.
APAR status
Closed as program error.

Error description
Due to a timeout in the admin console, the Deployment Manager servant went down and a new one was started, but the new servant never completed initialization. The server was running with full Java tracing enabled (trace specification *=all=enabled).

A console dump of the address space showed the following. In the servant, the following TCB was active, going outbound to the controller. It was also the last TCB in the servant's job output that showed any activity.

Traceback for this TCB:
  BBOOSOUT
  ORB_Request::comm_outbound_request()
  CORBA::Request::invoke()
  ORBEJSBridge::invoke_request(JNIEnv_*,bboojorb*,char*,unsigned
  ORBEJSBridge::build_and_invoke_request(JNIEnv_*,bboojorb*,char*,
  Java_com_ibm_ws390_orb_ClientDelegate_jorbInvokeRequest
  com/ibm/ws390/orb/ClientDelegate.jorbInvokeRequest(I[BIZI)[B
  com/ibm/ws390/orb/ClientDelegate.jorbInvokeRequest
  com/ibm/ws390/orb/ClientDelegate.invoke
  org/omg/CORBA/portable/ObjectImpl._invoke
  com/ibm/ws/management/_ControlAdminService_Stub.activateProxyMBe
  com/ibm/ws/management/MBeanFactoryImpl.completeActivateMBeans

Running JFormat against the dump revealed the following deadlock in the controller, which prevented the servant from completing initialization. It is between the normal AlarmManager code and the AlarmManager tracing code (which was running only because full Java tracing, *=all=enabled, was on):

  Thread 0x7c2bf8 "Alarm : 3" is waiting to be notified for:
    (0x22e1e6e0) "com/ibm/ws/util/BinaryHeap"
  which is owned by:
  Thread 0x7dd968 "Alarm Manager" which is waiting to be notified for:
    (0x20aafd30) "com/ibm/ejs/ras/Ws390TraceEventGenerator"
  which is owned by:
  Thread 0x7c2bf8 "Alarm : 3"
Traceback for TCB 7c2bf8:
  CEEOPCW
  pthread_cond_wait
  condWait
  sysMonitorWait
  lkMonitorEnter
  com/ibm/ejs/util/am/AlarmManager.cancel
  com/ibm/ejs/util/am/_Alarm.cancel
  com/ibm/ws/management/event/AbstractPushRemoteSender.cancelAlarm
  com/ibm/ws/management/event/AbstractPushRemoteSender.flushNotifi
  com/ibm/ws/management/event/AbstractPushRemoteSender.handleNotif
  com/ibm/ws/management/event/WsNotifBroadcaster.emitNotifications
  com/ibm/ws/management/event/WsNotifBroadcaster.handleNotificatio
  com/ibm/ws/management/event/WsNotifDelegator.handleNotification

Traceback for TCB:
  CEEOPCW
  pthread_cond_wait
  condWait
  sysMonitorWait
  lkMonitorEnter
  com/ibm/ejs/ras/Ws390TraceEventGenerator.fireTraceEvent(Lcom/ibm/ejs/ras/TraceEvent;)V
  com/ibm/ejs/ras/Ws390TraceEventGenerator.fireTraceEvent(ILcom/ibm/ejs/ras/TraceComponent;Ljava/lang/String;Ljava/lang/String;L
  com/ibm/ejs/ras/Tr.entry(Lcom/ibm/ejs/ras/TraceComponent;Ljava/lang/String;Ljava/lang/Object;)V
  com/ibm/ws/util/BinaryHeap.heapify
  com/ibm/ws/util/BinaryHeap.deleteMin
  com/ibm/ejs/util/am/RBAlarmManagerThread.runLocal

Local fix
Be more specific about which tracing you enable; avoid the *=all=enabled trace specification.
In addition, the deadlock could not be reproduced in the lab until the LPAR was given multiple CPUs. Running with only one CPU in the LPAR reduces the chance of hitting this deadlock.

Problem summary
USERS AFFECTED: All users of WebSphere Application Server V5.0 for z/OS.
PROBLEM DESCRIPTION: When attempting to save a master configuration via the Administrative Console, a timeout condition was detected. When trace support was enabled to help determine the cause of the timeout, a deadlock condition was encountered.
RECOMMENDATION:

During a master configuration save with trace enabled, an audit trace record was cut, which triggered the invocation of a RAS MBean. Because the trace support is listener-based, every entity registered to be notified of trace events is signalled, and trace event processing is serialized by method-level synchronization. Once invoked, the RAS MBean signalled its listeners; one of them, when notified, attempted to cancel an Alarm object with the AlarmManager. Alarm object processing is serialized by object-level synchronization. At the same time, the Alarm Manager thread, while holding that object lock, attempted to cut a trace during its normal processing and blocked on the trace synchronization. Each thread therefore held the lock the other needed, and the two deadlocked.

Problem conclusion
Removed the method-level synchronization from within the trace support.
APAR PQ85546 is associated with service level W502014 of WebSphere Application Server V5.0 for z/OS.
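The deadlock in this APAR is a lock-order inversion: "Alarm : 3" holds the trace generator's monitor and wants the alarm heap's monitor, while "Alarm Manager" holds them in the opposite order. The Java sketch below is a simplified, hypothetical model, not IBM's code: `TraceNotifier`, `run`, and the event strings are illustrative stand-ins for Ws390TraceEventGenerator, AlarmManager.cancel, and BinaryHeap.deleteMin, and the RAS MBean and notification chain are collapsed into a single listener. Two latches force the unlucky interleaving that, per the APAR, was hard to hit on a single-CPU LPAR. With `methodSynchronized` set to true, `fireTraceEvent` holds the notifier's monitor while listeners run and the two threads deadlock; with it false, mirroring the removal of method-level synchronization, both complete. Using a CopyOnWriteArrayList so listeners can be iterated without a lock is an assumption of this sketch; the APAR does not say how the fixed trace support keeps its listener list consistent.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

/**
 * Hypothetical two-thread model of the PQ85546 deadlock. None of these names
 * are the real com.ibm.ejs.ras or com.ibm.ejs.util.am classes.
 */
public class TraceDeadlockDemo {

    /** Stand-in for the trace support that signals registered listeners. */
    static class TraceNotifier {
        // Assumption: a copy-on-write list lets listeners be iterated without a lock.
        private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();
        private final boolean methodSynchronized;

        TraceNotifier(boolean methodSynchronized) {
            this.methodSynchronized = methodSynchronized;
        }

        void addListener(Consumer<String> listener) {
            listeners.add(listener);
        }

        void fireTraceEvent(String event) {
            if (methodSynchronized) {
                // Pre-fix behaviour: the notifier's monitor is held while listeners run.
                synchronized (this) {
                    notifyListeners(event);
                }
            } else {
                // Post-fix behaviour: listeners run with no trace-support lock held.
                notifyListeners(event);
            }
        }

        private void notifyListeners(String event) {
            for (Consumer<String> listener : listeners) {
                listener.accept(event);
            }
        }
    }

    /** Returns true if both threads are still blocked after the timeout. */
    static boolean run(boolean methodSynchronized) {
        TraceNotifier trace = new TraceNotifier(methodSynchronized);
        Object alarmHeap = new Object(); // hypothetical stand-in for the BinaryHeap monitor
        CountDownLatch listenerRunning = new CountDownLatch(1);
        CountDownLatch heapLocked = new CountDownLatch(1);

        // Listener that reacts to the audit event by cancelling an alarm, which
        // needs the heap monitor (compare AlarmManager.cancel in the traceback).
        trace.addListener(event -> {
            if (!event.equals("audit")) {
                return;
            }
            listenerRunning.countDown();
            awaitQuietly(heapLocked);
            synchronized (alarmHeap) {
                // remove the cancelled alarm from the heap
            }
        });

        Thread notifier = new Thread(() -> trace.fireTraceEvent("audit"), "Alarm : 3");
        Thread alarmManager = new Thread(() -> {
            synchronized (alarmHeap) {                  // compare BinaryHeap.deleteMin
                heapLocked.countDown();
                awaitQuietly(listenerRunning);
                trace.fireTraceEvent("entry heapify");  // compare Tr.entry in heapify
            }
        }, "Alarm Manager");

        // Daemon threads, so a deadlocked pair does not keep the JVM alive.
        notifier.setDaemon(true);
        alarmManager.setDaemon(true);
        notifier.start();
        alarmManager.start();
        try {
            notifier.join(1000);
            alarmManager.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return notifier.isAlive() && alarmManager.isAlive();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("method-level synchronization: deadlocked = " + run(true));
        System.out.println("no trace-support lock:        deadlocked = " + run(false));
    }
}
```

Running `main` reports `deadlocked = true` for the synchronized variant and `deadlocked = false` for the unsynchronized one. The blocked pair are daemon threads, so the JVM still exits after the first scenario leaves them stuck.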
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following: PQ89467
Document Information
Current web document: swg1PQ85546.html
Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server for z/OS
Operating system(s):
Software version: 500
Software edition:
Reference #: PQ85546
IBM Group: Software Group
Modified date: Sep 3, 2004
(C) Copyright IBM Corporation 2000, 2009. All Rights Reserved.