MustGather: A hang occurs when attempting to stop WebSphere Application Server for z/OS
 Technote (troubleshooting)
 
Problem(Abstract)
This MustGather covers hang problems in IBM® WebSphere® Application Server for z/OS that occur when attempting to stop an Application Server or the Deployment Manager. Gathering this information before calling IBM support will help familiarize you with the troubleshooting process and save you time.
 
Resolving the problem
This MustGather covers the scenario where you have issued the stop command, or tried to stop a server via the Administrative Console, but the server you are requesting to stop does not come down or hangs. There is a separate MustGather for the scenario where an Application Server is successfully processing requests and then suddenly hangs. That MustGather is titled "MustGather: A hang occurs when running WebSphere Application Server for z/OS® which was previously processing requests".

MustGather information for the specific problem when a hang occurs trying to stop an Application Server or Deployment Manager:

If you have not yet contacted support, first read "MustGather: Read first for WebSphere Application Server for z/OS".

During stop processing, the WebSphere Application Server for z/OS runtime will wait for all active requests to complete processing and then attempt to stop each thread. In almost all cases, when the stop command is not working, it is because

a) there is still at least one request that is actively processing

or

b) all requests have finished processing but some other product or application has created a non-daemon type thread in the Java™ space, in scenarios when trying to stop an Application Server.

or

c) notification communication with the Deployment Manager is stuck, in scenarios where you are trying to stop the Deployment Manager.

This MustGather information addresses situations "b)" and "c)" above. However, you can collect the same trace and console dump to diagnose situation "a)" above and see which request is still dispatched.

For scenario "b)" above, there are two types of Java threads: daemon and non-daemon. Daemon threads are interruptible and can therefore be taken down when the WebSphere Application Server for z/OS runtime issues an interrupt while trying to stop the server. A non-daemon thread is effectively not interruptible if the code running on the thread catches the interruption, does nothing with it, and continues to wait. Such non-daemon threads prevent the WebSphere Application Server for z/OS runtime from cleanly shutting the JVM™ down, because the JVM does not end until every non-daemon thread in the Java space has ended. The server therefore waits forever for these non-daemon threads to come down, and stop processing hangs.
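
The behavior described above can be reproduced in a small standalone program. The following is a minimal sketch, not code from WebSphere Application Server: the class name InterruptDemo, the method aliveAfterInterrupt, and the thread name are hypothetical, and the "release" latch exists only so that the demonstration itself can end. It starts a non-daemon thread, interrupts it, and reports whether the thread is still alive afterward.

```java
import java.util.concurrent.CountDownLatch;

public class InterruptDemo {

    /**
     * Starts a non-daemon worker, interrupts it once, and reports whether
     * it is still alive half a second later.
     */
    static boolean aliveAfterInterrupt(boolean swallowInterrupt) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1); // demo-only escape hatch

        Thread worker = new Thread(() -> {
            started.countDown();
            while (true) {
                try {
                    release.await();   // blocks until released or interrupted
                    return;            // released: end normally
                } catch (InterruptedException e) {
                    if (!swallowInterrupt) {
                        return;        // cooperative: treat the interrupt as a stop request
                    }
                    // uncooperative: catch the interrupt, do nothing, keep waiting
                }
            }
        }, "Thread-demo");
        worker.setDaemon(false);       // non-daemon: the JVM waits for this thread
        worker.start();

        started.await();
        worker.interrupt();            // comparable in spirit to the runtime's stop-time interrupt
        worker.join(500);              // give the worker time to react
        boolean alive = worker.isAlive();

        release.countDown();           // let the demo thread end so the JVM can exit
        worker.join();
        return alive;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cooperative thread alive after interrupt: " + aliveAfterInterrupt(false));
        System.out.println("swallowing thread alive after interrupt: " + aliveAfterInterrupt(true));
    }
}
```

Without the release latch, the second worker would never end, and the JVM running this program would stay up in the same way the hung server does.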

For scenario "b)" above, improved diagnostics were added to the WebSphere Application Server for z/OS runtime method threadTerm in CommonBridge.java to identify any non-daemon threads during stop processing. These diagnostics are available with version 5 service levels W502020 (and above) and W510204 (and above) and with all version 6 Fix Packs. This MustGather explains how to enable this trace. If you are running with a service level lower than W502020 or W510204, you will not have the additional trace diagnostic, but you still need to collect the same trace and console dump.
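
Application owners who want to check their own code for this pattern before a stop is attempted can list the non-daemon threads of a JVM with the standard Thread API. The following is a rough sketch under that assumption. It is not the threadTerm implementation, which this document does not show, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NonDaemonLister {

    /** Returns the sorted names of all live non-daemon threads in this JVM. */
    static List<String> nonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns a map keyed by every live thread
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        for (String name : nonDaemonThreadNames()) {
            System.out.println(name + " is not a Daemon thread");
        }
    }
}
```

Any thread in this list that belongs to an application or product, and that does not end when interrupted, is a candidate for the hang described in scenario "b)".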

  1. Obtain the trace then attempt to stop the server

    Ensure that the Trace output is being written to sysprint using this MVS modify command:
    f controller_region_name,tracetosysprint=yes

    Note: When setting the trace, case is important in the tracejava keyword. If you use SDSF to enter commands, use the SDSF extended command line (put a slash, / , on the command line and press Enter); otherwise SDSF converts the string to uppercase.

    If an Application Server is not responding to the stop command, issue MVS modify commands to enable this trace:
    f controller_region_name,tracejava='com.ibm.ws390.orb.*=all=enabled'
    f controller_region_name,tracedetail=(3,4)

    If the Deployment Manager is not responding to the stop command, issue MVS modify commands to enable this trace:
    f deployment_manager_controller_region_name,tracejava='com.ibm.*=all=enabled'
    f node_agent_server_name,tracejava='com.ibm.*=all=enabled'

    Once the trace is enabled, issue the stop to the Controller Region via the MVS modify command, the Administrative Console, or a scripting command. Use the same stop method you were using when you first observed the hang.

    Make sure you see the stop command issued for the Controller Region and the server acknowledged it. You'll see the following message in the job output of the Controller Region:
    "BBOO0133I WEBSPHERE FOR Z/OS STOP COMMAND ISSUED FOR SERVER..."

    Let the trace run for 30 seconds.

    Issue the following MVS modify command to turn off the trace:

    For Application Server:
    f controller_region_name,traceinit

    For Deployment Manager and Node Agent:
    f deployment_manager_controller_region_name,traceinit
    f node_agent_server_name,traceinit

  2. Dump hanging address spaces

    If an Application Server is not responding to the stop command, take a console dump of the WebSphere Application Server Controller and Servant(s) that will not come down from the stop command:

    DUMP COMM=(Descriptive name for this WebSphere dump)
    R rn,SDATA=(ALLNUC,CSA,GRSQ,LPA,LSQA,PSA,RGN,SQA,SUM,SWA,TRT),CONT
    R rn,JOBNAME=(controller_region_name, servant_region_name),END

    If it is the Deployment Manager that will not come down from the stop command, ensure the Deployment Manager's Controller and Servant Regions are in the console dump, as well as the Node Agent, as well as OMVS:

    DUMP COMM=(Descriptive name for this WebSphere dump)
    R rn,SDATA=(ALLNUC,CSA,GRSQ,LPA,LSQA,PSA,RGN,SQA,SUM,SWA,TRT),CONT
    R rn,JOBNAME=(OMVS,dmgr_controller_region_name, dmgr_servant_region_name, node_agent_server_name),CONT
    R rn,DSPNAME=('OMVS'.*),END

    Ensure that no console dump is a partial dump. Verify that message IEA611I appears in the SYSLOG and indicates that a complete dump was taken.

  3. Send documentation

    FTP the Controller and Servant Region joblogs.
    FTP the tersed console dump(s) in binary format.

    Review the information in the following link before sending documentation to IBM:
    Submitting Diagnostic Information to IBM.

In the scenario where an Application Server is not responding to a stop command because of a non-daemon type thread, the trace indicates which thread is the non-daemon one. The console dump is needed to get the callback stack of the thread, which shows which product or application owns the thread. The following information explains how to find the thread and match it to a callback stack. You can perform these diagnostic steps yourself, or send the documentation to IBM so that support can assist you with the steps or go through them for you.

In the Servant Region trace output, for each thread, when the thread is being stopped by the WebSphere Application Server for z/OS runtime, you will see a trace that contains:
"CommonBridge.threadTerm, interrupting thread: ... is a Daemon thread"
or
"CommonBridge.threadTerm, interrupting thread: ... is not a Daemon thread"
Please note that one TCB will be issuing all these trace entries but the individual thread identifier is contained within a trace entry.

The ones to be concerned with are the entries that have:
"... is not a Daemon thread"

The trace shows only the Java thread identifier ("Thread-11" in the example below), not a TCB. Use the console dump and jformat DIS LS, and look under the "Thread Identifiers" section, to map the Java thread identifier to a TCB. The jformat tool is part of svcdump.jar, which is called "SVC Analyzer" and can be found at
http://www-1.ibm.com/servers/eserver/zseries/zos/unix/bpxa1ty2.html

The following trace is an example showing a non-daemon type thread. The trace identifies "Thread-11":

Trace: 2005/06/02 17:04:39.273 01 t=9E2E88 c=UNK key=P8 (13007002)
FunctionName: com.ibm.ws390.orb.CommonBridge
SourceId: com.ibm.ws390.orb.CommonBridge
Category: DEBUG
ExtendedMessage: CommonBridge.threadTerm, interrupting thread: Thread-11, thread is not a Daemon thread
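
With a large Servant Region trace, the entries of interest can be picked out mechanically. The following is a minimal sketch that assumes every entry follows the layout of the single example above. The class name TraceScan and the method nonDaemonThreads are hypothetical, and the "Thread-12" line in the sample is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceScan {

    // Assumed entry layout, copied from the example trace entry in this document
    private static final Pattern NON_DAEMON = Pattern.compile(
        "CommonBridge\\.threadTerm, interrupting thread: (.+), thread is not a Daemon thread");

    /** Returns the distinct thread names reported as non-daemon, in order of appearance. */
    static List<String> nonDaemonThreads(List<String> traceLines) {
        List<String> names = new ArrayList<>();
        for (String line : traceLines) {
            Matcher m = NON_DAEMON.matcher(line);
            if (m.find() && !names.contains(m.group(1))) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "ExtendedMessage: CommonBridge.threadTerm, interrupting thread: Thread-11, thread is not a Daemon thread",
            // hypothetical daemon entry, included only to show that it is skipped
            "ExtendedMessage: CommonBridge.threadTerm, interrupting thread: Thread-12, thread is a Daemon thread");
        System.out.println(nonDaemonThreads(sample)); // prints [Thread-11]
    }
}
```

The resulting names are the thread identifiers to look up in the jformat output.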

Next, run jformat on the console dump to get the TCB associated with the thread identifier or thread name, in this case "Thread-11". The jformat command is: jformat DIS LS. In the output, look under the "Thread Identifiers" section to see the thread name mapped to a TCB. Match the TCB to the callback stack in the console dump by looking at either the thread callback stack output of svcdump.jar or, in IPCS, the output of IP VERBX LEDATA 'asid(aaaa) NTHREADS(*)', where aaaa is the ASID in hex of the hung Servant Region.

For a listing of all technotes, search the WebSphere Application Server for z/OS support site.

 
 
 


Document Information


Current web document: swg21252202.html
Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server for z/OS > Hangs/Performance Degradation
Operating system(s): z/OS
Software version: 6.1
Software edition:
Reference #: 1252202
IBM Group: Software Group
Modified date: Dec 19, 2006