MustGather: 100% CPU usage on AIX
 Technote (troubleshooting)
 
Problem(Abstract)
Collecting data for 100% CPU usage problems with IBM® WebSphere® Application Server on the AIX® operating system. Gathering this information before calling IBM support will help familiarize you with the troubleshooting process and save you time.
 
Resolving the problem
This document describes the information needed to troubleshoot a Java™ Virtual Machine (JVM) on AIX that reaches 100% CPU utilization or an unusually high CPU percentage.

If you have already contacted support, continue on to the 100% CPU Usage MustGather information. Otherwise, see MustGather: Read first for all WebSphere Application Server products.

The following instructions will help you set up the failing system to capture the required information:
  1. After starting the application server JVM, run the following dbx script for each application server or clone that has the problem:

    ./dbxtrace_aix.sh -a PID_appserver > dbx_startup_clone_name.out

    Replace PID_appserver with the process ID of the application server and clone_name with the name of the server or clone. The script output is redirected to the dbx_startup_clone_name.out file.

  2. Follow the instructions in Enabling verbosegc in WebSphere Application Server to enable verbose garbage collection for the failing application server.

  3. Edit the vmstat_script.sh and ps_script.sh scripts to set the sleep interval. The default interval of 5 minutes is appropriate if the problem can be recreated within a few hours. If the problem takes longer to occur, for example a week, increase the interval; the interval determines how often the script writes to its output file.

  4. Run the two scripts you modified in the preceding step:

    ./vmstat_script.sh vmstat.out
    ./ps_script.sh ps.out

    Note: The output file name is passed as the argument to each script.

  5. Clear all application server log files before starting the test. You might have to stop the application server to delete the files, and then restart it.
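The periodic collection that steps 3 and 4 rely on can be sketched as follows. This is a hypothetical shell function, not the contents of the IBM-supplied vmstat_script.sh or ps_script.sh (which are not reproduced here); the name snapshot_loop and its arguments are assumptions made for illustration. It appends a timestamped snapshot of any command's output to a file at a fixed interval.

```shell
#!/bin/sh
# Hypothetical sketch: NOT the IBM-supplied vmstat_script.sh or ps_script.sh.
# Appends a timestamped snapshot of a command's output to OUTFILE every
# INTERVAL_SECONDS. COUNT=0 means keep looping until the script is stopped.
# Usage: snapshot_loop INTERVAL_SECONDS COUNT OUTFILE COMMAND [ARGS...]
snapshot_loop() {
    interval=$1
    count=$2
    outfile=$3
    shift 3
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        # Timestamp each snapshot so it can be matched to javacore times later.
        echo "=== $(date) ===" >> "$outfile"
        "$@" >> "$outfile" 2>&1
        i=$((i + 1))
        # Skip the sleep after the last snapshot of a bounded run.
        if [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `snapshot_loop 300 0 vmstat.out vmstat 5 2 &` and `snapshot_loop 300 0 ps.out ps -eo pid,pcpu,args &` would each append a snapshot every 5 minutes until stopped. Appending, rather than overwriting, keeps the history leading up to the CPU spike.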

Collect the following information when the application server process is experiencing the problem:
  1. Get a listing of all of the open network connections using the following command:

    netstat -an > netstat1.out

  2. If the Web server is remote, then run the following command on the Web server system:

    netstat -an > netstatwebserver1.out

  3. Run the following script:

    ./tprof_ps.sh tprof_ps

    Note: Unlike vmstat_script.sh and ps_script.sh, the argument to tprof_ps.sh (here, tprof_ps) is a directory in which the output files are written.

  4. Run the following command:

    kill -3 [PID_of_problem_JVM]

    Note: Each kill -3 command should create a javacore.txt file in the working directory of the application server. By default, look for these files in the install_root directory.

  5. Wait two minutes.

  6. Run the following command:

    kill -3 [PID_of_problem_JVM]

  7. Wait another two minutes.

  8. Run the following command:

    kill -3 [PID_of_problem_JVM]

  9. Wait a final two minutes.

  10. Gather the open network connections again on the application server system and, if the Web server is remote, on the Web server system:

    netstat -an > netstat2.out
    netstat -an > netstatwebserver2.out

    Run the second command on the Web server system.

  11. Gather the dbxtrace output again for each server or clone, this time writing to a different file so that the startup output is not overwritten:

    ./dbxtrace_aix.sh -a PID_appserver > dbx_problem_clone_name.out

    Replace PID_appserver and clone_name as in the setup step.

  12. Collect the following files:
    • For WebSphere Application Server V6.0:
      • All files in the following directory:

        profile_root/logs/server_name

      • A copy of the server.xml file located in the following directory:

        profile_root/config/cells/cell_name/nodes/node_name/servers/server_name

    • For WebSphere Application Server V5.0 and 5.1:
      • Include all of the files from the following directory:

        install_root/logs/server_name

      • A copy of the server.xml file located in the following directory:

        install_root/config/cells/cell_name/nodes/node_name/servers/server_name

    • For all releases of WebSphere Application Server:
      • If the Web server is remote, the http_plugin.log file (V5.0 and V6.0) from the Web server system
      • The dbx output files created by dbxtrace_aix.sh
      • All javacore.txt files created
      • Output from vmstat_script.sh and ps_script.sh (vmstat.out and ps.out)
      • All files generated by tprof_ps.sh script. These files will be in the directory specified as a script parameter.
      • All netstat*.out files

  13. Follow the instructions in Steps to get support to send the diagnostic information to IBM support.
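The javacore steps above (three kill -3 signals, each followed by a two-minute wait) can be scripted so the timing stays consistent. The function below is a hypothetical sketch, not an IBM-supplied script; the name javacore_series and its optional arguments are assumptions. It relies only on the behavior described above: on an IBM JVM, each SIGQUIT writes a javacore and the process keeps running.

```shell
#!/bin/sh
# Hypothetical helper, NOT an IBM-supplied script. Sends SIGQUIT (kill -3)
# to a JVM COUNT times, waiting WAIT_SECONDS after each signal.
# On an IBM JVM, each SIGQUIT writes a javacore and the process keeps running.
# Usage: javacore_series PID [COUNT] [WAIT_SECONDS]   (defaults: 3 signals, 120 s)
javacore_series() {
    pid=$1
    count=${2:-3}
    wait_secs=${3:-120}
    n=1
    while [ "$n" -le "$count" ]; do
        # kill -0 sends no signal; it only checks that the process exists
        # and can be signalled by this user.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "process $pid is not running" >&2
            return 1
        fi
        echo "$(date): kill -3 $pid ($n of $count)"
        kill -3 "$pid"
        sleep "$wait_secs"
        n=$((n + 1))
    done
}
```

For example, `javacore_series [PID_of_problem_JVM]` sends three signals two minutes apart and prints a timestamp for each, which helps when matching javacores to the vmstat, ps, and tprof output. Run the netstat commands before and after it as described in the steps above.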
==========================================================================
In some cases, more in-depth diagnostic data may be required. If your support representative asks for it, collect a core file in addition to the javacores. The following command generates a full core file without stopping the process:

gencore [PID_of_problem_JVM] filename

==========================================================================

For a listing of all technotes, downloads, and educational materials specific to 100% CPU usage problems, search the WebSphere Application Server support site.
 
Related information
Enabling verbosegc in WebSphere Application Server
Steps to get support
MustGather: Readme first
Troubleshooting guide
Correlating TPROF output to a thread in JAVACORE
 
vmstat_script.sh
ps_script.sh
tprof_ps.sh
dbxtrace_aix.sh
 
Cross Reference information
Segment | Product | Component | Platform | Version | Edition
Application Servers | WebSphere Application Server - Express | Hangs/performance degradation | AIX | 6.0, 5.1, 5.0 |
Application Servers | Runtimes for Java Technology | Java SDK | | |
 
 


Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server > 100% CPU Usage
Operating system(s): AIX
Software version: 6.0
Software edition:
Reference #: 1116458
IBM Group: Software Group
Modified date: Aug 23, 2005