Generic Term | Replace with |
TMP_PATH | A temporary directory with a minimum of 10 GB of free space (e.g. /large_fs). |
MM-DD | The current month and day (e.g., 01-31). |
PMR | The full IBM PMR number (e.g., PMR12345.b678.c000). |
JAVA_PID | The process id of the active Java process (e.g., use the "ps" command and check the PID column to identify the process). |
LOG_FILE | The actual name of the log file. |
NEW_PATH | An actual path to a new location for the log files. |
A. To determine whether or not you should manually run the commands to gather the minimal set of data for the IBM representative to analyze, run the following command:
# vmstat -lI 1 25
*The '-l' (lower case l) flag displays an extra "large-page" section
*The '-I' (capital i) flag displays I/O oriented view columns
B. Enable Java verbose garbage collection (GC)
Add the Java command line options:
-verbose:gc
-Xverbosegclog:/TMP_PATH/gc.log (e.g., /tmp/gc.log)
to your Java command line or process startup profile/script. This requires the process to be restarted.
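As a minimal sketch of how these options might be added to a startup script, the helper below builds both options from a log directory. The function name java_gc_opts, the /large_fs directory, and app.jar are placeholders chosen for illustration and are not part of the original procedure.

```shell
# Print the verbose GC options for a given log directory (TMP_PATH).
# java_gc_opts is a hypothetical helper name.
java_gc_opts() {
  printf '%s %s\n' "-verbose:gc" "-Xverbosegclog:$1/gc.log"
}

# Example use in a startup script (app.jar is a placeholder):
#   java $(java_gc_opts /large_fs) -jar app.jar
```

Keeping the log directory in one place makes it easier to point the GC log at a file system with enough free space.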
C. Redirect or save standard error (stderr) messages to a file
Commonly used application servers may already save standard out and standard error messages to a log file (e.g., SystemOut.log, native_stdout.log, SystemErr.log, native_stderr.log) or to the application log file.
For custom applications, redirect the standard error messages by appending "2>/TMP_PATH/LOG_FILE" to the command line. To redirect both stdout and stderr to a file, append ">/TMP_PATH/LOG_FILE 2>&1".
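The two forms of redirection can be tried out with a small stand-in for a custom application. This is only a sketch: demo_app and the log file names are invented for the example, and a scratch directory under /tmp stands in for TMP_PATH.

```shell
# Stand-in for a custom application that writes to both streams.
demo_app() {
  echo "message on stdout"
  echo "message on stderr" >&2
}

LOG_DIR="${TMPDIR:-/tmp}/redirect_demo.$$"   # stands in for TMP_PATH
mkdir -p "$LOG_DIR"

# Save only stderr to a file; stdout still goes to the terminal.
demo_app 2> "$LOG_DIR/stderr_only.log"

# Save both stdout and stderr to the same file.
demo_app > "$LOG_DIR/both.log" 2>&1
```

In the second form the order matters: stdout is pointed at the file first, and stderr is then duplicated onto it.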
D. Relogin, then restart your application
Perform the following actions in order for the changes to take effect:
- Stop the application (and node agent/manager, if applicable)
- Relogin as the USERID used in Step 1.B
- Confirm that full core is enabled and the new ulimits are in effect by executing the commands:
# ulimit -a
# lsattr -D -c sys -a fullcore -H
- Restart the application (e.g., node agent/manager) from the new login session
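The ulimit check above can be reduced to a pass/fail line per limit. The sketch below assumes that the shell reports an unrestricted limit as the word "unlimited"; check_limit is a hypothetical helper name, not an AIX command.

```shell
# Report whether a single ulimit value is unlimited.
# $1 = label, $2 = value reported by ulimit.
check_limit() {
  if [ "$2" = "unlimited" ]; then
    echo "$1: OK"
  else
    echo "$1: LIMITED ($2)"
    return 1
  fi
}

# Example, run from the new login session:
#   check_limit core "$(ulimit -c)"
#   check_limit file "$(ulimit -f)"
#   check_limit data "$(ulimit -d)"
```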
A. Example:
# vmstat -lI 1 25
System Configuration: lcpu=56 mem=65536MB
kthr memory page faults cpu large-page
----- ----------- ------------- ------------ ----------- -----------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa alp flp
0 0 1415347 13044604 0 0 0 0 0 0 46 193 153 45 44 0 0 0 0
0 0 1415350 13044601 0 0 0 0 0 0 25 411 311 77 18 0 4 0 0
0 0 1415350 13044601 0 0 0 0 0 0 43 281 298 25 74 0 1 0 0
0 0 1415350 13044601 0 0 0 0 0 0 30 62 262 33 66 0 3 0 0
0 0 1415350 13044601 0 0 0 0 0 0 32 403 324 62 37 0 1 0 0
0 0 1415350 13044601 0 0 0 0 0 0 35 193 275 40 59 0 2 0 0
0 0 1415350 13044601 0 0 0 0 0 0 40 145 265 79 20 0 0 0 0
0 0 1415350 13044601 0 0 0 0 0 0 28 59 262 65 34 0 1 0 0
0 0 1415350 13044601 0 0 0 0 0 0 55 61 280 23 76 0 1 0 0
0 0 1415350 13044601 0 0 0 0 0 0 48 63 261 99 0 0 1 0 0
........
NOTES:
- If the idle and wait ('id' and 'wa') columns have very low numbers, your system is in a high CPU utilization situation.
(The example above shows a CPU constrained system)
- Look at the largest value for avm as reported by the vmstat command. Multiply that by 4 K to get the number of bytes and then compare that to the number of bytes of RAM on the system. Ideally, avm should be smaller than total RAM. If not, some amount of virtual memory paging will occur. How much paging occurs will depend on the difference between the two values.
(The example above shows the system is not constrained on memory/paging)
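The avm comparison described in the notes can be worked through with shell arithmetic. This sketch assumes a 4 KB page size (4096 bytes) and that the "mem=" value is in megabytes of 1024 x 1024 bytes; avm_vs_ram is a hypothetical helper name.

```shell
# Compare avm (in 4 KB pages) with installed RAM (in MB, from "mem=").
avm_vs_ram() {
  avm_bytes=$(( $1 * 4096 ))
  ram_bytes=$(( $2 * 1024 * 1024 ))
  if [ "$avm_bytes" -lt "$ram_bytes" ]; then fits=yes; else fits=no; fi
  echo "avm_bytes=$avm_bytes ram_bytes=$ram_bytes fits_in_ram=$fits"
}

# Largest avm and the mem= value from the example above:
avm_vs_ram 1415350 65536
# prints: avm_bytes=5797273600 ram_bytes=68719476736 fits_in_ram=yes
```

Here avm is roughly 5.4 GB against 64 GB of RAM, which matches the conclusion that the example system is not constrained on memory.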
B. When the full core dump options are not enabled and the core dumps are uploaded, in most cases, the core dumps will be incomplete or truncated. Not setting these options will prevent the support specialist from analyzing the data and will also delay the resolution of the reported issue.
When using J2EE application servers such as IBM WebSphere or Oracle WebLogic, both the node agent (manager) and the application (manager) servers have to be stopped and restarted for the changes to take effect. Relogin before restarting them.
Examples of commands to be executed:
# chdev -l sys0 -a fullcore=true
# chuser fsize=-1 data=-1 core=-1 wasadmin
The AIX core file is generated in the current working directory of the process. Use the AIX environment variable:
IBM_COREDIR=NEW_PATH
to specify an alternate location for the AIX (process) core dump. Likewise, use the AIX environment variable:
IBM_JAVACOREDIR=NEW_PATH
to specify an alternate location for the javacore.*.txt files.
Both the IBM_COREDIR and IBM_JAVACOREDIR variables have to be configured for the process before it is started (i.e., as part of its startup procedure, and the process has to be restarted).
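One way to set both variables in a startup script is sketched below. The helper name set_dump_dirs and the /large_fs/dumps path are assumptions for illustration; the writability check is an added precaution and not part of the original procedure.

```shell
# Export both variables after confirming the target directory is writable.
# $1 = NEW_PATH
set_dump_dirs() {
  if [ ! -d "$1" ] || [ ! -w "$1" ]; then
    echo "not a writable directory: $1" >&2
    return 1
  fi
  IBM_COREDIR=$1
  IBM_JAVACOREDIR=$1
  export IBM_COREDIR IBM_JAVACOREDIR
}

# Example in a startup script, before the java command (path is a placeholder):
#   set_dump_dirs /large_fs/dumps || exit 1
```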
If the 'vmstat -lI 1 25' command does NOT report an issue of a resource constrained system, then you can use the technote titled "IBM Java for AIX MustGather: Data collection procedure for high CPU utilization with Java applications" for a less manual and more automated data gathering process.
http://www-01.ibm.com/support/docview.wss?uid=isg3T1022749
Example:
# vmstat -lI 1 25
System Configuration: lcpu=56 mem=65536MB
kthr memory page faults cpu large-page
-------- ----------- ------------------------ ------------ ----------- -----------
r b p avm fre fi fo pi po fr sr in sy cs us sy id wa alp flp
0 0 0 1426827 13032304 0 0 0 0 0 0 38 1987 1334 0 0 99 0 0 0
0 0 0 1426827 13032304 0 0 0 0 0 0 28 391 310 0 0 99 0 0 0
0 0 0 1426827 13032304 0 0 0 0 0 0 25 447 337 0 0 99 0 0 0
0 0 0 1426827 13032304 0 0 0 0 0 0 19 225 285 0 0 99 0 0 0
0 0 0 1426827 13032303 0 0 0 0 0 0 48 346 312 0 0 99 0 0 0
0 0 0 1426827 13032304 0 0 0 0 0 0 50 401 318 0 0 99 0 0 0
0 0 0 1426827 13032304 0 0 0 0 0 0 49 487 436 0 0 99 0 0 0
0 0 0 1426827 13032304 0 0 0 0 0 0 72 1752 1356 0 0 99 0 0 0
0 0 0 1426843 13032288 0 0 0 0 0 0 57 2390 314 0 0 99 0 0 0
0 0 0 1426843 13032288 0 0 0 0 0 0 55 116 267 0 0 99 0 0 0
........
NOTE: The example above does NOT show any resource constraints
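Reading the 'id' and 'wa' columns by eye can be replaced with a short awk filter. The sketch below finds the two columns by name in the header row, so it works with either header layout shown in this document; vmstat_idle_summary is a hypothetical helper name, and deciding what counts as "very low" is left to the reader.

```shell
# Average the 'id' and 'wa' columns of vmstat output read from stdin.
vmstat_idle_summary() {
  awk '
    # Header row: locate the "id" and "wa" columns by name.
    $1 == "r" && $2 == "b" {
      for (i = 1; i <= NF; i++) {
        if ($i == "id") idc = i
        if ($i == "wa") wac = i
      }
      next
    }
    # Data rows: numeric first field, seen after the header.
    idc && $1 ~ /^[0-9]+$/ { id += $idc; wa += $wac; n++ }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      printf "samples=%d avg_id=%.1f avg_wa=%.1f\n", n, id / n, wa / n
    }'
}

# Example:
#   vmstat -lI 1 25 | vmstat_idle_summary
```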
If the 'vmstat -lI 1 25' command DOES report an issue of a resource constrained system, then using the more automated data gathering process could cause even more of an issue.
The 'perfpmr' and 'pdump' scripts contain many of the commands in this document, but they also contain many other commands, such as system / kernel traces, which can cause even more of a performance issue on your system.
In these cases, you can use the following commands to gather the minimal, yet pertinent, information to begin troubleshooting the JVM's issue with respect to the resource constrained system.
A. Run the following commands in the order shown, prior to experiencing the issue.
# mkdir -p /TMP_PATH/PMR/MM-DD/data/perf
# cd /TMP_PATH/PMR/MM-DD/data/perf
# netstat -Aan > netstat.out 2>&1
# vmstat -tl 1 >> vmstat.out 2>&1 &
*Make sure to use the '&' at the end of the 'vmstat' command, as we need this command to continuously run in the background.
The 'vmstat' command is not resource intensive and has little, if any, effect on the system.
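The directory name used in Step A can be assembled from the generic terms instead of being typed by hand. This is a sketch only: perf_dir is a hypothetical helper, and it assumes that 'date +%m-%d' gives the MM-DD form wanted here.

```shell
# Build the data directory path from TMP_PATH, PMR, and an optional MM-DD.
# When MM-DD is omitted, today's month and day are used.
perf_dir() {
  mmdd=${3:-$(date +%m-%d)}
  echo "$1/$2/$mmdd/data/perf"
}

# Example (values are the samples from the table of generic terms):
#   dir=$(perf_dir /large_fs PMR12345.b678.c000)
#   mkdir -p "$dir" && cd "$dir"
```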
B. Run the following commands in the order shown, at the time you are experiencing the issue:
# tprof -skeuj -x sleep 10
# gencore JAVA_PID core.001
# kill -3 JAVA_PID
# sleep 5
# kill -3 JAVA_PID
# sleep 5
# kill -3 JAVA_PID
# prtconf > prtconf.out 2>&1
# lslpp -hac > lslpp-hac.out 2>&1
# emgr -lv3 > emgr-lv3.out 2>&1
# ipcs -saPrX > ipcs.out 2>&1
# errpt -a > errpt-a.out 2>&1
# JAVA_HOME/bin/java -version > java-version.out 2>&1
# lparstat -th
# lparstat -tH
# lparstat -ti
# mpstat -wh
# mpstat -ws
# mpstat -wd
# mpstat -wv
Note: The -v flag is available only for POWER8 processors, and later.
# ps avwwwg > ps-all.out 2>&1
# ps -Xemo THREAD > ps-THREAD.out 2>&1
# svmon -P
# svmon -G -O unit=auto,timestamp=on,pgsz=on,affinity=detail > svmon-G.out 2>&1
*Running the 'kill -3 JAVA_PID' command will result in javacore.*.txt files located in the current working directory of the process.
# cd {location of javacore.*.txt files}
# cp javacore.*.txt TMP_PATH/PMR/MM-DD/data
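The three 'kill -3' signals in Step B, with their 5 second pauses, can be wrapped in a small loop. The sketch below is not part of the original procedure: javacore_series and the DRY_RUN switch are invented for illustration, with DRY_RUN=1 printing the commands instead of sending signals.

```shell
# Send SIGQUIT (kill -3) to a JVM several times, pausing between signals.
# $1 = JAVA_PID, $2 = number of signals (default 3), $3 = pause in seconds (default 5)
javacore_series() {
  pid=$1; count=${2:-3}; pause=${3:-5}
  i=1
  while [ "$i" -le "$count" ]; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "kill -3 $pid"
    else
      kill -3 "$pid" || return 1
      # Pause between signals, but not after the last one.
      if [ "$i" -lt "$count" ]; then sleep "$pause"; fi
    fi
    i=$((i + 1))
  done
}

# Example:
#   DRY_RUN=1 javacore_series 12345   # show what would be run
#   javacore_series JAVA_PID          # send three signals, 5 seconds apart
```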
Other files to place in 'TMP_PATH/PMR/MM-DD/data' for upload:
- Standard error (stderr)
- Standard output (stdout)
- SystemOut
- SystemErr
- Application logs
- GC log
- Verbose JIT log (if available)
- Any other logs generated
The following files are mandatory when uploading the package of data:
- prtconf.out
- lslpp-hac.out
- emgr-lv3.out
- ipcs.out
- errpt-a.out
- java-version.out
- vmstat.out
- lparstat-d.out
- lparstat-me.out
- lparstat-h.out
- lparstat-H.out
- lparstat-i.out
- mpstat-h.out
- mpstat-s.out
- mpstat-d.out
- mpstat-v.out (if using a POWER8 or later server)
- ps-all.out
- ps-THREAD.out
- svmon-P.pid.out
- svmon-G.out
- sleep.prof
- javacore.*.txt files
- Standard error (stderr)
- Standard output (stdout)
- SystemOut
- SystemErr
- Application logs
- GC log
- Any other logs generated
NOTE: These output files are mandatory to troubleshoot the issue
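Before packaging, the collected output can be checked against the list above. The sketch below covers only the files that the redirected commands in this document write by name, so the file list is deliberately partial; missing_outputs is a hypothetical helper, and the directory argument is assumed to be the one the commands were run in.

```shell
# Print each expected file that is missing or empty in the given directory.
missing_outputs() {
  rc=0
  for f in prtconf.out lslpp-hac.out emgr-lv3.out ipcs.out errpt-a.out \
           java-version.out vmstat.out ps-all.out ps-THREAD.out svmon-G.out
  do
    if [ ! -s "$1/$f" ]; then
      echo "missing or empty: $f"
      rc=1
    fi
  done
  return $rc
}

# Example:
#   missing_outputs /TMP_PATH/PMR/MM-DD/data/perf
```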
Package all of the data that has been gathered:
# cd TMP_PATH/PMR/MM-DD
# tar -cvf - data | gzip -c > PMR.MM-DD.tgz
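The packaging step can also be written as a function that takes the generic terms as arguments. This is a sketch under the same naming scheme as the commands above; package_data is a hypothetical helper, and the subshell keeps the caller's working directory unchanged.

```shell
# Create PMR.MM-DD.tgz from the "data" directory under TMP_PATH/PMR/MM-DD.
# $1 = TMP_PATH, $2 = PMR, $3 = MM-DD
package_data() {
  (
    cd "$1/$2/$3" || exit 1
    tar -cf - data | gzip -c > "$2.$3.tgz"
  )
}

# Example, followed by a listing of the archive contents:
#   package_data /large_fs PMR12345.b678.c000 01-31
#   gzip -dc PMR12345.b678.c000.01-31.tgz | tar -tvf -
```

Listing the archive before uploading is a quick way to confirm that it is not empty or truncated.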
Upload the packaged data to IBM secured servers using one of the upload options provided on the "IBM Java for AIX MustGather: How to upload diagnostic data and testcases to IBM" web page:
http://www-01.ibm.com/support/docview.wss?uid=isg3T1022619
Document Type: | Instruction |
Content Type: | Troubleshooting |
Hardware: | all Power |
Operating System: | AIX 6 | AIX 7 |
IBM Java: | all Java Versions |
Author(s): | Christopher C. D. Peters |
Reviewer(s): | Rama Tenjarla |