The documentation required to diagnose web server hangs includes
The ServerDoc tool provided with ihsdiag automates much of the work of gathering this information. The user runs ServerDoc and provides the IHS installation directory and other information; ServerDoc creates a new directory to hold the required documentation, and stores information in that new directory.
Once the ServerDoc tool has completed, the user should copy any remaining log files and configuration files used by the web server and the plug-in into the new directory, and send in the directory to IBM support.
For web server hangs with IHS 2.0 on Solaris, please see this document first.
For web server hangs with IHS 2.0.42 or greater and multiple Listen directives on AIX 4.3:
# /usr/sbin/instfix -vik IY42085
IY42085 Abstract: Application hang during semop
Fileset bos.mp is not applied on the system.
Fileset bos.up:4.3.3.91 is applied on the system.
All filesets for IY42085 were found.
For web server hangs with IHS 2.0.42 or greater and multiple Listen directives on AIX 5.1:
# /usr/sbin/instfix -vik IY46223
IY46223 Abstract: Application hang during semop
Fileset bos.mp:5.1.0.54 is applied on the system.
Fileset bos.mp64:5.1.0.54 is applied on the system.
Fileset bos.up:5.1.0.54 is applied on the system.
All filesets for IY46223 were found.
For web server hangs with any release of IHS on AIX 5.2:
# /usr/sbin/instfix -vik IY46214
IY46214 Abstract: dropping partial connections leaves them on so_q0
Fileset bos.adt.include:5.2.0.13 is applied on the system.
Fileset bos.mp:5.2.0.13 is applied on the system.
Fileset bos.mp64:5.2.0.13 is applied on the system.
Fileset bos.net.tcp.client:5.2.0.13 is applied on the system.
Fileset bos.up:5.2.0.13 is applied on the system.
All filesets for IY46214 were found.
IHS release | required e-fix or fix pack |
1.3.19.x | 1.3.19.6 plus PQ87084 or PQ90262 |
1.3.26.x | 1.3.26.2 plus PQ87084 or later |
1.3.28.x | 1.3.28.1 |
2.0.42.x | 2.0.42.2 plus PQ85834 (most e-fix packages for 2.0.42.x prereq this level anyway) or later |
2.0.47.x | 2.0.47.1 |
For web server hangs with IHS 2.0.42 or greater and multiple Listen directives on AIX 5.2:
# /usr/sbin/instfix -vik IY47284
IY47284 Abstract: Application hang during semop
Fileset bos.mp:5.2.0.14 is applied on the system.
Fileset bos.mp64:5.2.0.14 is applied on the system.
Fileset bos.up:5.2.0.14 is applied on the system.
All filesets for IY47284 were found.
For web server hangs with any release of IHS on AIX 5.3:
$ /usr/sbin/instfix -vik IY58143
IY58143 Abstract: Required fixes for AIX 5.3
Fileset X11.Dt.lib:5.3.0.1 is applied on the system.
Fileset X11.Dt.rte:5.3.0.1 is applied on the system.
Fileset X11.base.rte:5.3.0.1 is applied on the system.
Fileset X11.fnt.ucs.ttf is not applied on the system.
Fileset X11.fnt.ucs.ttf_extb:5.3.0.1 is applied on the system.
...
Fileset devices.vdevice.hvterm1.rte:5.3.0.1 is applied on the system.
Fileset devices.vtdev.scsi.rte:5.3.0.1 is applied on the system.
Fileset sysmgt.websm.apps:5.3.0.1 is applied on the system.
Fileset sysmgt.websm.framework:5.3.0.1 is applied on the system.
Fileset sysmgt.websm.rte:5.3.0.1 is applied on the system.
Fileset sysmgt.websm.webaccess:5.3.0.1 is applied on the system.
All filesets for IY58143 were found.
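The instfix checks above can be scripted. The following is a minimal sketch, not part of AIX or ihsdiag: `apar_installed` is a hypothetical helper name, and it assumes that the summary line "All filesets for <APAR> were found." appears only when the fix is completely installed, as in the sample output above. It reads the instfix output on standard input so that it can be tried on saved output as well.

```shell
# Hypothetical helper: succeeds (exit status 0) when the instfix output read
# from stdin contains the summary line for the given APAR.
apar_installed() {
    apar=$1
    # Allow leading whitespace, since instfix may indent its report lines.
    grep -q "^[[:space:]]*All filesets for $apar were found\.[[:space:]]*$"
}

# Possible use on AIX:
#   /usr/sbin/instfix -vik IY46214 | apar_installed IY46214 \
#       && echo "IY46214 is installed" \
#       || echo "IY46214 is missing or incomplete"
```

Checking the summary line rather than the individual "Fileset" lines matters because, as the AIX 4.3 example shows, one fileset can be reported as not applied while the APAR as a whole is still reported as found.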
If IHS 1.3.28.x is run on a system with GSKit 7.0.3.25 or higher, and SSL is enabled with SSLEnable, IHS can fail to close client connections which have timed out, and in some cases the web server can become completely unresponsive.
The following environment variable must be set to prevent IHS from hanging during connection timeouts (PK44754):
GSK_USE_SOC_MUTEX=OFF
export GSK_USE_SOC_MUTEX
This environment variable can be added to bin/apachectl or any customer-created script used to start IHS. Here is an example apachectl script which has been modified to set this environment variable. The added text lies between the "start of local modifications" and "end of local modifications" comments.
# When multiple arguments are given, only the error from the _last_
# one is reported. Run "apachectl help" for usage info
#
#
# |||||||||||||||||||| START CONFIGURATION SECTION ||||||||||||||||||||
# -------------------- --------------------
#
# start of local modifications
# Allow IBM HTTP Server 1.3.28.x to be used with GSKit >= 7.0.3.25
GSK_USE_SOC_MUTEX=OFF
export GSK_USE_SOC_MUTEX
# end of local modifications
# the path to your PID file
PIDFILE=/opt/13281/logs/httpd.pid
#
# the path to your httpd binary, including options if necessary
HTTPD="/opt/13281/bin/httpd -d /opt/13281"
#
The problem may be exposed by updating the GSKit level, either by installing or applying maintenance to another product.
IBM HTTP Server is not affected by this issue on Windows.
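For a customer-created start script, the same setting might be applied as in the sketch below. `start_ihs` is a hypothetical function name, and the apachectl path in the usage comment is taken from the example above; adjust it to the actual installation.

```shell
# Hypothetical wrapper: export the GSKit setting, then run the given start
# command (normally apachectl) with any remaining arguments.
start_ihs() {
    apachectl=$1
    shift
    # Allow IBM HTTP Server 1.3.28.x to be used with GSKit >= 7.0.3.25
    GSK_USE_SOC_MUTEX=OFF
    export GSK_USE_SOC_MUTEX
    "$apachectl" "$@"
}

# Possible use:
#   start_ihs /opt/13281/bin/apachectl start
```

Because the variable is exported before the start command runs, the web server processes started by that command inherit it.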
Common web server hang conditions can be categorized as follows:
As discussed in the following sections, the root cause of the problem may not reside in IHS, so analysis of the IHS hang documentation may indicate that a different type of information is necessary.
A primary use of IHS is as a front-end to the WebSphere Application Server. It is possible for applications running in WebSphere to have a delayed response, or no response at all, so that all IHS threads are waiting for an application server response and no free IHS threads are available to handle new client connections.
Some authentication mechanisms for IHS, such as LDAP authentication capability provided with IHS or by a third party vendor, must contact a server over the network as part of IHS request processing. If that communication stalls, it is possible that after some time all IHS threads are waiting on an authentication response and no free IHS threads are available to handle new client connections.
In any situation where all IHS threads are waiting on an external application, the IHS hang documentation will show which component is waiting but it cannot determine the root cause for why the application is not responding.
Note: If the IHS hang documentation shows that IHS is waiting for a WebSphere response, related documentation for WebSphere will need to be gathered. Instructions for this WebSphere documentation can be found at http://www-1.ibm.com/support/search.wss?rs=180&tc=SSEQTP&tc1=SSCMPB9&q=mustgather. It is possible to collect this documentation at the same time as the IHS hang documentation, so that the required WebSphere information is available to IBM support if it is needed.
Vendors of third-party components which run inside IHS may provide similar information for gathering documentation on problems that can cause the component to hang or stall; contact the vendor for more information.
For this type of problem, IBM support anticipates being able to determine the failing component, as well as whether or not this is a known problem. Occasionally there are operating system issues which prevent IHS from finding out about new client connections. If analysis of the IHS hang documentation shows such a problem, network traces may be necessary and operating system support may suggest further diagnostic information.
Please refer to these instructions for verifying that required support programs are installed.
In some levels of AIX, backtraces cannot be obtained if IHS 1.3 is using the pthread accept mutex mechanism. The backtraces are critical for hang diagnosis. If the backtraces could not be collected, the cause of the hang cannot be diagnosed. Submit the hang documentation just in case, but check for the release-specific issues listed below to prepare for possible future occurrences of the hang.
AcceptMutex fcntl to your web server configuration file.
AcceptMutex fcntl or AcceptMutex sysvsem in your web server configuration file at startup, prior to the hang occurring.
AcceptMutex pthread in your web server configuration file.
Other releases of IHS are not affected.
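To see which accept mutex mechanism a configuration file currently requests, a check along the following lines may help. This is only a sketch: `accept_mutex_setting` is a hypothetical helper, it looks at a single file and does not follow Include directives, and an empty result only means that the directive is not set explicitly in that file.

```shell
# Hypothetical helper: print the value of the last uncommented AcceptMutex
# directive in the given configuration file, or nothing if there is none.
accept_mutex_setting() {
    conf=$1
    # Directive names are case-insensitive; commented lines start with "#",
    # so their first field never equals "acceptmutex".
    awk 'tolower($1) == "acceptmutex" { value = $2 }
         END { if (value != "") print value }' "$conf"
}

# Possible use (path is an example only):
#   accept_mutex_setting /path/to/IHS/conf/httpd.conf
```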
Run the tool as root to avoid any permissions problems with obtaining backtraces or reading files, such as log files and configuration files. (More information about the requirement to run this tool as root is available under "root requirement" below.)
ServerDoc takes four parameters when gathering hang documentation:
GatherHangDoc
# java -jar ServerDoc.jar GatherHangDoc /path/to/IHS 1398 127.0.0.1:80
The tool creates a new directory which contains a timestamp in the name, and the hang documentation will be saved in that directory.
If the IHS installation only supports SSL, then use - (hyphen) for the non-SSL address parameter (the last parameter in the example above). Otherwise, specify an IP address and port which can be used to reach the server from the local machine without using SSL.
Use the following table to determine the value of the non-SSL address parameter based on the form of a non-SSL Listen directive used in your configuration:
Listen directive looks like this | use this for address parameter |
(no non-SSL ports) | - |
Listen 80 | 127.0.0.1:80 |
Listen port | 127.0.0.1:port |
Listen 192.168.1.15:80 | 192.168.1.15:80 |
Listen ipaddress:port | ipaddress:port |
Listen myhostname:80 | myhostname:80 |
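The table can be expressed as a small shell function. This is a sketch for illustration: `listen_to_address` is a hypothetical helper, and it expects the argument of one non-SSL Listen directive (or an empty string when there are no non-SSL ports).

```shell
# Hypothetical helper: map the argument of a non-SSL Listen directive to the
# address parameter for ServerDoc, following the table above.
listen_to_address() {
    case ${1-} in
        '')  echo '-' ;;              # no non-SSL ports
        *:*) echo "$1" ;;             # ipaddress:port or hostname:port, used as-is
        *)   echo "127.0.0.1:$1" ;;   # bare port, reached through the loopback address
    esac
}

# Possible use:
#   listen_to_address 80                 prints 127.0.0.1:80
#   listen_to_address 192.168.1.15:80    prints 192.168.1.15:80
```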
For this example, IHS is installed in /scratch/IHS, the parent process id is stored in file /scratch/IHS/logs/httpd.pid, the non-SSL port can be reached from the web server machine on address 127.0.0.1:8080, and ihsdiag was unpacked into directory /root/ihsdiag-1.3.0.
# cd /tmp
# java -jar /root/ihsdiag-1.3.0/ServerDoc.jar GatherHangDoc \
  /scratch/IHS `cat /scratch/IHS/logs/httpd.pid` 127.0.0.1:8080
Gathering doc on 4 web server processes...
5985 5986 5988 5984
Seconds remaining before gathering information again: 60...54...48...42...36...30...24...18...12...6...
Gathering doc on 4 web server processes...
5985 5986 5988 5984
Seconds remaining before gathering information again: 30...27...24...21...18...15...12...9...6...3...
Gathering doc on 4 web server processes...
5985 5986 5988 5984
Reports, log files, and configuration files have been saved to directory
HangDoc.200408310607
If you have additional log files or configuration files, copy them
there before packing up the directory.
Web server log and conf files other than the default will have to be copied manually.
WebSphere plug-in conf and log files will have to be copied manually.
Hint for packing up the directory:
tar -cf HangDoc.200408310607.tar HangDoc.200408310607
gzip HangDoc.200408310607.tar
# ls -l HangDoc.200408310607/
total 772
-rw-rw-r-- 1 trawick trawick 0 Aug 31 06:07 access_log
-rw-rw-r-- 1 trawick trawick 5358 Aug 31 06:07 apachectl
-rw-rw-r-- 1 trawick trawick 118 Aug 31 06:07 error_log
-rw-rw-r-- 1 trawick trawick 462978 Aug 31 06:07 httpd
-rw-rw-r-- 1 trawick trawick 28790 Aug 31 06:07 httpd.conf
-rw-rw-r-- 1 trawick trawick 255056 Aug 31 06:08 log
-rw-rw-r-- 1 trawick trawick 56 Aug 31 06:07 redhat-release
-rw-rw-r-- 1 trawick trawick 5453 Aug 31 06:08 report
There are two normal situations where the tool can take a long time to gather data:
A less frequent cause is that there is a problem in the tool which causes it to hang.
Two conditions will cause the display to be updated:
If you need to interrupt the tool so the web server can be restarted (to try to resolve the hang condition), the best place to interrupt it is when it is counting down the number of seconds until it checks the web server state again. The last lines of output on the display will look like this:
Seconds remaining before gathering information again: 60...54...48...42...36...30.
If the tool is interrupted at a different time, incomplete information will be gathered on the state of the web server. This will introduce some risk into our analysis of the problem, but as long as a meaningful percentage of the web server processes have been examined (>30%), it is usually possible to find a probable cause of the hang.
If the IHS child processes have a very large number of threads (e.g., ThreadsPerChild is higher than 200), the likely cause is that the system debugger slows down considerably when analyzing such processes.
It is also possible that the HangDoc tool has a problem interacting with the system debugger, and it will never finish.
To find out more information about the cause of the delay, take these steps:
ps -ef to a file. This must be done before interrupting the HangDoc tool.
HangDoc.xxxx directory, which is what it was using when it stalled.
HangDoc.xxxx directory to IHS support for analysis.
The next step is to copy any other web server or plug-in configuration files and logs into the new HangDoc directory. Here is a list of files to copy if they are being used:
The last step is to pack up and compress the documentation directory using zip, tar followed by gzip, or tar followed by compress. The easiest way is to copy and paste the commands that ServerDoc displayed earlier. The suggested commands will vary by platform. On z/OS, for example, compress will be suggested instead of gzip.
# tar -cf HangDoc.200408310607.tar HangDoc.200408310607
# gzip HangDoc.200408310607.tar
The resulting compressed file is the file to send to IBM support.
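If the commands displayed by ServerDoc are no longer on the screen, the packing step might be scripted as follows. This is a sketch: `pack_hang_doc` is a hypothetical helper, and the fallback to compress when gzip is not installed is an assumption based on the z/OS remark above. Run it from the directory that contains the HangDoc directory.

```shell
# Hypothetical helper: tar the given directory, then compress the archive
# with gzip, or with compress if gzip is not available. Prints the name of
# the resulting file.
pack_hang_doc() {
    dir=${1%/}                       # drop a trailing slash, if any
    tar -cf "$dir.tar" "$dir" || return 1
    if command -v gzip >/dev/null 2>&1; then
        gzip "$dir.tar" && echo "$dir.tar.gz"
    else
        compress "$dir.tar" && echo "$dir.tar.Z"
    fi
}

# Possible use:
#   pack_hang_doc HangDoc.200408310607
```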
root requirement

When gathering information on web server hangs, the tool must attach to live web server processes to obtain information about the state of those processes.
If the web server is started as root, then at least one of these processes will be owned by root and other processes will be owned by the web server user id (e.g., nobody or www). Only root has the authority to attach to all of the processes, so the tool itself must be run as root. If the web server administrator does not have authority to log in or switch user to root, a simple script can be created to gather the hang documentation, and the system administrator can give the web server administrator sudo access to that script. sudo is a third-party tool available without cost for all appropriate platforms.
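Such a script could look like the sketch below. It is an illustration only: `gather_hang_doc` is a hypothetical function name, the PID file is assumed to be logs/httpd.pid under the installation directory (as in the earlier example), and the `JAVA` variable is an assumption added so that a specific Java executable can be selected.

```shell
# Hypothetical wrapper around ServerDoc for use through sudo.
gather_hang_doc() {
    ihs_root=$1    # IHS installation directory
    address=$2     # non-SSL address, or - if only SSL is configured
    jar=$3         # path to ServerDoc.jar from the unpacked ihsdiag
    # Assumption: the parent process id is stored in logs/httpd.pid.
    pid=$(cat "$ihs_root/logs/httpd.pid") || return 1
    ${JAVA:-java} -jar "$jar" GatherHangDoc "$ihs_root" "$pid" "$address"
}

# Possible use, with the values from the earlier example:
#   gather_hang_doc /scratch/IHS 127.0.0.1:8080 /root/ihsdiag-1.3.0/ServerDoc.jar
```

Keeping the installation directory, address, and jar path fixed inside a site-specific script, rather than accepting them from the caller, would limit what the web server administrator can run as root.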
If the web server is not started as root, there are no such concerns, and the hang documentation tool may be run by the user id which starts the web server.
If the tool is run as non-root and it is unable to gather the required information, the problem will have to be recreated. It may not be possible to determine if this problem occurred until the documentation has been analyzed by IBM HTTP Server support.