MustGather information for web server child process crashes

The documentation required to diagnose child process crashes includes

If core dumps are not being saved for the child process crashes, the first step is to perform any necessary operating system and web server configuration so that core dumps are saved. Core dump configuration information is described here.
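As a rough first check before working through the core dump configuration information, the core file size limit of the shell that starts the web server can be inspected. The sketch below only interprets the output of the standard ulimit -c shell builtin; the helper name core_limit_status is hypothetical, and other operating system and web server settings that affect core dumps are not checked.

```shell
# Hypothetical helper: classifies a value reported by `ulimit -c`.
core_limit_status() {
    case "$1" in
        0)         echo "disabled"  ;;  # no core file will be written
        unlimited) echo "unlimited" ;;  # core files are not size-limited
        *)         echo "limited"   ;;  # large core files may be truncated
    esac
}

# Report the soft core file size limit of the current shell.
echo "core dumps: $(core_limit_status "$(ulimit -c)")"
```

The limit applies to processes started from that shell, so the check is only meaningful when run in the same environment that starts the web server.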

When a core dump is available, the ServerDoc tool provided with ihsdiag automates much of the work of gathering and formatting the required documentation. The user runs ServerDoc with the IHS installation directory and the path to the core file; ServerDoc then creates a new directory and stores the required documentation in it.

Once the ServerDoc tool has completed, the user should copy any remaining log files and configuration files used by the web server and the plug-in into the new directory and send the directory to IBM support.

Note: If IBM HTTP Server has been upgraded to a newer maintenance level since the core dump was generated, the crash must be reproduced, and a new core dump generated, with the new level of product code. Otherwise, the crash information will be incorrect because the core dump and the product won't match.

In addition to submitting the documentation described below, we also recommend enabling mod_whatkilledus and mod_backtrace so that key information about each subsequent crash is recorded in the web server error log. This provides additional insight into the crashes without requiring that the steps outlined in this document be followed for every crash.

These modules are not supported with very old maintenance levels of IBM HTTP Server. Check the Supported server versions section in the documentation for each module to confirm that the module works with your level of IBM HTTP Server.

known issues to check for first

LoadBalance is set to RANDOM in plugin-cfg.xml and one of the following levels of WebSphere plug-in is used: 5.1.1.13, 5.1.1.14, 6.0.2.17, 6.0.2.19, 6.1.0.5, or 6.1.0.7

These levels of the plug-in have an issue with random load balancing which can cause crashes with IBM HTTP Server. The crash can be in arbitrary code and may not consistently occur in the same place. To resolve this known problem, either change LoadBalance to a different value or apply the plug-in fix for APAR PK43752, which is targeted for 5.1.1.15, 6.0.2.21, and 6.1.0.9.
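One quick way to check for this condition is to search plugin-cfg.xml for the random setting. The sketch below is a minimal example; the helper name uses_random_loadbalance is hypothetical, and it assumes the setting appears in the file as the attribute LoadBalance="Random" (matched case-insensitively).

```shell
# Hypothetical helper: succeeds (exit status 0) when the given plugin-cfg.xml
# contains LoadBalance="Random". The attribute spelling is an assumption.
uses_random_loadbalance() {
    grep -qi 'LoadBalance="Random"' "$1"
}

# Example (the path is a placeholder):
# if uses_random_loadbalance /path/to/plugin-cfg.xml; then
#     echo "random load balancing is configured; check the plug-in level"
# fi
```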

Solaris 10

Make sure required Solaris AF_UNIX fixes have been applied, using one of the patches below or equivalent:

mod_dav crash on AIX 5.2

If crashes occur after apachectl restart or apachectl graceful on AIX 5.2, check for the following LoadModule directives in the configuration file (uncommented):

LoadModule dav_module modules/mod_dav.so
...
LoadModule dav_fs_module modules/mod_dav_fs.so

A child process crash can occur after a web server restart on AIX 5.2 if these are enabled.

AIX APAR IY78080 resolves the problem for AIX 5.3. This APAR fix is not available for AIX 5.2, so one of the configuration changes described above must be used.
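The check for uncommented mod_dav directives can be sketched with grep. The helper name find_active_dav_modules is hypothetical; the sketch inspects only the single file it is given and does not follow Include directives.

```shell
# Hypothetical helper: prints the uncommented LoadModule lines for
# dav_module and dav_fs_module found in the given configuration file.
find_active_dav_modules() {
    grep -E '^[[:space:]]*LoadModule[[:space:]]+dav(_fs)?_module[[:space:]]' "$1"
}

# Example (the path is a placeholder):
# find_active_dav_modules /path/to/httpd.conf
```

No output means that neither directive is active in that file.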

SIGBUS crash on Linux and AIX

The most common cause of a SIGBUS crash on these platforms is that a file is truncated while the web server is trying to send it to a client. Some file replacement methods cause the existing file to be truncated and then the new contents written, instead of writing the new contents to a temporary file and then renaming to the proper name.

If you have static files served from IHS which can be modified in place, try EnableMMap Off to see if the problem is resolved.
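The truncation can also be avoided at the source by using the temporary-file-and-rename replacement method mentioned above. The following is a minimal sketch, not a prescribed procedure; the helper name replace_file_atomically is hypothetical. It assumes the temporary file can be created in the same directory as the target, so that mv performs a rename within one filesystem instead of a copy.

```shell
# Hypothetical helper: replaces a served file without truncating it in place.
# $1 = file holding the new contents, $2 = file served by the web server.
replace_file_atomically() {
    new_content=$1
    target=$2
    tmp="$target.tmp.$$"    # same directory as the target, unique per process
    # Write the new contents to the temporary file, then rename it over the
    # target. The old file contents are never truncated.
    cp "$new_content" "$tmp" && mv -f "$tmp" "$target"
}

# Example (paths are placeholders):
# replace_file_atomically /staging/index.html /path/to/htdocs/index.html
```

The copy is created with default permissions for the current user, so the resulting file may need its ownership or mode adjusted to match the original.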

Note: On Solaris, many other types of crashes result in SIGBUS.

z/OS

For U40xx or S0C4 abend in LE CELQLIB at httpd child process termination, check for applicability of LE APAR PK34252.

Crashes with mod_php on Unix platforms

The PHP manual recommends against using PHP in a multithreaded web server; see "Why shouldn't I use Apache2 with a threaded MPM in a production environment?".

IHS 2.0.42 and higher is multithreaded on all platforms. (IHS 1.3 is multithreaded only on Windows or with certain third-party modules.)

Thread safety problems in PHP applications or third-party libraries referenced by PHP can cause crashes in a threaded web server. The recommended solution is to configure PHP as a FastCGI application and use mod_fastcgi to communicate with it.

Crashes on Linux Platforms with ThreadsPerChild > 200

On Linux, child process crashes can occur due to address space exhaustion when large numbers of threads are used with the default thread stack size.

A thread stack size of 128KB is sufficient for IBM HTTP Server and the WebSphere plug-in; however, the system default is typically 8MB or larger. With the system default and large values for ThreadsPerChild, most of the address space can be consumed by thread stacks. For example, with ThreadsPerChild set to 256 and a stack size of 8MB, 2GB of the address space will be consumed by thread stacks. Memory allocations during request processing can then exceed the address space limit, typically 3GB, and result in crashes in arbitrary components of the web server.

The system default can be displayed with ulimit -s. If the value is 'unlimited', assume a default of 8MB.

With high values for ThreadsPerChild, the ThreadStackSize directive should be used to specify a much smaller stack size, as in the following example:

# Use a 128KB thread stack size (128 * 1024 bytes)
ThreadStackSize 131072

Third-party modules may require a larger thread stack size. We recommend setting it to 256KB when third-party modules are used, unless the vendor is able to specify the exact requirement.
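The address space arithmetic in this section can be reproduced with shell arithmetic. In the sketch below, the helper name stack_reservation_mb is hypothetical, and the 256 threads are an illustrative value rather than a recommendation.

```shell
# Hypothetical helper: address space (in MB) reserved for thread stacks in
# one child process. $1 = ThreadsPerChild, $2 = per-thread stack size in KB.
stack_reservation_mb() {
    threads=$1
    stack_kb=$2
    echo $(( threads * stack_kb / 1024 ))
}

stack_reservation_mb 256 8192   # 8MB stacks: prints 2048 (2GB)
stack_reservation_mb 256 128    # 128KB stacks: prints 32
```

With the smaller stack size, the same number of threads reserves 32MB instead of 2GB, which leaves most of the address space available for request processing.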

Crash during stop or restart processing on HP-UX/PA-RISC

(IBM HTTP Server 2.0 and above)

If a crash occurs while processing apachectl stop, apachectl graceful, or apachectl restart and

then the crash may be resolved by reversing the order of the LoadModule directives for mod_ibm_ssl and the WebSphere plug-in:

  1. Comment out the existing directive LoadModule ibm_ssl_module modules/mod_ibm_ssl.so
  2. Add that directive (LoadModule ibm_ssl_module modules/mod_ibm_ssl.so) to the bottom of httpd.conf

This problem is resolved with plugin APAR PK57529.
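The two editing steps above can be sketched as a filter that writes a modified copy of httpd.conf to standard output. This is only an illustration: the helper name move_ssl_loadmodule_last is hypothetical, and it assumes the directive appears in the file exactly as shown in step 1, apart from leading whitespace.

```shell
# Hypothetical helper: prints a copy of the given httpd.conf in which the
# mod_ibm_ssl LoadModule line is commented out and re-added as the last line.
# If the directive is not present and uncommented, the file is printed as-is.
move_ssl_loadmodule_last() {
    directive='LoadModule ibm_ssl_module modules/mod_ibm_ssl.so'
    if grep -q "^[[:space:]]*${directive}" "$1"; then
        sed "s|^[[:space:]]*${directive}|#&|" "$1"   # step 1: comment out
        echo "$directive"                             # step 2: add at bottom
    else
        cat "$1"
    fi
}

# Example (paths are placeholders); review the result before using it:
# move_ssl_loadmodule_last /path/to/httpd.conf > /tmp/httpd.conf.new
```

Writing to a separate file leaves the original configuration untouched until the change has been reviewed.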

what we expect to learn from this information

A core dump and related information are critical for diagnosing the cause of child process crashes. Without this information, IBM support is limited to suggesting that the customer move to the current level of fixes. With the information, IBM support anticipates being able to make the following initial determination:

In cases where an IBM component crashed, the documentation often contains enough information to address the root cause of previously unknown problems. Even when the root cause cannot be determined from a particular core dump, the information is used to decide the next step.

In cases where a third-party component crashed, the vendor of that component will need to investigate further; IBM support is unable to diagnose problems in third-party components.

making sure required support programs are available

Please refer to these instructions for verifying that required support programs are installed.

running the tool

Run the tool as root to avoid any permissions problems with reading the core file or other files, such as log files and configuration files. (More information about the requirement to run this tool as root is available in the section "understanding the root requirement".)

ServerDoc is passed three parameters for gathering crash documentation:

  1. GatherCrashDoc
  2. the name of the IHS installation directory (e.g., /usr/HTTPServer)
  3. the name of the core file (e.g., /tmp/core)

# java -jar ServerDoc.jar GatherCrashDoc /path/to/IHS /path/to/corefile

The tool creates a new directory with a timestamp in its name and saves the crash documentation in that directory.

a sample run

For this example, IHS is installed in /usr/HTTPServer, the core dump was written to /tmp/core, and ihsdiag was unpacked into /root/ihsdiag-1.1.0.

# cd /tmp
# java -jar /root/ihsdiag-1.1.0/ServerDoc.jar GatherCrashDoc \
/usr/HTTPServer /tmp/core
Reports, log files, and configuration files have been saved to directory
  CrashDoc.200404121310
If you have additional log files or configuration files, copy them there
before packing up the directory.

Hint for packing up the directory:
  tar -cf CrashDoc.200404121310.tar CrashDoc.200404121310
  gzip CrashDoc.200404121310.tar
# ls -l CrashDoc.200404121310/
total 8136
-rw-r--r--   1 root  system       8779 Apr 12 13:10 access_log
-rw-r--r--   1 root  system       7094 Apr 12 13:10 apachectl
-rw-r--r--   1 root  system    3593703 Apr 12 13:10 core
-rw-r--r--   1 root  system     478483 Apr 12 13:10 core_file_strings
-rw-r--r--   1 root  system      14419 Apr 12 13:10 error_log
-rw-r--r--   1 root  system      37141 Apr 12 13:10 httpd.conf
-rw-r--r--   1 root  system       7500 Apr 12 13:10 log
-rw-r--r--   1 root  system        173 Apr 12 13:10 report

copying other web server and plug-in files

The next step is to copy any other web server or plug-in configuration files and logs into the new CrashDoc directory. Here is a list of files to copy if they are being used:

saving the documentation directory

The last step is to pack up and compress the documentation directory using zip, tar followed by gzip, or tar followed by compress. The easiest way is to copy and paste the commands that ServerDoc displayed when it finished. The suggested commands vary by platform. On z/OS, for example, compress is suggested instead of gzip.

a sample run

# tar -cf CrashDoc.200404121310.tar CrashDoc.200404121310
# gzip CrashDoc.200404121310.tar

The resulting compressed file is the file to send to IBM support.
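Before sending the file, it can be worth confirming that the archive is intact and contains the expected files. The sketch below applies to the tar-plus-gzip case only and uses the standard gzip -t, gzip -dc, and tar -tf options; the helper name verify_crashdoc_archive is hypothetical.

```shell
# Hypothetical helper: tests the integrity of a .tar.gz file and, if the
# test passes, lists the files it contains.
verify_crashdoc_archive() {
    gzip -t "$1" && gzip -dc "$1" | tar -tf -
}

# Example, using the file name from the sample run:
# verify_crashdoc_archive CrashDoc.200404121310.tar.gz
```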

understanding the root requirement

When gathering information on web server crashes, the tool must be able to read the core files created for web server processes as well as the web server logs and configuration files. Often the logs and configuration files are readable by normal user ids, but core files are readable only by root or by the web server user id (e.g., nobody or www).

If the web server is started as root, the permissions on the generated core files, log files, and configuration files can be changed to allow a non-root user to run the crash documentation tool.

If the web server is not started as root, there are no such concerns, and the crash documentation tool may be run by the user id which starts the web server.

If the tool is run as non-root and it is unable to gather the required information, permissions on the core file or other files can be changed and the tool may be run again. It may not be possible to determine if this problem occurred until the documentation has been analyzed by IBM HTTP Server support.
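When the tool is to be run as non-root, one way to reduce the chance of an incomplete collection is to check beforehand that the current user id can read each file the tool needs. The sketch below is illustrative only; the helper name list_unreadable is hypothetical, and the paths in the example are assumptions based on the sample run.

```shell
# Hypothetical helper: prints each listed file that the current user id
# cannot read. Files that do not exist are reported as well.
list_unreadable() {
    for f in "$@"; do
        [ -r "$f" ] || echo "$f"
    done
}

# Example (the conf/ and logs/ locations are assumed, not confirmed):
# list_unreadable /tmp/core /usr/HTTPServer/conf/httpd.conf \
#     /usr/HTTPServer/logs/error_log
```

No output means that every listed file is readable by the user id running the check.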