Troubleshooting Guide

Determining Origins and Symptoms of Problems

If you are supporting an organization using DB2, you will receive calls from users to resolve a variety of problems. Your response depends on:

The severity of the problem
The specific nature of the problem
Any related information that you can gather
Your experience in resolving similar problems

To solve a problem, start by obtaining a comprehensive description of the problem. This way, you can begin to determine its origin. For example, a problem may exist in any of the following:

Hardware
Operating system
Networking system or other subsystem
DB2 server
DB2 client
DB2 Connect gateway to host systems

Determining the Location of a Problem

Most applications run in a client/server environment. You must determine if a problem is on the client, the server, or somewhere in between (that is, in the LAN or communication protocol stack).

Investigating where the problem is detected or reported is the best way to start. For example, if you receive an unexpected SQLCODE on a client, then investigate the SQLCODE on that client. (See Responding to Unexpected Messages or SQLCODEs for information.)

Often the SQLCODE alone provides enough information to determine the source and cause of the problem. If the SQLCODE does not give enough information to determine the source of a problem, examine the db2diag.log file at the partition server where the problem was reported. For example, if the problem was reported on a client, first look at the db2diag.log file on that particular client.

The db2diag.log file is an ASCII file written by DB2 that contains diagnostic information for DB2. The db2diag.log file reports exceptions encountered in DB2 code. If you know the date and time when the problem occurred, you can go directly to the corresponding db2diag.log file entries.
Note: The error messages relating to a user application problem most often do not cause exceptions in DB2. This type of problem is handled as a normal part of DB2 processing. As a result, they are not reported in the db2diag.log file.

For information on this important file, see First Failure Data Capture. When viewing the file, keep in mind that the most recent conditions are always at the end.

Responding to Unexpected Messages or SQLCODEs

When you receive an unexpected message or SQLCODE, follow these steps until you can determine the problem:

When you receive a message, take note of all available information, including the following:
- The code, an eight-digit alphanumeric message identification number. This code may begin with the prefix SQL, DBA or CCA. Also note all reason codes, return codes, and other information associated with the message returned.
- Any SQLSTATE received. SQLSTATEs are useful for diagnosing problems, because they are consistent across all platforms. For a list of SQLSTATEs, see the Message Reference.
- The text of the message (especially if the message does not include an identification number or a code).
- The SQLCA if available.
- Any action suggested in the message.
- Diagnostic files, such as the db2diag.log file. In addition, note any operating system diagnostic files such as traceback files, core files (for UNIX-based systems), event logs (for Windows NT), syslog files (for OS/2), and any dumps. See Part 2, Advanced DB2 Troubleshooting.
- The environment in which the message occurred. For example, what the user was doing at the time, the steps that led up to the problem, the type of operating system, applications that were running, and the communication protocol. Also note if any utilities such as RUNSTATS, REORG, LOAD, IMPORT, or others were being used.
- The SQL statement that encountered the error, and any preceding statements in the unit of work.
Check the online message help by typing db2 "? message" from the command prompt, where message is the complete SQLCODE, SQLSTATE, or message number. Read and follow the suggested actions.
Use the SQLCODE or message number to search available DB2 documentation for additional information.
If the problem persists, ensure that you have as much of the following information as possible before contacting DB2 Customer Support:
- The DB2 diagnostic log (db2diag.log file), and any trap or dump files it mentions. See First Failure Data Capture.
- The SQLCA structure from the db2diag.log file or syslog files, or as captured by an application. See Interpreting an SQLCA Structure.
If you determine that the problem is not with DB2 but with a vendor-supplied application, contact the vendor.

Responding to Abends

In this book the term abend includes:

Segmentation violations and general protection faults (GPFs) on Windows systems
Traps on OS/2
Exceptions on UNIX-based systems

When an abend occurs, work through the following steps until you can determine the problem:

Confirm that all DB2 components (clients, servers, DB2 Connect, each partition server in an Enterprise - Extended Edition system) are at the same service level, especially if a fix pack has recently been installed. See Updating DB2 Products.
Note the executable module that reported the abend.
If the problem persists, try to collect the following additional information before contacting DB2 Customer Support:
- Any logged information, in particular:
  - All the files in the diagpath as specified in the database manager configuration file.
  - The db2diag.log file and any trap or dump files it mentions. See First Failure Data Capture.
  - For UNIX, the syslog and core files. See Chapter 17, Diagnostic Tools for UNIX-Based Systems.
  - For OS/2, the syslog file. See Using the syslog Command.
  - For Windows operating systems, Dr. Watson logs, and, for Windows NT, the event log. See Chapter 19, Diagnostic Tools for Windows 32-bit Operating Systems.
- If the problem can be reproduced and you are willing to abend the system again, a trace on the client and server may be helpful. Follow the steps in Example of Tracing to a File.
See Part 2, Advanced DB2 Troubleshooting for information.
If you determine that the problem is not with DB2 but with a vendor-supplied application, contact the vendor.

Responding to Suspensions and Loops

When the system appears to be suspended or looping, try to identify the problem by working through the following steps:

Using messages, the db2diag.log file, and other information, attempt to determine why the suspension or loop occurred.
Some common problems that cause suspensions or loops can include the following:
- The operating system has run out of swap space or paging space.
- Applications are waiting on a lock, waiting for a database restart, or waiting for a response from a remote peer. To see if applications are still running, check for disk activity:
  - For UNIX-based systems, use iostat (see Other Commands)
  - For the Windows NT operating system, use its performance monitor (see Diagnostic Tools for Windows NT)
  Another way to see if applications are still active is to use database monitor snapshots. For additional information on database monitor snapshots, see the System Monitor Guide and Reference manual.
Recover the system:
- If the operating system is suspended (with no sign of disk activity), reboot the machine and check the db2diag.log file for problems. Also, check the hardware operating system reports; for example, on AIX, use errpt -a.
- If you can access the operating system but not the application:
  - Check the status of applications with the Control Center or the LIST APPLICATIONS FOR DATABASE database-alias command. The status information shows if applications are running (UOW Executing), waiting for a lock (Lock Wait), or for user input (UOW Waiting), rather than being suspended inside of the database manager.
  - Use a CPU monitor to check for applications that are using large amounts of CPU time, and then use your judgment to determine whether or not the applications are suspended or behaving as expected.
  - Check for disk activity using an operating system command such as iostat on AIX.
  - Check the db2diag.log file for DB2 problems.
  - On UNIX-based environments, work through the following steps until you can stop your DB2 instance:
    1. Stop the DB2 instance normally with db2stop
    2. Stop the DB2 instance and force any remaining applications with db2stop force
    There may be a time when you may not be able to stop the DB2 instance using the methods suggested above. Before performing any of the following steps collect as much information as possible. This may include:
    - Snapshots from each of the partition servers.
    - Use ps -ef on each partition server and save the results.
    - Use ipcs on each partition server and save the results.
    - Use db2_call_stack and save the results.
    After carrying out the above steps, work through the following steps:
    1. Abruptly kill the DB2 instance with db2stop -kill In an DB2 UDB Enterprise - Extended Edition (EEE) system, you can use db2_kill. Use db2kill in a single partition server environment.
    2. Use the kill command to terminate any DB2 agents that cannot be stopped
    3. Use the kill command to terminate DB2 by killing the db2syscs process.
    4. As a very last resort, reboot your entire system
    If you must use the kill command, ensure that all DB2 interprocess communications (IPC) resources are removed. Either:
    - For AIX, use the ipclean command (in the sqllib/misc directory)
      Note: The ipclean command might not remove all DB2 interprocess communications. User the ipcs command to check to see if any remain.
    - Use the following command to identify those resources that belong to the DB2 instance:
```
ipcs | grep [instance_id]
```
      Then use ipcrm to remove them.
If the problem persists, try to collect the following additional information before contacting DB2 Customer Support:
- Any information logged by DB2. See "First Failure Data Capture".
- If the system is suspended:
  - Set up a DB2 trace to dump output to a file. Follow the instructions in Example of Tracing to a File.
  - For UNIX-based systems, get a stack traceback for the application. Stack tracebacks provide information on the system calls for each process ID up to the point of the suspension. For more information, see Gathering Stack Traceback Information on UNIX-Based Systems.
    - For AIX, issue kill -36 against the db2sysc process and all DB2 processes owned by the instance and the partition server.
    - For HP-UX, issue kill -29 against the db2sysc and all DB2 processes owned by the instance and the partition server.
    - For Solaris Operating Environment, issue kill -21 against the db2sysc and all DB2 processes owned by the instance and the partition server.
If you determine that the problem is not with DB2 but with a vendor-supplied application, contact the vendor.

[ Top of Page | Previous Page | Next Page ]