Troubleshooting Guide
If you are supporting an organization using DB2, you will receive calls
from users to resolve a variety of problems. Your response depends
on:
- The severity of the problem
- The specific nature of the problem
- Any related information that you can gather
- Your experience in resolving similar problems
To solve a problem, start by obtaining a comprehensive description of the
problem. This way, you can begin to determine its origin. For
example, a problem may exist in any of the following:
- Hardware
- Operating system
- Networking system or other subsystem
- DB2 server
- DB2 client
- DB2 Connect gateway to host systems
Most applications run in a client/server environment. You must
determine if a problem is on the client, the server, or somewhere in between
(that is, in the LAN or communication protocol stack).
Investigating where the problem is detected or reported is the best way to
start. For example, if you receive an unexpected SQLCODE on a client,
then investigate the SQLCODE on that client. (See Responding to Unexpected Messages or SQLCODEs for information.)
Often the SQLCODE alone provides enough information to determine the source
and cause of the problem. If the SQLCODE does not give enough
information to determine the source of a problem, examine the
db2diag.log file at the partition server where the problem
was reported. For example, if the problem was reported on a client,
first look at the db2diag.log file on that particular
client.
The db2diag.log file is an ASCII file written by DB2 that
contains diagnostic information for DB2. The
db2diag.log file reports exceptions encountered in DB2
code. If you know the date and time when the problem occurred, you can
go directly to the corresponding db2diag.log file
entries.
Note: | The error messages relating to a user application problem most often do not
cause exceptions in DB2. This type of problem is handled as a normal
part of DB2 processing. As a result, they are not reported in the
db2diag.log file.
|
For information on this important file, see First Failure Data Capture. When viewing the file, keep in mind that the most
recent conditions are always at the end.
When you receive an unexpected message or SQLCODE, follow these steps until
you can determine the problem:
- When you receive a message, take note of all available information,
including the following:
- The code, an eight-digit alphanumeric message identification
number. This code may begin with the prefix SQL, DBA or CCA.
Also note all reason codes, return codes, and other information associated
with the message returned.
-
Any SQLSTATE received. SQLSTATEs are useful for diagnosing problems,
because they are consistent across all platforms. For a list of
SQLSTATEs, see the Message Reference.
- The text of the message (especially if the message does not include an
identification number or a code).
- The SQLCA if available.
- Any action suggested in the message.
- Diagnostic files, such as the db2diag.log file.
In addition, note any operating system diagnostic files such as traceback
files, core files (for UNIX-based systems), event logs (for Windows NT),
syslog files (for OS/2), and any dumps. See Part 2, Advanced DB2 Troubleshooting.
- The environment in which the message occurred. For example, what
the user was doing at the time, the steps that led up to the problem, the type
of operating system, applications that were running, and the communication
protocol. Also note if any utilities such as RUNSTATS,
REORG, LOAD, IMPORT, or others were being
used.
- The SQL statement that encountered the error, and any preceding statements
in the unit of work.
- Check the online message help by typing db2 "?
message" from the command prompt, where message is the
complete SQLCODE, SQLSTATE, or message number. Read and follow the
suggested actions.
- Use the SQLCODE or message number to search available DB2 documentation
for additional information.
- If the problem persists, ensure that you have as much of the following
information as possible before contacting DB2 Customer Support:
- If you determine that the problem is not with DB2 but with a
vendor-supplied application, contact the vendor.
In this book the term abend includes:
- Segmentation violations and general protection faults (GPFs) on Windows
systems
- Traps on OS/2
- Exceptions on UNIX-based systems
When an abend occurs, work through the following steps until you can
determine the problem:
- Confirm that all DB2 components (clients, servers, DB2 Connect, each
partition server in an Enterprise - Extended Edition system) are at the same
service level, especially if a fix pack has recently been installed.
See Updating DB2 Products.
- Note the executable module that reported the abend.
- If the problem persists, try to collect the following additional
information before contacting DB2 Customer Support:
- Any logged information, in particular:
- If the problem can be reproduced and you are willing to abend the system
again, a trace on the client and server may be helpful. Follow the
steps in Example of Tracing to a File.
See Part 2, Advanced DB2 Troubleshooting for information.
- If you determine that the problem is not with DB2 but with a
vendor-supplied application, contact the vendor.
When the system appears to be suspended or looping, try to identify the
problem by working through the following steps:
- Using messages, the db2diag.log file, and other
information, attempt to determine why the suspension or loop occurred.
Some common problems that cause suspensions or loops can include the
following:
- Recover the system:
- If the operating system is suspended (with no sign of disk activity),
reboot the machine and check the db2diag.log file for
problems. Also, check the hardware operating system reports; for
example, on AIX, use errpt -a.
- If you can access the operating system but not the application:
- Check the status of applications with the Control Center or the LIST
APPLICATIONS FOR DATABASE database-alias command.
The status information shows if applications are running (UOW
Executing), waiting for a lock (Lock Wait), or for user input
(UOW Waiting), rather than being suspended inside of the database
manager.
- Use a CPU monitor to check for applications that are using large amounts
of CPU time, and then use your judgment to determine whether or not the
applications are suspended or behaving as expected.
- Check for disk activity using an operating system command such as
iostat on AIX.
- Check the db2diag.log file for DB2 problems.
- On UNIX-based environments, work through the following steps until
you can stop your DB2 instance:
- Stop the DB2 instance normally with db2stop
- Stop the DB2 instance and force any remaining applications with
db2stop force
There may be a time when you may not be able to stop the DB2 instance using
the methods suggested above. Before performing any of the following
steps collect as much information as possible. This may
include:
- Snapshots from each of the partition servers.
- Use ps -ef on each partition server and save the
results.
- Use ipcs on each partition server and save the results.
- Use db2_call_stack and save the results.
After carrying out the above steps, work through the following
steps:
- Abruptly kill the DB2 instance with db2stop -kill In an DB2 UDB
Enterprise - Extended Edition (EEE) system, you can use
db2_kill. Use db2kill in a single partition
server environment.
- Use the kill command to terminate any DB2 agents that cannot be
stopped
- Use the kill command to terminate DB2 by killing the db2syscs
process.
- As a very last resort, reboot your entire system
If you must use the kill command, ensure that all DB2
interprocess communications (IPC) resources are removed. Either:
- If the problem persists, try to collect the following additional
information before contacting DB2 Customer Support:
- Any information logged by DB2. See "First Failure Data Capture".
- If the system is suspended:
- Set up a DB2 trace to dump output to a file. Follow the
instructions in Example of Tracing to a File.
- For UNIX-based systems, get a stack traceback for the
application. Stack tracebacks provide information on the system calls
for each process ID up to the point of the suspension. For more
information, see Gathering Stack Traceback Information on UNIX-Based Systems.
- For AIX, issue kill -36 against the db2sysc process and
all DB2 processes owned by the instance and the partition server.
- For HP-UX, issue kill -29 against the db2sysc and all
DB2 processes owned by the instance and the partition server.
- For Solaris Operating Environment, issue kill -21
against the db2sysc and all DB2 processes owned by the instance and the
partition server.
- If you determine that the problem is not with DB2 but with a
vendor-supplied application, contact the vendor.
[ Top of Page | Previous Page | Next Page ]