Troubleshooting Guide

Interpreting the db2diag.log

The db2diag.log is an ASCII file that contains information logged by DB2. It is located in the directory specified by the DIAGPATH database manager configuration parameter. Use a text editor to view the file on the machine where you suspect a problem to have occurred.

The information in the db2diag.log includes the following:

The location reporting an error. Application identifiers allow you to match up entries pertaining to an application on the db2diag.logs of servers and clients.
A diagnostic message (beginning with "DIA") explaining the reason for the error.
Any available supporting data, such as SQLCA data structures and pointers to the location of any extra dump or trap files.

There are two types of entries in the db2diag.log:

Administrative events. These entries are valuable, since they indicate whether events such as backups and restores started and finished.
Error information. This information is only useful if you are trying to diagnose an external symptom, or if you have already determined what the error is and are looking for more information. For example, when an application receives an unexpected SQL code, or when a database crashes, the file can contain error information, including pointers to dump files.

If the database is behaving normally, this type of information is not important and can be ignored.

Notes:

Check the end of the file for the most recent data, because new information is always appended to the bottom of the file.
Entries always have a timestamp. If you know when an error occurred, look for the first entry in the file marked with this time.
The db2diag.log grows continuously. When it gets too large, back it up and then erase the file. A new db2diag.log is generated automatically the next time it is required by the system.

Interpreting db2diag.log Entry Headers

The following example shows the header information for a sample db2diag.log entry.

2000-03-06-11.53.18.001160 (1) Instance:payroll (2) Node:000 (3) PID:44829(db2agent (SAMPLE))(4) TID:352(5) Appid:*LOCAL.payroll.000306140834(6) lock_manager (7) sqlplrq (8) Probe:111 (9) Database:SAMPLE (10) DIA9999E (11) An internal return code occurred. Report the following: "0xFFFFE10E". (12)

Legend:

(1)

A timestamp for the message.

(2)

The name of the instance generating the message.

(3)

For DB2 Enterprise - Extended Edition systems with a db2nodes.cfg file, the node generating the message. (If the db2nodes.cfg file is not used, the value is "000".)

(4)

Identification of the process generating the message. In this example, the message came from the process identified as 44829. The name of this process is db2agent and it is connected to the database named SAMPLE.

Note: If the application is operating in a DUOW environment, the ID shown is the DUOW correlation token.

(5)

Identification of the table generating the message. In this example, the message came from the table identified as 352.

(6)

Identification of the application for which the process is working. In this example, the process generating the message is working on behalf of an application with the ID *LOCAL.payroll.970317140834.

To identify more about a particular application ID, either:

Use the db2 list applications command on a DB2 UDB server or db2 list dcs applications on a DB2 UDB Connect gateway to view a list of application IDs. From this list, you can determine information about the client experiencing the error, such as its node name and its TCP/IP address.
Use the db2 get snapshot for application command to view a list of application IDs.

(7)

The DB2 component that is writing the message. For messages written by user applications using the db2AdminMsgWrite API, the component will read "User Application".

(8)

The name of the function that is providing the message. This function operates within the DB2 subcomponent that is writing the message. For messages written by user applications using the db2AdminMsgWrite API, the function will read "User Function".

To find out more about the type of activity performed by a function, look at the fourth letter of its name. In this example, the letter "p" in the function "sqlplrq" indicates a data protection problem. (Logs could be damaged, for example.)

The following list shows some of the letters used in the fourth position of the function name, and the type of activity they identify:

b: Buffer pools
c: Communication between clients and servers
d: Data management
e: Engine processes
o: Operating system calls (such as opening and closing files)
p: Data protection (such as locking and logging)
r: Relational database services
s: Sorting
x: Indexing

(9)

Identification of the internal error that was reported.

(10)

The database on which the error occurred.

(11)

Diagnostic message indicating that an internal error occurred.

(12)

Hexadecimal representation of an internal return code. For more information, see Interpreting Hexadecimal Codes.

Interpreting an SQLCA Structure

For severe errors, an SQLCA structure is dumped into the db2diag.log. For details on each of the SQLCA fields, see Appendix B, SQL Communications (SQLCA).

The following diagram provides an example of a db2diag.log with an SQLCA dump.

1997-03-16-11.53.18.001160 Instance:payroll Node:000 PID:44829(db2agent (SAMPLE)) Appid:*LOCAL.payroll.970317140834 relation_data_serv sqlrerlg Probe:17 Database:SAMPLE DIA9999E An internal return code occurred. Report the following : "0xFFFFE101". Data Title :SQLCA pid(14358) (1) sqlcaid : SQLCA sqlcabc: 136 sqlcode: -980 (2) sqlerrml: 0 sqlerrmc: (3) sqlerrp : sqlrita sqlerrd (4): (1) 0xFFFFE101 (5) (2) 0x00000000 (3) 0x00000000 (4) 0x00000000 (5) 0x00000000 (6) 0x00000000 sqlwarn : (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) sqlstate:

Legend:

(1): Beginning of the SQLCA entry.
(2): The SQL state (when negative, an error has occurred).
(3): Any reason codes associated with the SQL error code.
(4): Sometimes there are several errors leading to the final SQL error code. These errors are shown in sequence in the sqlerrd area.
(5): The hexadecimal representation of an SQL error. See Interpreting Hexadecimal Codes, for more information.

Interpreting Hexadecimal Codes

On OS/2 and Windows systems, some db2diag.log or SQLCA error codes are byte-reversed. If they are in the form ffff nnnn, they can be interpreted as is. If they are in the form nnnn ffff, you must byte-reverse them before they are meaningful.

To do this, switch the first four characters with the last four characters, and then the fifth and sixth characters with the seventh and eighth characters.

For example, the error code "0ae6 ffff" is translated to "ffff e60a".

When you have the error code in a meaningful form:

Convert it to decimal format, using a hexadecimal conversion tool. If you can locate it in the Message Reference, then it is an SQL code.
If the decimal conversion of the error code is not an SQL code, it is a return code. See Appendix A, DB2 Internal Return Codes for a list of return codes.

Some Examples for Interpreting the db2diag.log

The following examples illustrate how you can use the db2diag.log to diagnose problems.

Example 1

Problem: A client application is executing against the database SAMPLE. During execution, the application always terminates abnormally.

Approach: To find the solution to the problem, first find the location of the error. Because the error occurs on the client, start by looking for the error on the client. To do this, look at the db2diag.log on the client machine.

Assume that the db2diag.log on the client machine contains no entries. Look at the db2diag.log on the DB2 server machine to see if an error has occurred on the server.

Assume that the db2diag.log on the server contains the following entries:

1997-03-16-20.52.27.001160 Instance:payroll Node:000 PID:44829(db2sysc (SAMPLE)) Appid:*LOCAL.payroll.970317140834 buffer_pool_services sqlbStartPools Probe:0 Database:SAMPLE Starting the database. (1) 1997-03-16-20.52.42.001160 Instance:payroll Node:000 PID:44829(db2sysc (SAMPLE)) Appid:*LOCAL.payroll.970317140834 buffer_pool_services sqlbcres Probe:1100 Database:SAMPLE DIA3726C A invalid page checksum was found for page "". (2) ZRC=FFFFE119 (3) 1997-03-16-20.52.42.001160 Instance:payroll Node:000 PID:44829(db2sysc (SAMPLE)) Appid:*LOCAL.payroll.970317140834 buffer_pool_services sqlbcres Probe:1100 Database:SAMPLE Obj={pool:2;obj:10;type:0} State=x27 (4) Data Title :SQLB_OBJECT_DESC pid(104) tid(109) 0200 0a00 0200 0a00 0000 0000 0000 0000 ................ 0000 002e e00c 0000 0000 0000 0000 0000 ................ 0000 0000 0100 0000 2700 0000 0000 0000 ................ 0000 0000 0000 0000 ........ Dump File : C:\SQLLIB\DB2\104109.dmp Data : SQLB_PAGE

Interpretation:

(1)

An initial connection is made on the database. From the heading information for this message we see the database is the SAMPLE database.

(2)

A message indicates that there is an invalid page.

(3)

ZRC=FFFFE119 is an internal DB2 return code for the condition described in (2). (Fortunately, there was a message in addition to this code. Example 2 will show how a hexadecimal return code may have to be used to determine an error condition.)

(4)

This information tells which object has been found to be inconsistent.

The pool ID is 2, indicating a table space ID of 2.
The object ID is 10, indicating a table ID of 10. The hexadecimal value is in the SQLB_OBJECT_DESC and is shown as "0a" which is decimal 10.
The object type is 0, indicating a data object. (Object type 1 indicates an index object.)

Solution: Restore and roll forward the database. If this is not possible, contact DB2 Customer Support.

Note: This example has shown that an error with symptoms only on the client machine may be caused by an error occurring on the DB2 server machine.

Example 2

Problem: A DB2 client application receives the SQL1042C error message when connecting to the database SAMPLE.

Approach: To find the solution to the problem, first find the location of the error. Because the error occurs on the client, start by looking at the db2diag.log on the client machine.

Assume the db2diag.log on the client machine contains no entries. Look at the db2diag.log on the DB2 server machine to see if an error has occurred on the server.

Assume the db2diag.log on the server contains the following entries:

1997-03-16-08.59.34.001160 Instance:payroll Node:000 PID:55543(db2syscs (SAMPLE)) Appid:*LOCAL.payroll.970317140834 buffer_pool_services sqlbStartPools Probe:0 Database:SAMPLE Starting the database. (1) 1997-03-16-08.59.35.001160 Instance:payroll Node:000 PID:55543(db2syscs (SAMPLE)) Appid:*LOCAL.payroll.970317140834 data_protection sqlpresr Probe:0 Database:SAMPLE Crash Recovery has been initiated. (2) 1997-03-16-08.59.35.001160 Instance:payroll Node:000 PID:55543(db2syscs (SAMPLE)) Appid:*LOCAL.payroll.970317140834 data_protection sqlpgole Probe:30 Database:SAMPLE A problem occurred while verifying a database log file S0000000.LOG RC=0ae6 ffff (3) 1997-03-16-08.59.35.001160 Instance:payroll Node:000 PID:55543(db2syscs (SAMPLE)) Appid:*LOCAL.payroll.970317140834 data_protection sqlpgilt Probe:101 Database:SAMPLE DiagData 0ae6 ffff 1997-03-16-08.59.36.001160 Instance:payroll Node:000 PID:55543(db2syscs (SAMPLE)) Appid:*LOCAL.payroll.970317140834 data_protection sqlpgilt Probe:60 Database:SAMPLE DiagData 0ae6 ffff 1997-03-16-08.59.36.001160 Instance:payroll Node:000 PID:55543(db2syscs (SAMPLE)) Appid:*LOCAL.payroll.970317140834 data_protection sqlpgasn Probe:915 Database:SAMPLE Marked the database log as bad. 0000 0000 (4)

Interpretation:

(1)

An initial request to connect to database SAMPLE was received. This indicates that the client request could reach the database server and communications are working.

(2)

Database crash recovery started, indicating that the database SAMPLE was not stopped normally the last time it was in use.

(3)

An error occurred with log file S0000000.LOG. The error code is shown as "0ae6 ffff".

Error codes should be in the format ffff nnnn. However, OS/2 and Windows architecture may byte-reverse integers. Because the error code in this example is in the form nnnn ffff, you must byte-reverse it before it is meaningful. For more information, see Interpreting Hexadecimal Codes.

The error code in this example translates to ffff e60a. When translated to decimal form (-6646), it is not a valid SQL code, indicating that it is a return code rather than an SQL code. Looking up return code E60A in Appendix A, DB2 Internal Return Codes, we see that the file does not exist.

Database restart and recovery requires all log files that were in use at the time the database went down. This db2diag.log entry indicates that the log file S0000000.LOG cannot be found in the expected location. Restart and recovery cannot proceed.

(4)

Because restart and recovery could not complete, DB2 marks the database as "bad" to ensure that complete restart and recovery are done before the database is used again.

Solution: The best option to resolve this problem is to restore from a backup. Because the log file is S0000000.LOG (rather than S0001005.LOG or some other value), the database has not been log-retain enabled (or may be new). Therefore, restoring from backup is the only method of recovery.

Example 3

Problem: During a database connection, the SQL1004C message is received. This message indicates that a file system is out of storage. How do we determine which file system is involved?

Approach: First examine the text of the error message itself. (Issue db2 ? sql1004c or see the error in the Message Reference.) The message indicates that there is insufficient storage on a file system to process the command.

Next examine the db2diag.log on the server to find out exactly which file system is full, and to check if there is additional information.

Assume that the db2diag.log on the server contains the following entries:

1997-03-16-08.40.42.001160 Instance:payroll Node:000 PID:66847(db2syscs (SAMPLE)) Appid:*LOCAL.payroll.970317140834 data_protection sqlpgifl Probe:105 Database:SAMPLE DiagData 0cd6 ffff (1) 1997-03-16-08.40.43.001160 Instance:payroll Node:000 PID:66847(db2syscs (SAMPLE)) Appid:*LOCAL.payroll.970317140834 data_protection sqlpgifl (2) Probe:540 Database:SAMPLE 0cd6 ffff 1997-03-16-08.42.55.001160 Instance:payroll Node:000 PID:66847(db2syscs (SAMPLE)) Appid:*LOCAL.payroll.970317140834 data_protection sqlpgifl Probe:105 Database:SAMPLE DiagData 0cd6 ffff

Interpretation:

(1): The function sqlpgifl encounters the error code ffff d60c (remember to byte-reverse any error codes that are in the form nnnn ffff). Converting this code to decimal does not yield a valid SQL code. Looking up return code D60C in Appendix A, DB2 Internal Return Codes, we see that the disk is full.
(2): The sqlpgif1 function repeatedly encounters the error. This function is a logging function (because the fourth character is "p". See Interpreting the db2diag.log, for more information. The repeated failure of this function indicates that we should first examine the log path to check available space.

Solution:

Determine how much space is in the log path by using dir for Windows or OS/2 and df for UNIX-based environments.
Determine how much space may be required for logging. The amount of space (in bytes) required for log files can range from
```
(logprimary * (logfilsiz * 4096) + 8192
```
to
```
((logprimary + logsecond) * (logfilsiz + 2) * 4096) + 8192
```
This calculation shows the range of space that may be required by the logs, assuming log retain is not enabled.
If log retain is enabled, the logs will continue to grow unless user exits are enabled. With log retain, ensure that the log path has as much space available as possible.
If you determine that the file system is full, do one of the following:
- On UNIX-based environments, extend the size of the file system containing the logs.
- Move the logs to another log path that has enough space available to contain the primary and secondary logs.
- Examine your logging parameters to see if their values can be reduced (see the Administration Guide: Implementation).

When the db2diag.log does not contain enough information to solve a problem, you will need to perform a trace. For more information, see Example of Using db2diag.log and the DB2 Trace.

[ Top of Page | Previous Page | Next Page ]