Troubleshooting Guide

Interpreting the db2diag.log

The db2diag.log is an ASCII file that contains information logged by DB2. It is located in the directory specified by the DIAGPATH database manager configuration parameter. Use a text editor to view the file on the machine where you suspect a problem to have occurred.

The information in the db2diag.log includes the following:

There are two types of entries in the db2diag.log:

If the database is behaving normally, this type of information is not important and can be ignored.

Notes:

Interpreting db2diag.log Entry Headers

The following example shows the header information for a sample db2diag.log entry.
2000-03-06-11.53.18.001160 (1)   Instance:payroll (2)  Node:000 (3)
PID:44829(db2agent (SAMPLE))(4) TID:352(5)   
Appid:*LOCAL.payroll.000306140834(6)
lock_manager (7)         sqlplrq (8)   Probe:111 (9)  Database:SAMPLE (10)
DIA9999E (11) An internal return code occurred. Report the following:
"0xFFFFE10E". (12)

Legend:

(1)
A timestamp for the message.

(2)
The name of the instance generating the message.

(3)
For DB2 Enterprise - Extended Edition systems with a db2nodes.cfg file, the node generating the message. (If the db2nodes.cfg file is not used, the value is "000".)

(4)
Identification of the process generating the message. In this example, the message came from the process identified as 44829. The name of this process is db2agent and it is connected to the database named SAMPLE.

Note: If the application is operating in a DUOW environment, the ID shown is the DUOW correlation token.

(5)
Identification of the table generating the message. In this example, the message came from the table identified as 352.

(6)
Identification of the application for which the process is working. In this example, the process generating the message is working on behalf of an application with the ID *LOCAL.payroll.970317140834.

To identify more about a particular application ID, either:

(7)
The DB2 component that is writing the message. For messages written by user applications using the db2AdminMsgWrite API, the component will read "User Application".

(8)
The name of the function that is providing the message. This function operates within the DB2 subcomponent that is writing the message. For messages written by user applications using the db2AdminMsgWrite API, the function will read "User Function".

To find out more about the type of activity performed by a function, look at the fourth letter of its name. In this example, the letter "p" in the function "sqlplrq" indicates a data protection problem. (Logs could be damaged, for example.)

The following list shows some of the letters used in the fourth position of the function name, and the type of activity they identify:

b
Buffer pools

c
Communication between clients and servers

d
Data management

e
Engine processes

o
Operating system calls (such as opening and closing files)

p
Data protection (such as locking and logging)

r
Relational database services

s
Sorting

x
Indexing

(9)
Identification of the internal error that was reported.

(10)
The database on which the error occurred.

(11)
Diagnostic message indicating that an internal error occurred.

(12)
Hexadecimal representation of an internal return code. For more information, see Interpreting Hexadecimal Codes.

Interpreting an SQLCA Structure

For severe errors, an SQLCA structure is dumped into the db2diag.log. For details on each of the SQLCA fields, see Appendix B, SQL Communications (SQLCA).

The following diagram provides an example of a db2diag.log with an SQLCA dump.
1997-03-16-11.53.18.001160    Instance:payroll   Node:000
PID:44829(db2agent (SAMPLE))     Appid:*LOCAL.payroll.970317140834
relation_data_serv          sqlrerlg    Probe:17   Database:SAMPLE
DIA9999E An internal return code occurred. Report the following : "0xFFFFE101".
Data Title :SQLCA pid(14358) (1)
sqlcaid : SQLCA    sqlcabc: 136    sqlcode: -980 (2)  sqlerrml: 0
sqlerrmc: (3)
sqlerrp : sqlrita
sqlerrd (4): (1) 0xFFFFE101  (5)  (2) 0x00000000    (3) 0x00000000
          (4) 0x00000000    (5) 0x00000000    (6) 0x00000000
sqlwarn : (1)         (2)       (3)       (4)      (5)       (6)
          (7)         (8)       (9)      (10)     (11)
sqlstate:     

Legend:

(1)
Beginning of the SQLCA entry.

(2)
The SQL state (when negative, an error has occurred).

(3)
Any reason codes associated with the SQL error code.

(4)
Sometimes there are several errors leading to the final SQL error code. These errors are shown in sequence in the sqlerrd area.

(5)
The hexadecimal representation of an SQL error. See Interpreting Hexadecimal Codes, for more information.

Interpreting Hexadecimal Codes

On OS/2 and Windows systems, some db2diag.log or SQLCA error codes are byte-reversed. If they are in the form ffff nnnn, they can be interpreted as is. If they are in the form nnnn ffff, you must byte-reverse them before they are meaningful.

To do this, switch the first four characters with the last four characters, and then the fifth and sixth characters with the seventh and eighth characters.

For example, the error code "0ae6 ffff" is translated to "ffff e60a".

When you have the error code in a meaningful form:

  1. Convert it to decimal format, using a hexadecimal conversion tool. If you can locate it in the Message Reference, then it is an SQL code.
  2. If the decimal conversion of the error code is not an SQL code, it is a return code. See Appendix A, DB2 Internal Return Codes for a list of return codes.

Some Examples for Interpreting the db2diag.log

The following examples illustrate how you can use the db2diag.log to diagnose problems.

Example 1

Problem: A client application is executing against the database SAMPLE. During execution, the application always terminates abnormally.

Approach: To find the solution to the problem, first find the location of the error. Because the error occurs on the client, start by looking for the error on the client. To do this, look at the db2diag.log on the client machine.

Assume that the db2diag.log on the client machine contains no entries. Look at the db2diag.log on the DB2 server machine to see if an error has occurred on the server.

Assume that the db2diag.log on the server contains the following entries:


1997-03-16-20.52.27.001160    Instance:payroll   Node:000
PID:44829(db2sysc (SAMPLE))     Appid:*LOCAL.payroll.970317140834
buffer_pool_services          sqlbStartPools    Probe:0   Database:SAMPLE
Starting the database. (1)
1997-03-16-20.52.42.001160    Instance:payroll   Node:000
PID:44829(db2sysc (SAMPLE))     Appid:*LOCAL.payroll.970317140834
buffer_pool_services          sqlbcres    Probe:1100   Database:SAMPLE
DIA3726C A invalid page checksum was found for page "". (2)
ZRC=FFFFE119 (3)
1997-03-16-20.52.42.001160    Instance:payroll   Node:000
PID:44829(db2sysc (SAMPLE))     Appid:*LOCAL.payroll.970317140834
buffer_pool_services          sqlbcres    Probe:1100   Database:SAMPLE
 Obj={pool:2;obj:10;type:0} State=x27 (4)
Data Title :SQLB_OBJECT_DESC pid(104) tid(109)
0200 0a00 0200 0a00 0000 0000 0000 0000       ................
0000 002e e00c 0000 0000 0000 0000 0000       ................
0000 0000 0100 0000 2700 0000 0000 0000       ................
0000 0000 0000 0000                           ........
Dump File : C:\SQLLIB\DB2\104109.dmp Data : SQLB_PAGE

Interpretation:

(1)
An initial connection is made on the database. From the heading information for this message we see the database is the SAMPLE database.

(2)
A message indicates that there is an invalid page.

(3)
ZRC=FFFFE119 is an internal DB2 return code for the condition described in (2). (Fortunately, there was a message in addition to this code. Example 2 will show how a hexadecimal return code may have to be used to determine an error condition.)

(4)
This information tells which object has been found to be inconsistent.

Solution: Restore and roll forward the database. If this is not possible, contact DB2 Customer Support.

Note: This example has shown that an error with symptoms only on the client machine may be caused by an error occurring on the DB2 server machine.

Example 2

Problem: A DB2 client application receives the SQL1042C error message when connecting to the database SAMPLE.

Approach: To find the solution to the problem, first find the location of the error. Because the error occurs on the client, start by looking at the db2diag.log on the client machine.

Assume the db2diag.log on the client machine contains no entries. Look at the db2diag.log on the DB2 server machine to see if an error has occurred on the server.

Assume the db2diag.log on the server contains the following entries:
1997-03-16-08.59.34.001160    Instance:payroll   Node:000
PID:55543(db2syscs (SAMPLE))     Appid:*LOCAL.payroll.970317140834
buffer_pool_services    sqlbStartPools    Probe:0   Database:SAMPLE
Starting the database. (1)
1997-03-16-08.59.35.001160    Instance:payroll   Node:000
PID:55543(db2syscs (SAMPLE))     Appid:*LOCAL.payroll.970317140834
data_protection       sqlpresr    Probe:0   Database:SAMPLE
Crash Recovery has been initiated. (2)
1997-03-16-08.59.35.001160    Instance:payroll   Node:000
PID:55543(db2syscs (SAMPLE))     Appid:*LOCAL.payroll.970317140834
data_protection       sqlpgole    Probe:30   Database:SAMPLE
A problem occurred while verifying a database log file S0000000.LOG
RC=0ae6 ffff (3)
1997-03-16-08.59.35.001160    Instance:payroll   Node:000
PID:55543(db2syscs (SAMPLE))     Appid:*LOCAL.payroll.970317140834
data_protection       sqlpgilt   Probe:101  Database:SAMPLE
DiagData
0ae6 ffff
1997-03-16-08.59.36.001160    Instance:payroll   Node:000
PID:55543(db2syscs (SAMPLE))     Appid:*LOCAL.payroll.970317140834
data_protection       sqlpgilt   Probe:60  Database:SAMPLE
DiagData
0ae6 ffff
1997-03-16-08.59.36.001160    Instance:payroll   Node:000
PID:55543(db2syscs (SAMPLE))     Appid:*LOCAL.payroll.970317140834
data_protection       sqlpgasn   Probe:915  Database:SAMPLE
Marked the database log as bad. 0000 0000  (4)

Interpretation:

(1)
An initial request to connect to database SAMPLE was received. This indicates that the client request could reach the database server and communications are working.

(2)
Database crash recovery started, indicating that the database SAMPLE was not stopped normally the last time it was in use.

(3)
An error occurred with log file S0000000.LOG. The error code is shown as "0ae6 ffff".

Error codes should be in the format ffff nnnn. However, OS/2 and Windows architecture may byte-reverse integers. Because the error code in this example is in the form nnnn ffff, you must byte-reverse it before it is meaningful. For more information, see Interpreting Hexadecimal Codes.

The error code in this example translates to ffff e60a. When translated to decimal form (-6646), it is not a valid SQL code, indicating that it is a return code rather than an SQL code. Looking up return code E60A in Appendix A, DB2 Internal Return Codes, we see that the file does not exist.

Database restart and recovery requires all log files that were in use at the time the database went down. This db2diag.log entry indicates that the log file S0000000.LOG cannot be found in the expected location. Restart and recovery cannot proceed.

(4)
Because restart and recovery could not complete, DB2 marks the database as "bad" to ensure that complete restart and recovery are done before the database is used again.

Solution: The best option to resolve this problem is to restore from a backup. Because the log file is S0000000.LOG (rather than S0001005.LOG or some other value), the database has not been log-retain enabled (or may be new). Therefore, restoring from backup is the only method of recovery.

Example 3

Problem: During a database connection, the SQL1004C message is received. This message indicates that a file system is out of storage. How do we determine which file system is involved?

Approach: First examine the text of the error message itself. (Issue db2 ? sql1004c or see the error in the Message Reference.) The message indicates that there is insufficient storage on a file system to process the command.

Next examine the db2diag.log on the server to find out exactly which file system is full, and to check if there is additional information.

Assume that the db2diag.log on the server contains the following entries:
1997-03-16-08.40.42.001160    Instance:payroll   Node:000
PID:66847(db2syscs (SAMPLE))     Appid:*LOCAL.payroll.970317140834
data_protection       sqlpgifl    Probe:105   Database:SAMPLE
DiagData
0cd6 ffff (1)
1997-03-16-08.40.43.001160    Instance:payroll   Node:000
PID:66847(db2syscs (SAMPLE))     Appid:*LOCAL.payroll.970317140834
data_protection       sqlpgifl (2)   Probe:540   Database:SAMPLE
0cd6 ffff
1997-03-16-08.42.55.001160    Instance:payroll   Node:000
PID:66847(db2syscs (SAMPLE))     Appid:*LOCAL.payroll.970317140834
data_protection       sqlpgifl    Probe:105   Database:SAMPLE
DiagData
0cd6 ffff 

Interpretation:

(1)
The function sqlpgifl encounters the error code ffff d60c (remember to byte-reverse any error codes that are in the form nnnn ffff). Converting this code to decimal does not yield a valid SQL code. Looking up return code D60C in Appendix A, DB2 Internal Return Codes, we see that the disk is full.

(2)
The sqlpgif1 function repeatedly encounters the error. This function is a logging function (because the fourth character is "p". See Interpreting the db2diag.log, for more information. The repeated failure of this function indicates that we should first examine the log path to check available space.

Solution:

  1. Determine how much space is in the log path by using dir for Windows or OS/2 and df for UNIX-based environments.
  2. Determine how much space may be required for logging. The amount of space (in bytes) required for log files can range from
    (logprimary * (logfilsiz * 4096) + 8192
    

    to

    ((logprimary + logsecond) * (logfilsiz + 2) * 4096) + 8192
    

    This calculation shows the range of space that may be required by the logs, assuming log retain is not enabled.

    If log retain is enabled, the logs will continue to grow unless user exits are enabled. With log retain, ensure that the log path has as much space available as possible.

  3. If you determine that the file system is full, do one of the following:

When the db2diag.log does not contain enough information to solve a problem, you will need to perform a trace. For more information, see Example of Using db2diag.log and the DB2 Trace.


[ Top of Page | Previous Page | Next Page ]