IBM Books

Troubleshooting Guide


Troubleshooting Tips

This section describes how to deal with some frequently encountered problems faced by users working with a DB2 server. The following topics are presented in this section:

For additional troubleshooting information on this topic, see:

Important: This section represents a small sampling of the information available from DB2 Customer Service. For a complete and up-to-date source of DB2 information, use the DB2 Product and Service Technical Library at http://www.software.ibm.com/data/db2/library/.

Problems Migrating to Version 6

If you experience problems migrating to Version 6, ensure that:

[  ]
You followed the instructions in the Quick Beginnings guide.

[  ]
You checked the list of incompatibilities from previous versions. (See the Administration Guide.)

[  ]
The database is consistent.

To verify that databases are consistent so that they can be migrated, use the db2ckmig command. (Refer to the Quick Beginnings book for more details.)

You must correct all errors that are reported for databases that are in any of the following states:

Additionally, this tool identifies databases that contain objects that use SYSCAT, SYSSTAT, or SYSFUN as schema names. These objects must be dropped and re-created using a different schema name.

[  ]
The disk space on the machine is adequate. (The amount of disk space required varies with each database.)

[  ]
You use appropriate values for the number of primary and secondary log files (logprimary and logsecond in database configuration), and for the size of the log files (logfilsiz in database configuration). See Server Failures for details.

Server Failures

Symptom
The DB2 server does not respond to requests or provides unexpected responses.

Action
Ensure that:

[  ]
You check the db2diag.log file on the server to diagnose a cause. For information, see Interpreting the db2diag.log file.

[  ]
DB2COMM is set to the protocols required by connecting clients. (If not, the message SQL5043N is usually received.)

Consult the db2diag.log file to see which communication protocols were not started successfully, and why. For information, see Using the db2diag.log file to diagnose server communication problems.

[  ]
There is enough log space. (If not, the SQL0984C message is typically received.)

To increase log space:

  • Increase the number of primary and secondary log files (logprimary and logsecond in database configuration)
  • Increase the size of the log files (logfilsiz in database configuration)

To initiate your changes, you must disconnect all applications from the database and/or deactivate the database.

You must ensure that the file system or disk space that contains the logs is big enough to hold all of the logs. To calculate the minimum required space for initializing the logs, use the following formula:

(logprimary + logsecondary) * logfilsiz * 4096

[  ]
You are using the latest version of DB2, with appropriate fix packs installed. For information, see Updating DB2 Products.

[  ]
The index files are valid.

The db2diag.log file indicates if indexes need to be re-created. Use db2dart /MI to mark the indexes as inconsistent. See Miscellaneous Troubleshooting Tools for details on db2dart /MI.

To delay the rebuilding of indexes until after the database is started, use the INDEXREC configuration parameter as documented in the Administration Guide, Performance manual.

[  ]
The data is consistent.

Look for the following symptoms:

  • There is a DIA3700C error in the db2diag.log file.
  • The server goes down when particular data is accessed. This data resource may be damaged.
  • The server fails repeatedly.

Run db2dart to verify that the data is consistent. If data is not consistent, contact DB2 Customer Service.

For DB2 Universal Database Enterprise - Extended Edition for the Solaris Operating Environment and DB2 Universal Database Enterprise - Extended Edition for Windows NT, see also Database Manager Will Not Start or Hangs.

SQL1403N received when authenticating usernames or groups

Symptom
You receive the error message SQL1403N when attempting to get authorization for usernames or groups on a Windows NT Master Domain running DB2.

Possible Cause
The authorization restrictions on the Windows NT Master Domain.

Action
Ensure the service for the instance is being started under a local administrator who is not a user from the master domain; and who has operating system rights on the local machine.

Administration Difficulties

Symptom
You have problems installing or administering a database server.

Action
Ensure that:

[  ]
You are using a valid user name and password that have administrator authority.
  • For OS/2 Warp Connect and later versions of OS/2: The default DB2 administrator user name and password are the same as those specified when the operating system was installed.
  • For versions of OS/2 before OS/2 Warp Connect: The default administrator user name and password depend on the national language installed on a machine, and on whether UPM was installed with a particular user name. In most cases, the defaults are USERID and PASSWORD in English countries. For more information, refer to the Quick Beginnings for your platform.
  • For Windows NT: Your membership in a group controls which actions you are permitted to perform. For DB2 for Windows NT, you must belong to the "Administrators" or "Domain Admins." groups to perform installation and some administration tasks.

    You can find out what your group is by using the User Manager tool to display user names and group memberships. To invoke this tool, select Start, then Programs, then Administrative Tools, and then User Manager.

    To install DB2, you must have administrator authority on the local machine with a user name that is eight characters or less (note that you cannot use the user name "Administrator"). To be added to the administrator group, you must be added by an existing member of that group.

  • For UNIX-based environments: You must belong to a SYSADM group. The name of the group must be eight characters or less in length.
  • For DB2 for Solaris: If a DB2 instance is started by a user who does not belong to the same group as the DB2 instance owner, the SQL1042C message is received for many commands. In this situation, the executing db2sysc process inherits the user's group name and does not have the correct read permission to open a file in the /proc directory.

    You must start the instance as the instance owner, and the id/group of the db2sysc process attempting to read a file must be the same as the id/group of the DB2 instance.

[  ]
For UNIX-based environments other than AIX, you update and rebuild the kernel configuration parameters. (If not, the SQL1016N or SQL1018N messages will typically occur when you try to create an instance or use the command line processor.) Refer to the Quick Beginnings guide for your platform for details.

Backing Up and Restoring Data

It is important to have a good backup and recovery strategy, and a plan to implement this strategy. This plan should address questions such as the following:

Backup plans are heavily influenced by the importance of having the data in the database available for use. If your business depends heavily on the data, then you must develop a backup and recovery plan that will minimize downtime and ensure that the data is available when needed at a cost that you are willing to accept.

For help on backup and recovery refer to the Data Movement Utilities Guide and Reference and use the Backup Database SmartGuide from the Control Center.

SQL0902C received when backing up a database or accessing DB2 data

Symptom
The SQL0902C message is received when a database is backed up.

Possible Cause
The database is not architecturally consistent.

Action
Use the db2dart command to verify the architectural consistency of the database. See Miscellaneous Troubleshooting Tools for information on this command.

Usually you can perform a restore with a roll forward to the end of the logs to correct the problem. If the problem persists, contact DB2 Customer Service.

ROLLFORWARD TABLESPACE fails with SQL4908N message

Symptom
The error message SQL4908N is received when running a ROLLFORWARD TABLESPACE.

Possible Cause
If starting the first attempt to roll forward a table space, the message may be received when one or more of the table spaces specified to be rolled forward is not in a ROLLFORWARD PENDING state on the specified nodes. If continuing to roll forward a table space that is already in progress, the message may be received when one or more of the table spaces specified to be rolled forward is not in a "roll forward in progress" state on the specified nodes.

Action
  1. Use the LIST TABLESPACES SHOW DETAIL command on the nodes specified to find out which table spaces are not ready to be rolled forward.
  2. Use the QUERY STATUS option of the ROLLFORWARD command to determine the status of the table space roll forward.
    • If starting a new table space roll forward, put the table spaces into ROLLFORWARD PENDING state by restoring them.
    • If continuing a table space rollforward and one or more of the table spaces involved has been restored and put into ROLLFORWARD PENDING state, the table space roll forward in progress must be cancelled.
      • Put the table spaces into the RESTORE PENDING state.
      • Restore the table spaces.
  3. Submit the original ROLLFORWARD command again.

Backup to disk fails with SQL2419N

Symptom
The error message SQL2419N is received when taking an online backup to disk.

Possible Cause
Although the disk was not full, the backup file was greater than the 2 GB operating system file size limit.

Action
You can do a backup and specify multiple target directories. The backup directories divide the file into sizes smaller than the 2 GB operating system file size limit.

NetBIOS connections from DB2 clients are lost when backing up using ADSM

Symptom
NetBIOS connections from DB2 clients are lost when backing up using ADSTAR Distributed Storage Manager (ADSM).

Possible Cause
As part of ADSM's initialization, a NCB.RESET is issued, which resets the NetBIOS resources of the calling application. In the case of DB2 for OS/2 and Windows NT, the calling application is DB2 itself. This results in all connections to the database made through NetBIOS to be dropped.

Action
There is no way to prevent this from happening when using NetBIOS as the communication protocol. Use another protocol like TCP/IP when using ADSM to backup the database.

Problems restoring a database or table space backup

Symptom
Data cannot be restored.

Action
Ensure that:

[  ]
You have enough disk space to restore the data.

Use redirected restore to specify a modified list of containers for the table spaces being restored. Refer to the Administration Guide.

[  ]
You specify the correct path to the backup image and logs. (It is possible that they were moved.)

[  ]
If an online backup was taken, all logs from the beginning to the end of the backup are required to restore and roll forward the database.

(The rollforward must proceed until at least the point where the online backup ended. Otherwise, the database is not accessible.)

RESTORE using an API fails with SQL0902C return code 22

Symptom
On AIX, the error message SQL0902C with return code 22 is received when restoring a database using an API.

Possible Cause
Abnormal termination of applications has left the message queue connected to an existing database file.

Action
Use the ipcrm AIX command to terminate all back-end processes and then drop the existing database. Retry the RESTORE again. Contact DB2 Support if the problem persists.

After a RESTORE, some table spaces are in a ROLLFORWARD PENDING state

Symptom
After issuing a RESTORE command specifying that no roll forward is required on a Windows NT machine, the table spaces are left in a ROLLFOWARD PENDING state.

Possible Cause
Windows NT is having a resource conflict with DB2 where the hard file cache limit of 192 MB is reached. This hard file cache is used whenever DB2 opens or closes many files, or opens or closes large files.

Action
Ensure the DB2 product is updated to the latest FixPak, and that the DB2NTNOCACHE registry variable is set to one. The DB2NTNOCACHE setting can be checked using the db2set -all command.

SQL1277N is received when attempting a redirected restore

Symptom
The error message SQL1277N is received during a redirected restore to a new database.

Possible Cause
One or more containers used in the redirected restore may already be in use. Check the db2diag.log for additional information.

Action
Remove untag the containers in use and then perform the redirected restore using the Control Center. If you prefer to use the command line processor, you must use the SET TABLESPACE CONTAINERS API and then re-issue the RESTORE command with the "CONTINUE" parameter.

To find out more about the SET TABLESPACE CONTAINERS API, type db2 ? set tablespace at the command line.

Data Not Loading

SQL3508N returned during a LOAD operation

Symptom
The error message SQL3508N is received during a LOAD operation.

Possible Causes
  1. Issuing the LOAD command without explicitly specifying the "REMOTE FILE" parameter before attempting a LOAD RESTART operation causes the utility to overwrite the default remote file.
  2. The specified file names used are similar even though the specified paths are different. For example: "MESSAGES C:\table.MSG" and "REMOTE FILE D:\table".

Action
  1. Restore the database.
  2. Specify different file names for each of the "MESSAGES" and "REMOTE FILE" parameters. For example: "MESSAGES C:\table.MSG" and "REMOTE FILE D:\table.RMT". The "RMT" extension solves the problem.

General problems with LOAD

Symptom
Data will not load on a database server.

Action
Ensure that:

[  ]
You used the RESTART or REPLACE parameters with the LOAD command. (If not, the SQL3805N message is typically received.)

For more information on the LOAD command, refer to the Data Movement Utilities Guide and Reference.

[  ]
The size of a database object in an SMS table space has not reached an operating system limit.

For example, OS/2, Windows NT, and Version 3.2 of AIX have a 2 GB file size limit.

(Refer to Problems with LOAD and with Splitting and Loading Data for more information on loading data in DB2 Universal Database Enterprise - Extended Edition.)

Difficulties Importing Data

Symptom
You are having problems importing data.

Action

[  ]
Ensure that there is enough log space available. (See Server Failures.)

[  ]
Use the COMMITCOUNT n option to commit data after every n records are imported. This option protects the committed data from loss in case of a failure. It also reduces the log requirements for an import operation, allowing a number of smaller transactions rather than one long transaction.

Problems Using Commands, Utilities, or the Command Line Processor

Symptom
You cannot use commands, utilities, or the Command Line Processor.

Action
Ensure that:

[  ]
You bound the utilities and your applications to the database after you installed a fix or newer executable. (The SQL0818N or SQL0805N messages indicate that you need to rebind.)

When you precompile an SQL application, a compilable file is produced, and optionally a bind file. Both of these files have a new timestamp. The default behavior of the precompile operation is to automatically create a package so that you do not need to bind anything. However, if you create a bind file but not a package, then you must bind the new bind file to the database.

For information on binding, see your Quick Beginnings guide.

[  ]
You used the correct syntax (especially for UNIX-based systems).

It is recommended that you either:

  • Use double quotation marks (" ") around a command line processor request.
  • Precede any special characters such as asterisks (*), brackets, or question marks (?) with a back slash (\) character to ensure that the command line processor interprets them correctly.

Example

For:

db2 SELECT * FROM SYSCAT.TABLES

Use either:

db2 "SELECT * FROM SYSCAT.TABLES"
db2 SELECT \* FROM SYSCAT.TABLES

Tables or Rows Locked

The database manager provides concurrency control and prevents uncontrolled access by means of locks. Because of the basic principles of DB2 locking, you do not need to take action to control locks in most cases. For information on how locking works, refer to the Administration Guide.

Diagnosing locking problems

Use the following methods to investigate whether or not applications are creating deadlocks or holding locks:

For information, see The Database System Monitor and the System Monitor Guide and Reference.

SQL0911N received

Symptom
The SQL0911N message is received.

Possible Cause
There are deadlocks or time-outs.

Action
Check the reason code in the SQLCA to determine if deadlocks or time-outs are causing the problem (see Interpreting an SQLCA Structure).

In the case of deadlocks, create an event monitor for deadlocks to gather more information. Possibly increase the size of the lock list to avoid lock escalation which can often result in a deadlock scenario.

Note that time-outs are not captured using the deadlock event monitor.

Applications slow or appear to hang

Symptom
Applications accessing a DB2 database are slow or appear to hang.

Possible Cause
There is lock contention or lock escalation.

Action
Use the Control Center or the LIST APPLICATIONS FOR DATABASE database-alias SHOW DETAIL command to find out if applications are waiting on a lock and who is holding the lock that the applications are waiting for. Then use the database monitor to determine if lock escalation is occurring.

If escalation is occurring, ensure that:

[  ]
Applications use an appropriate frequency of commits.

Take a monitor snapshot for locks to determine which applications are causing other applications to wait for locks. Also, monitor the db2.lock_waits performance variable.

[  ]
The values of the database configuration parameters maxlocks and locklist, which determine the number of locks allowed, are appropriate.

[  ]
The locktimeout database configuration parameter is set appropriately. When it is turned on, this parameter can help avoid hangs due to lock contention, and can tell you if lock contention is your problem.

[  ]
The dlchktime database configuration parameter is set appropriately. This parameter defines the frequency at which the database manager checks for deadlocks among all the applications connected to a database. If it is set high, CPU time is saved but deadlocks may not be detected soon enough.

[  ]
Other users are not preparing and binding applications.

The procedure of preparing and binding applications requires that locks be obtained on system catalog tables. This procedure should be performed in off-peak hours.

[  ]
Lock waits or deadlocks are not caused by next key locking.

Next key locking guarantees a cursor stability (CS) isolation level by automatically locking the next key for all INSERT and DELETE statements and the next higher key value above the result set for SELECT statements. It is required to guarantee ANSII and SQL 92 standard CS, and is the DB2 default.

Examine explain information for the application. If the problem appears to be with next key locking, you can change the isolation level to Read Stability (RS) by setting the DB2_RR_TO_RS flag on.

The DB2_RR_TO_RS flag stops all next key locking on user tables (catalog tables are not affected). Any packages bound with CS are automatically downgraded to RS, because DB2 can no longer guarantee CS. Do not use this flag if you require ANSII and SQL standard CS,

For information on locking, configuration parameters, and explain information, refer to the Administration Guide, Performance.

Improving Database Performance

Database performance is a complex subject and its details extend beyond the scope of this guide. As a starting point, be aware of the following:

For more information on performance:


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]