Troubleshooting Guide

Troubleshooting Tips

This section describes how to deal with some problems frequently encountered by users working with a DB2 server. The following topics are presented in this section:

For additional troubleshooting information on this topic, see:

Important: This section represents a small sampling of the information available from DB2 Customer Support. For a complete and up-to-date source of DB2 information, use the DB2 Product and Service Technical Library on the Web, at http://www-4.ibm.com/software/data/db2/library/ (Note that this information is in English only.)

Problems Migrating to Version 7

If you experience problems migrating to Version 7, ensure that:

[  ]
You followed the instructions in the Quick Beginnings guide.

[  ]
You checked the list of incompatibilities from previous versions. (See the Administration Guide: Planning.)

[  ]
The database is consistent.

To verify that databases are consistent so that they can be migrated, use the db2ckmig command. (See the Quick Beginnings book for more details.)

You must correct all errors that are reported for databases that are in any of the following states:

Additionally, this tool identifies databases that contain objects that use SYSCAT, SYSSTAT, or SYSFUN as schema names. These objects must be dropped and re-created using a different schema name.

[  ]
The disk space on the machine is adequate. (The amount of disk space required varies with each database.)

[  ]
You use appropriate values for the number of primary and secondary log files (logprimary and logsecond in database configuration), and for the size of the log files (logfilsiz in database configuration). See Server Failures for details.

Problems When Applying Fix Packs

Symptom
The error message SQL1042C is received following the installation of a fix pack or a change in the operating system level.

Possible Cause
A fix pack or a change in the operating system level needs to be identified to the DB2 instance (database manager).

Action
Run db2iupt following the installation of a fix pack for DB2 or when there is a change in the operating system level.

Server Failures

Symptom
The DB2 server does not respond to requests or provides unexpected responses.

Action
Ensure that:

[  ]
You check the db2diag.log file on the server to diagnose a cause. For information, see Interpreting the db2diag.log.

[  ]
The DB2COMM registry variable is set to the protocols required by connecting clients. (If not, the message SQL5043N is usually received.)

Consult the db2diag.log file to see which communication protocols were not started successfully, and why. For information, see Using the db2diag.log File to Diagnose Server Communication Problems.

[  ]
There is enough log space. (If not, the SQL0964C message is typically received.)

To increase log space:

  • Increase the number of primary or secondary log files (logprimary and logsecond in database configuration)
  • Increase the size of the log files (logfilsiz in database configuration)

To initiate your changes, you must disconnect all applications from the database and deactivate the database, if it was activated.

You must ensure that the file system or disk space that contains the logs is big enough to hold all of the logs. The amount of space (in bytes) required for log files can range from:

(logprimary * (logfilsiz + 2) * 4096) + 8192

to:

((logprimary + logsecond) * (logfilsiz + 2) * 4096) + 8192

[  ]
The indexes are valid.

The db2diag.log file indicates if indexes need to be re-created. Use db2dart /MI to mark the indexes as inconsistent. See Miscellaneous Troubleshooting Tools for details on db2dart /MI.

To delay the rebuilding of indexes until after the database is started, use the INDEXREC configuration parameter as documented in the Administration Guide: Performance manual.

[  ]
The data is consistent.

Look for the following symptoms for inconsistent data:

  • There is a DIA3700C "A bad page was encountered" error in the db2diag.log file.
  • The server goes down when particular data is accessed. In this case, this data resource may be damaged.
  • The server fails repeatedly.

Run db2dart to verify that the data is consistent. If data is not consistent, contact DB2 Customer Support.

[  ]
You are using the latest version of DB2, with appropriate fix packs installed. Check the Web site for known problems and the fix packs that fix them. For information, see Updating DB2 Products.

For DB2 Universal Database Enterprise - Extended Edition for supported operating systems, see also Database Manager Will Not Start.

Starting DB2 on Windows 95 and Windows 98 When the User Is Not Logged On

For a db2start command to be successful in a Windows 95 or a Windows 98 environment, you must either:

In addition, the user ID that is specified either during the logon or for the db2logon command must meet DB2's requirements (see note (NOTE2)).

When the db2start command starts, it first checks to see if a user is logged on. If a user is logged on, the db2start command uses that user's ID. If a user is not logged on, the db2start command checks whether a db2logon command has been run, and, if so, the db2start command uses the user ID that was specified for the db2logon command. If the db2start command cannot find a valid user ID, the command terminates.

During the installation of DB2 Universal Database Version 6 on Windows 95 and Windows 98, the installation software, by default, adds a shortcut to the Startup folder that runs the db2start command when the system is booted (see note (NOTE1) for more information). If the user of the system has neither logged on nor issued the db2logon command, the db2start command will terminate.

If you or your users do not normally log on to Windows or to a network, you can hide the requirement to issue the db2logon command before a db2start command by running commands from a batch file as follows:

  1. Create a batch file that issues the db2logon command followed by the db2start.exe command. For example:
      @echo off
      db2logon  db2local /p:password
      db2start
      cls
      exit
    
  2. Name the batch file db2start.bat, and store it in the /bin directory that is under the drive and path where you installed DB2. You store the batch file in this location to ensure that the operating system can find the path to the batch file.

    The drive and path where DB2 is installed is stored in the DB2 registry variable DB2PATH. To find the drive and path where you installed DB2, issue the following command:

      db2set  -g  db2path
    

    Assume that the db2set command returns the value c:\sqllib. In this situation, you would store the batch file as follows:

      c:\sqllib\bin\db2start.bat
    
  3. To start DB2 when the system is booted, you should run the batch file from a shortcut in the Startup folder. You have two options:

If you use a batch file to issue the db2logon command before the db2start command is run, and your users occasionally log on, the db2start command will continue to work, the only difference being that DB2 will use the user ID of the logged on user. See note (NOTE1) for additional details.

Notes:

  1. The db2logon command simulates a user logon. The format of the db2logon command is:
      db2logon userid  /p:password
    

    The user ID that is specified for the command must meet the DB2 naming requirements (see note (NOTE2) for more information). If the command is issued without a user ID and password, a window opens to prompt the user for the user ID and password. If the only parameter provided is a user ID, the user is not prompted for a password; under certain conditions a password is required, as described below.

    The user ID and password values that are set by the db2logon command are only used if the user did not log on using either the Windows logon window or the Microsoft Networking logon window. If the user has logged on, and a db2logon command has been issued, the user ID from the db2logon command is used for all DB2 actions, but the password specified on the db2logon command is ignored

    When the user has not logged on using the Windows logon window or the Microsoft Networking logon window, the user ID and password that are provided through the db2logon command are used as follows:

  2. In Version 6, the user ID that is either used to log on or specified for the db2logon command must conform to the following DB2 requirements:

  3. You can prevent the creation of the db2start shortcut in the Startup folder during a customized interactive installation, or if you are performing a response file installation and specify the DB2.AUTOSTART=NO option. If you use these options, there is no db2start shortcut in the Startup folder, and you must add your own shortcut to run the db2start.bat file.

  4. On Windows 98, an option is available that you can use to specify a user ID that is always logged on when Windows 98 is started. In this situation, the Windows logon window will not appear. If you use this option, a user is logged on and the db2start command will succeed if the user ID meets DB2 requirements (see note (NOTE2) for details). If you do not use this option, the user will always be presented with a logon window. If the user cancels out of this window without logging on, the db2start command will fail unless the db2logon command was previously issued, or invoked from the batch file, as described above.

  5. If you do not start DB2 during a system boot, DB2 can be started by an application. You can run the db2start.bat file as part of the initialization of applications that use DB2. Using this method, DB2 will only be started when the application that will use it is started. When the user exits the application, a db2stop command can be issued to stop DB2. Your business applications can start DB2 in this way, if DB2 is not started during the system boot.

    To use the DB2 Synchronizer application or call the synchronization APIs from your application, DB2 must be started if the scripts that are download for execution contain commands that operate either against a local instance or a local database. These commands can be in database scripts, instance scripts, or embedded in operating system (OS) scripts. If an OS script does not contain Command Line Processor commands or DB2 APIs that use an instance or a database, it can be run without DB2 being started. Because it may be difficult to tell in advance what commands will be run from your scripts during the synchronization process, DB2 should normally be started before synchronization begins.

    If you are calling either the db2sync command or the synchronization APIs from your application, you would start DB2 during the initialization of your application. If your users will be using the DB2 Synchronizer shortcut in the DB2 for Windows folder to start synchronization, the DB2 Synchronization shortcut must be modified to run a db2sync.bat file. The batch file should contain the following commands to ensure that DB2 is running before synchronization begins:

      @echo off
      db2start.bat
      db2sync.exe
      db2stop.exe
      cls
      exit
    

    In this example, it is assumed that the db2start.bat file invokes the db2logon and db2start commands as described above.

    If you decide to start DB2 when the application starts, ensure that the installation of DB2 does not add a shortcut to the Startup folder to start DB2. See note (NOTE3) for details.

Using a Windows 2000 Terminal Server environment

Symptom
You receive the error message SQL1032N when attempting use applications such as REXX or ODBC Driver Managers which dynamically load DB2 DLLs.

Possible Cause
The execution mode that was set by the REGISTER command has no effect. The error message is returned because the application could not attach to the shared memory needed to run the application.

Action
There are actually two problems::

SQL1403N Received When Authenticating Usernames or Groups

Symptom
You receive the SQL1403N error message when attempting to get authorization for usernames or groups on a Windows NT Primary Domain running DB2.

Possible Cause
The authorization restrictions on the Windows NT Primary Domain.

Action
Ensure the service for the instance is being started under a local administrator who is not a user from the master domain; and who has operating system rights on the local machine.

Administration Difficulties

Symptom
You have problems installing or administering a database server.

Action
Ensure that:

[  ]
You are using a valid user name and password that have administrator authority.
  • For OS/2 Warp Connect and later versions of OS/2: The default DB2 administrator user name and password are the same as those specified when the operating system was installed.
  • For versions of OS/2 before OS/2 Warp Connect: The default administrator user name and password depend on the national language installed on a machine, and on whether UPM was installed with a particular user name. In most cases, the defaults are USERID and PASSWORD in English countries. For more information, see the Quick Beginnings for your platform.
  • For Windows NT: Your membership in a group controls which actions you are permitted to perform. For DB2 for Windows NT, you must belong to the "Administrators" or "Domain Admins." groups to perform installation and some administration tasks.

    You can find out what your group is by using the User Manager tool to display user names and group memberships. To invoke this tool, select Start --> Programs --> Administrative Tools --> User Manager.

    To install DB2, you must have administrator authority on the local machine. To be added to the administrator group, you must be added by an existing member of that group.

  • For UNIX-based environments: You must belong to a SYSADM group. The name of the group must be eight characters or less in length.
  • For DB2 for Solaris: If a DB2 instance is started by a user who does not belong to the same group as the DB2 instance owner, the SQL1042C message is received for many commands. In this situation, the executing db2sysc process inherits the user's group name and does not have the correct read permission to open a file in the /proc directory.

    You must start the instance as the instance owner, and the id or group of the db2sysc process attempting to read a file must be the same as the id or group of the DB2 instance.

[  ]
For UNIX-based environments other than AIX, you must update and rebuild the kernel configuration parameters. (If not, the SQL1016N or SQL1018N messages typically occur when you try to create an instance or use the command line processor.) See the Quick Beginnings guide for your platform for details.

Backing Up and Restoring Data

It is important to have a good backup and recovery strategy, and a plan to implement this strategy. This plan should address questions such as the following:

Backup plans are heavily influenced by the importance of having the data in the database available for use. If your business depends heavily on the data, then you must develop a backup and recovery plan that will minimize downtime and ensure that the data is available when needed at a cost that you are willing to accept.

For help on backup and recovery see the Data Movement Utilities Guide and Reference and use the DB2 Backup Database Wizard from the Control Center.

SQL0902C Received When Backing up a Database or Accessing DB2 Data

Symptom
The SQL0902C message is received when a database is backed up.

Possible Cause
The database is not architecturally consistent.

Action
Use the db2dart command to verify the architectural consistency of the database. See Miscellaneous Troubleshooting Tools for information on this command.

Usually you can perform a restore with a roll forward to the end of the logs to correct the problem. If the problem persists, contact DB2 Customer Support.

ROLLFORWARD TABLESPACE Fails with SQL4908N Message

Symptom
The error message SQL4908N is received when running a ROLLFORWARD TABLESPACE.

Possible Cause
If starting the first attempt to roll forward a table space, the message may be received when one or more of the table spaces specified to be rolled forward is not in a ROLLFORWARD PENDING state on the specified nodes. If you are continuing to roll forward a table space that is already in progress, the message may be received when one or more of the table spaces specified to be rolled forward is not in a "roll forward in progress" state on the specified nodes.

Action
  1. Use the LIST TABLESPACES SHOW DETAIL command on the nodes specified to find out which table spaces are not ready to be rolled forward.
  2. Use the QUERY STATUS option of the ROLLFORWARD command to determine the status of the table space roll forward.
  3. Depending on your task:
    • If you are starting a new table space roll forward, put the table spaces into ROLLFORWARD PENDING state by restoring them.
    • If you are continuing a table space rollforward and one or more of the table spaces involved has been restored and put into ROLLFORWARD PENDING state, the table space roll forward in progress must be cancelled.
      • Put the table spaces into the RESTORE PENDING state.
      • Restore the table spaces.
  4. Submit the original ROLLFORWARD command again.

Backup to Disk Fails with SQL2419N

Symptom
The SQL2419N error message is received when taking an online backup to disk.

Possible Cause
Although the disk was not full, the backup file was greater than the operating system file size limit.

Action
You can do a backup and specify multiple target directories. The backup directories divide the file into sizes smaller than the 2 GB operating system file size limit.

NetBIOS Connections from DB2 Clients are Lost When Backing up Using TSM

Symptom
NetBIOS connections from DB2 clients are lost when backing up using Tivoli Storage Manager (TSM).

Possible Cause
As part of TSM's initialization, a NCB.RESET is issued, which resets the NetBIOS resources of the calling application. In the case of DB2 for OS/2 and Windows NT, the calling application is DB2 itself. This results in the dropping of all connections to the database made through NetBIOS.

Action
There is no way to prevent this from happening when using NetBIOS as the communication protocol. Use another protocol like TCP/IP when using TSM to backup the database.

Problems Restoring a Database or Table Space Backup

Symptom
Data cannot be restored.

Action
Ensure that:

[  ]
You have enough disk space to restore the data.

Use redirected restore to specify a modified list of containers for the table spaces being restored. See the Administration Guide: Implementation.

[  ]
You specify the correct path to the backup image and logs. (It is possible that they were moved.)

[  ]
If an online backup was taken, all logs from the beginning to the end of the backup are required to restore and roll forward the database.

(The roll forward must proceed to the minimum point in time as specified by the output from the ROLLFORWARD QUERY STATUS command. Otherwise, the database is not accessible.)

RESTORE Using an API Fails with SQL0902C Return Code 22

Symptom
On AIX, the SQL0902C error message with return code 22 is received when restoring a database using an API.

Possible Cause
Abnormal termination of applications has left the message queue connected to an existing database file.

Action
Stop the database by using db2stop. Use db2terminate to eliminate all back-end processes. Clean up DB2 interprocess communications (IPC) resources. Start the database by using db2start. Retry the RESTORE again. Contact DB2 Support if the problem persists.

After a RESTORE, Some Table Spaces Are in a ROLLFORWARD PENDING State

Symptom
After issuing a RESTORE command specifying that no roll forward is required on a Windows NT machine, the table spaces are left in a ROLLFORWARD PENDING state.

Possible Cause
Windows NT is having a resource conflict with DB2 where the hard file cache limit of 192 MB is reached. This hard file cache is used whenever DB2 opens or closes many files, or opens or closes large files.

Action
Ensure the DB2 product is updated to the latest fix pack, and that the DB2NTNOCACHE registry variable is set to one. The DB2NTNOCACHE setting can be checked using the db2set -all command.

SQL1277N is Received When Attempting a Redirected Restore

Symptom
The SQL1277N error message is received during a redirected restore to a new database.

Possible Cause
One or more containers used in the redirected restore may already be in use. Check the db2diag.log file for additional information.

Action
Remove or untag the containers in use and then perform the redirected restore using the Control Center. Remove SMS containers. Untag DMS containers. (SMS may be untagged.) If you prefer to use the command line processor, you must use the SET TABLESPACE CONTAINERS API and then reissue the RESTORE command with the CONTINUE parameter.

To find out more about the SET TABLESPACE CONTAINERS API, type db2 ? set tablespace at the command line.

SQL1024N is Received when Attempting a Redirected Restore

Symptom
The SQL1024N error message is received during a redirected restore.

Possible Cause
A redirected restore has three steps:
  1. Restore redirect
  2. Set table space containers
  3. Restore continue.

After the first step, an implicit connection to the database is established. The connection associated with a session identifier. If a shell script is used for the remaining two steps, a new session is started. There is no implicit connection to the database and as a result the second step fails.

Action
When using shell scripts, ensure that all three steps are in one script.

SQL0298N is Received when Setting Table Space Containers During a Redirected Restore

Symptom
The SQL0298N error message is received when setting table space containers during a redirected restore.

Possible Cause
The SQL0298N error message is received when the customer was attempting to restore a DMS table space into a SMS table space using a redirected restore.

Action
This is not a valid action. The table space type cannot be changed.

Problem with Recovery of Large Databases on Windows NT and Windows 2000

Symptom
BACKUP or RESTORE fails on Windows NT or Windows 2000.

Possible Cause
NTFS file caching runs out of space when the backup image file is larger than 128GB.

Action
Reduce the number of logical drives being used during the BACKUP or RESTORE.

Data Not Loading

SQL3508N Returned during a LOAD Operation

Symptom
The SQL3508N error message is received during a LOAD operation.

Possible Causes
  1. Issuing the LOAD command without explicitly specifying the REMOTE FILE parameter before attempting a LOAD RESTART operation causes the utility to overwrite the default remote file.
  2. The specified file names used are similar even though the specified paths are different. For example: MESSAGES C:\table.MSG and REMOTE FILE D:\table.

Action
  1. Restore the database or restore the table space.
  2. Specify different file names for each of the MESSAGES and REMOTE FILE parameters. For example: MESSAGES C:\table.MSG and REMOTE FILE D:\table.RMT. The RMT extension solves the problem.

General Problems with LOAD

Symptom
Data does not load on a database server.

Action
Ensure that:

[  ]
You used the RESTART or REPLACE parameters with the LOAD command. (If not, the SQL3805N message is typically received.)

For more information on the LOAD command, see the Data Movement Utilities Guide and Reference.

[  ]
The size of a database object in an SMS table space has not reached an operating system limit.

For example, OS/2 and Windows NT have a 2 GB file size limit.

(See Problems with LOAD and with Splitting and Loading Data for more information on loading data in DB2 Universal Database Enterprise - Extended Edition.)

Difficulties Importing Data

Symptom
You are having problems importing data.

Action

[  ]
Ensure that there is enough log space available. (See Server Failures.)

[  ]
Use the COMMITCOUNT n option to commit data after every n records are imported. This option protects the committed data from loss in case of a failure. It also reduces the log requirements for an import operation, allowing a number of smaller transactions rather than one long transaction.

DB2 Support for Raw Devices when Using Windows 2000

Symptom
Referencing raw devices when using Windows 2000 does not appear to work.

Possible Cause
Windows 2000 has a new Dynamic Disk mode that fails when there are too many partitions and you are using raw devices. The Windows 2000 Logical Disk Manager handles the definition of raw devices differently from the method used on Windows 2000.

Action
Define raw devices using the method associated with the Basic Disk mode and using the type of long partition names used on a single partition.

Disk Alignment when Using Windows 2000

Symptom
Performance problems with different DB2 tasks and utilities that involve input and output to disk.

Possible Cause
Windows 2000 has changed the default values for SectorsPerTrack and TracksPerCylinder. For example, SectorsPerTrack is increased from 32 to 63. The problems occur when there are many disks and RAID controllers provide optimizations for track-aligned transfers.

Action
There are several ways to correct this problem. The easiest to do is:

Note:There is no disk alignment problem if you create your partitions using Windows NT and then upgrade to Windows 2000 (or use a dual-boot with Windows 2000).

Problems Using Commands, Utilities, or the Command Line Processor

Symptom
You cannot use commands, utilities, or the Command Line Processor.

Action
Ensure that:

[  ]
You bound the utilities and your applications to the database after you installed a fix or newer executable. (The SQL0818N or SQL0805N messages indicate that you need to rebind.)

When you precompile an SQL application, a compilable file is produced, and optionally a bind file. Both of these files have a new timestamp. The default behavior of the precompile operation is to automatically create a package so that you do not need to bind anything. However, if you create a bind file but not a package, then you must bind the new bind file to the database.

For information on binding, see your Quick Beginnings guide.

[  ]
You used the correct syntax (especially for UNIX-based systems).

It is recommended that you either:

  • Use double quotation marks (" ") around a command line processor request.
  • Precede any special characters such as asterisks (*), brackets, or question marks (?) with a back slash (\) character to ensure that the command line processor interprets them correctly.

Example

For:

db2 SELECT * FROM SYSCAT.TABLES

Use either:

db2 "SELECT * FROM SYSCAT.TABLES"

or

db2 SELECT \* FROM SYSCAT.TABLES

Locking and Deadlocks

The database manager provides concurrency control and prevents uncontrolled access by means of locks. Because of the basic principles of DB2 locking, you do not need to take action to control locks in most cases. For information on how locking works, see the Administration Guide.

Diagnosing locking problems

Use the following methods to investigate whether or not applications are creating deadlocks or holding locks:

For information, see The Database System Monitor and the System Monitor Guide and Reference.

SQL0911N Received

Symptom
The SQL0911N message is received.

Possible Cause
There are deadlocks or time-outs.

Action
Check the reason code in the SQLCA to determine if deadlocks or time-outs are causing the problem (see Interpreting an SQLCA Structure).

In the case of deadlocks, ensure your applications are coded in such a way that they can handle -911 return codes. Possibly increase the size of the lock list to avoid lock escalation which can often result in a deadlock scenario.

Note that time-outs are not captured using the deadlock event monitor.

Applications Slow or Appear to Hang

Symptom
Applications accessing a DB2 database are slow or appear to hang.

Possible Cause
There is lock contention or lock escalation.

Action
Use the Control Center or the LIST APPLICATIONS FOR DATABASE database-alias SHOW DETAIL command to find out if applications are waiting on a lock and who is holding the lock that the applications are waiting for. Then use the database monitor to determine if lock escalation is occurring.

If escalation is occurring, ensure that:

[  ]
Applications use an appropriate frequency of commits.

Take a monitor snapshot for locks to determine which applications are causing other applications to wait for locks. Also, monitor the db2.lock_waits performance variable.

[  ]
The values of the database configuration parameters maxlocks and locklist, which determine the number of locks allowed, are appropriate.

[  ]
The locktimeout database configuration parameter is set appropriately. When it is turned on, this parameter can help avoid hangs due to lock contention, and can tell you if lock contention is your problem.

[  ]
The dlchktime database configuration parameter is set appropriately. This parameter defines the frequency at which the database manager checks for deadlocks among all the applications connected to a database. If it is set high, CPU time is saved but deadlocks may not be detected soon enough.

[  ]
Other users are not preparing and binding applications.

The procedure of preparing and binding applications requires that locks be obtained on system catalog tables. This procedure should be performed in off-peak hours.

[  ]
Lock waits or deadlocks are not caused by next key locking.

Next key locking guarantees a cursor stability (CS) isolation level by automatically locking the next key for all INSERT and DELETE statements and the next higher key value above the result set for SELECT statements. It is required to guarantee ANSI and SQL92 standard CS, and is the DB2 default.

Examine snapshot information for the application. If the problem appears to be with next key locking, you can change the isolation level to Read Stability (RS) by setting the DB2_RR_TO_RS option on.

The DB2_RR_TO_RS option stops all next key locking on user tables (catalog tables are not affected). Any packages bound with CS are automatically downgraded to RS, because DB2 can no longer guarantee CS. Do not use this option if you require ANSI and SQL92 standard CS.

For information on locking, configuration parameters, and explain information, see the Administration Guide: Performance.

Improving Database Performance

Database performance is a complex subject and its details extend beyond the scope of this guide. As a starting point, be aware of the following:


[ Top of Page | Previous Page | Next Page ]