When replication fails or executes in an unexpected way, you need to determine the cause of the problem and find a solution that fixes the problem. DB2 DataPropagator provides error messages, trace records, log records, and information stored in the control tables to assist you in the problem-determination task. Using the information provided by the product, you can often fix a problem yourself, but sometimes the best solution is to call IBM Software Support. This chapter includes:
This book also includes other resources that can help you in problem determination:
Recommendation: When using problem-determination facilities to test or debug your replication scenarios:
The following scenario shows how you might trace a replication error using the facilities discussed in this chapter.
After you define replication sources and subscription sets, the SQL statements for your replication requests complete satisfactorily, but the Apply program does not replicate the data successfully. To determine the error, you could:
The DB2 Control Center or DJRA can encounter errors when it gathers information from source servers, target servers, or control servers to create the SQL statements for administration and when it is actually running the SQL statements to set up the replication sources and subscription sets. The primary indicators are SQL messages, SQLSTATE codes, and Capture- and Apply-program messages (those that begin with the letters ASN) that accompany the error. See the DB2 Universal Database Message Reference (or DB2 message reference for your platform) for more information about DB2 error messages and SQL states. The ASN messages are also described in Capture and Apply messages and Replication messages for AS/400.
The Capture- and Apply-program messages are issued or recorded in the following locations:
The Capture and Apply programs can also encounter problems while capturing and replicating changed data, even though the SQL that the DB2 Control Center or DJRA generated for defining replication sources and subscriptions ran without error. You can determine the cause of the errors with information from the following sections: Problem determination for the Apply program, Problem determination for the Capture program, and Troubleshooting.
DB2 provides the following problem-determination tools for the Apply program:
Each time the Apply program attempts to replicate a subscription set, it inserts a new row in the Apply trail table (ASN.IBMSNAP_APPLYTRAIL). This table is located at each control server, along with other control tables, and has one row for each successful or unsuccessful subscription cycle. Because the Apply program appends to this table, you should prune it periodically. For a description of the Apply trail table, see Apply trail table.
For subscription sets that do not replicate successfully, the Apply trail table records the SQL code and the SQLSTATE. Additional SQL codes and states associated with the problem can be found in the Apply program trace file.
To query the Apply trail table for problem-determination information:
Use the Replication Analyzer to view information in this table; see Using the Replication Analyzer. You can also query information in the Apply trail table directly to gather problem-determination information for failing subscription sets.
SELECT TARGET_TABLE, STATUS, SQLSTATE, SQLCODE, SQLERRM, APPERRM, LASTRUN, LASTSUCCESS FROM ASN.IBMSNAP_APPLYTRAIL WHERE STATUS <> 0 ORDER BY TARGET_TABLE, LASTRUN DESC, SQLCODE DESC, SQLSTATE ASC
This query returns the following columns from the Apply trail table:
You can trace the operation of the Apply program to help isolate the causes of certain kinds of replication errors. The Apply program creates a trace file when you include a trace parameter to the Apply program start command. See the Capture and Apply chapter in for your platform in Operations for the syntax of the command.
While tracing its activity, the Apply program records the following kinds of information in the trace file:
The Apply program inserts error messages and indicators in the trace file at points when it encounters an error.
To create a trace file:
When you specify a trace parameter, you must also specify the name of a trace file (for workstation systems, precede the trace file name with the greater than symbol (>) to redirect the output to the file). For example, to start Apply for Windows with trace, issue the following command from the command line:
asnapply ApplyQual CtlSrvr trcflow > apply.trc
where ApplyQual is the Apply qualifier, CtlSrvr is the control server, and apply.trc is the trace file. The trace file is created in the directory from which you start the Apply program.
For a summary of the Apply program's activities, you can view the Apply program log file. The log file is in the directory from which you start the Apply program and contains messages issued by the Apply program. Because the information in the Apply program log file is high level, it typically directs you to the Apply trail table for more detailed information.
The name of the log file is ApplyQual.APP, where ApplyQual is the Apply qualifier associated with the Apply-program instance.
DB2 provides the following problem-determination tools for the Capture program:
This section also includes problem-determination for the Capture program on the following operating systems: AS/400, OS/390, VM and VSE.
The Capture program inserts a new row in the trace table (ASN.IBMSNAP_TRACE) whenever it encounters an event for which it also issues a message. This table is located at the source server and contains basic information about the activities of the Capture program. For a description of the trace table, see Trace table.
To query the trace table for problem-determination information:
Use the Replication Analyzer to view information in this table; see Using the Replication Analyzer. You can also query information in the trace table directly to gather problem-determination information for the Capture program.
SELECT * FROM ASN.IBMSNAP_TRACE ORDER BY TRACE_TIME
This query returns the following columns from the trace table:
For AS/400 systems, this table also includes the following columns:
You can trace the operation of the Capture program to help isolate the causes of certain kinds of replication errors. The Capture program creates a trace file when you include a trace parameter to the Capture program start command. See the Capture and Apply chapter in for your platform in Operations for the syntax of the command.
The Capture program inserts error messages and indicators in the trace file at points when it encounters an error.
To trace problems on AS/400 systems, view the job logs of the control and journal jobs. See Problem determination for AS/400 for more information.
To create a trace file:
asnccp SrcSrvr trace > capture.trc
where SrcSrvr is the source server and capture.trc is the trace file. Because this command does not include parameters for the type of start (WARM, WARMNS, or COLD) or for pruning (PRUNE or NOPRUNE), the Capture program uses the defaults (WARM and PRUNE). The trace file is created in the directory from which you start the Capture program.
For a summary of the Capture program's activities, you can view the Capture-program log file. The log file is in the directory from which you start the Capture program and contains messages issued by the Capture program.
The name of the log file is SrcSrvr.CPP, where SrcSrvr is the name of the source server.
For OS/390, VM, and VSE, the Capture program provides the following tools:
SYSPRINT (for OS/390)
the console (for VM)
STDOUT (for VSE)
This information can be used by IBM Software Support to diagnose operational problems.
For OS/390, the Capture program also provides:
Capture for AS/400 has unique problem determination facilities because of its dependency on journals and journal receivers as its primary source of input. This section describes problem-determination facilities for the Capture program for AS/400 and associated problem-recovery methods: gathering information about active jobs, determining if the journal job is started, and collecting data for problem determination.
One of the first steps in problem determination is gathering information about currently active jobs. Issue the Work with Submitted Jobs command (WRKSBSJOB QZSNDPR) for a list of all active jobs in the subsystem. If you fail to find a particular job, use the WRKSBMJOB command to locate and view the job log for the job. The name of the Capture program control job is QZSNCTL5. The name of the journal job is the same as the journal name (either QSQJRN-the default journal name for SQL collections-or a name that you specified for the journal).
Be sure to record the 6-digit job numbers because you might need them later in the problem-determination process.
If, five minutes (or more) after starting the Capture program, you discover that only one job (QDPRCTL5) is running, check the following conditions:
When you first issue the STRDPRCAP command, there might be replication pairs that meet the first condition, but not the second one. Both conditions must hold before a journal job can start.
Every two minutes (or at the frequency that you specified for the WAIT parameter of the STRDPRCAP command), the QZSNCTL5 job wakes up to determine if any replication source exists that meets both conditions. If it finds a replication source that is eligible for replication, it starts the journal job.
To determine if the SYNCHPOINT column in the pruning control table is set to hex zeroes for a particular replication pair, execute the following SQL statement from the source server:
SELECT HEX(SYNCHPOINT) FROM ASN/IBMSNAP_PRUNCNTL WHERE SOURCE_TABLE='SrcTbl' AND SOURCE_OWNER='SrcOwn' AND SOURCE_VIEW_QUAL=SrcVwQual
where SrcTbl is the library name, SrcOwn is the table name, and SrcVwQual is the source-view qualifier for the replication source in question. Both SrcTbl and SrcOwn are case-sensitive.
The following items are needed for problem determination for the Capture program. The items are listed in order of importance:
SELECT DBXFIL FROM QSYS/QADBXREF WHERE DBXLFI = 'sqlname' AND DBXLIB = 'lib'
where sqlname is the SQL name of the CD table. Both sqlname and lib are case sensitive.
DMPOBJ QDPR/QZSNINDEX5 *USRIDX
SELECT A.*, HEX(CD_OLD_SYNCHPOINT), HEX(CD_NEW_SYNCHPOINT) FROM ASN/IBMSNAP_REGISTER A WHERE SOURCE_OWNER='SrcOwn' AND SOURCE_TABLE='SrcTbl' SELECT * FROM ASN.IBMSNAP_REG_EXT WHERE SOURCE_OWNER='SrcOwn' AND SOURCE_TABLE='SrcTbl'
where SrcOwn is the library name and SrcTbl is the table name of the replication source in question. Both SrcOwn and SrcTbl are case sensitive.
SELECT A.*, HEX(SYNCHPOINT) FROM ASN/IBMSNAP_REGISTER A WHERE GLOBAL_RECORD = 'Y'
SELECT * FROM ASN/IBMSNAP_TRACE WHERE SUBSTR(JOB_NAME, 21, 6) = 'jobnum' ORDER BY TRACE_TIME
where jobnum is the job number for the job you want to investigate. The DESCRIPTION column provides important information about the job.
Examples:
To gather trace table entries after 7 a.m., March 31, 2000, execute the following query:
SELECT * FROM ASN/IBMSNAP_TRACE WHERE TRACE_TIME > '2000-03-31-07.00.00.000000' ORDER BY TRACE_TIME
To retrieve all ASN0303 (data capturing is interrupted...) trace entries after 7 a.m., March 31, 2000, execute the following query:
SELECT * FROM ASN/IBMSNAP_TRACE WHERE TRACE_TIME > '2000-03-31-07.00.00.000000' AND SUBSTR(DESCRIPTION, 1, 7) = 'ASN0303' ORDER BY TRACE_TIME
SELECT * FROM ASN/IBMSNAP_WARM_START WHERE JRN_LIB='JLib' AND JRN_NAME='JName'
where JLib is the library name and JName is the table name for the journal. Both JLib and JName are case sensitive.
DMPOBJ QDPR/QDPRxxxxxx *USRSPC
The Replication Analyzer analyzes the following tables for correctness, consistency, and efficiency: replication control tables, change data tables, target tables, and target-table indexes. It also checks database parameters for optimal replication performance. You can use the Replication Analyzer to analyze the behavior of the Capture program or the Apply program. It can answer such questions as: "Why is Capture not capturing?" and "why is Apply not applying?" The Replication Analyzer can help diagnose problems, verify replication setup, and offer suggestions for performance tuning.
You can run the Replication Analyzer at any time after replication setup is complete to analyze a failure by the Capture program or the Apply program, or simply to verify your setup. You can find the Replication Analyzer in the \sqllib\bin directory. The bind file for the Analyzer, analyze.bnd, is in the \sqllib\bnd directory, but it is not necessary to bind the program because the Analyzer is autobound for DB2 V6 and V7. If you run the Analyzer with DB2 V5, use the following command to bind the Analyzer:
bind analyze.bnd isolation UR
Important: The Replication Analyzer works only with DataPropagator Version 5 or later; it does not work with DataPropagator Version 1. Contact IBM Software Support if you need a version of the Replication Analyzer that can work with DataPropagator Version 1.
You run the Replication Analyzer from the Windows command line. Type the name of the Replication Analyzer command file (analyze.exe) followed by a list of DB2 alias names of source, target and control servers, separated by blanks (each of these names must be eight or fewer characters in length).
Because the Replication Analyzer runs from the command line, you do not need to have either DJRA or the DB2 Control Center running when you run the Replication Analyzer.
If end-user authentication is required to connect to any server, you must create a password file in the current directory. This file must be named analyze.pwd and should contain one or more entries with the following format:
SERVER=server_alias USER=userid PWD=password
IBM Software Support may ask you to run the Replication Analyzer and send the generated file to someone within IBM to check the output for correctness.
Syntax:
.-----------------. V | >>-analyze-+----DB_alias_name---+---+------------+---| A |--+-->< | +-deepcheck--+ | | '-lightcheck-' | '-?----------------------------------------------' A .------------------------. V | |------+------------------+--+---+------------------+-----------| '-q--=--ApplyQual--' '-f--=--directory--'
The keywords are not case sensitive.
Output: The Replication Analyzer creates an output HTML file, analyze.htm, which you can open and view with any Web browser. This file is stored in the same directory from which you run the Analyzer, or in the directory specified by the f keyword. Messages from the Analyzer are sent to stdout. The output HTML file contains the following information:
Some of this information is only included when the deepcheck keyword is specified.
Examples:
analyze mydb1 mydb2 analyze mydb1 mydb2 f=mydirectory analyze lightcheck mydb1 mydb2 analyze q=applyqual1 q=applyqual2 deepcheck mydb1 mydb2 f=mydirectory
This section describes various problems that can occur when running the Capture and Apply programs and how to diagnose the cause of these problems. You should also use the Replication Analyzer to determine general and specific problems with the replication environment.
Problem: The Capture for OS/390 program does not start.
Ensure that APF authorization was performed for all STEPLIB libraries as specified in the RUN JCL.
Problem: Capture for VM or Capture for VSE does not start.
Ensure that:
Problem: The Capture program is not capturing updates.
Any of the following errors could prevent the Capture program from capturing updates:
Check the trace table for possible error messages.
Problem: I'm not sure if the Capture program is running successfully.
The first time that you start the Capture and Apply programs, the Apply program performs a full refresh to populate the target tables. Then the Capture program writes message ASN0104I to the trace table, providing information related to table owner name, table name, and starting log sequence number value. This information provides a point from which the Capture program starts to capture updates.
Updates captured from then on are placed in CD tables. They are eventually applied to target tables and pruned from the CD tables. After the Capture program runs for some time, you should see rows in the CD tables if changes are made to the sources. Periodically, check the trace table to see the progress made by the Capture program. If it encounters errors, it sends them to the console and also logs them in the trace table. Similarly, the Apply program logs its information in the Apply trail table.
Problem: Capture for OS/390 issued message ASN0000E instead of the proper message number.
Message ASN0000E is a generic message that is issued instead of a proper message if the specified VSAM message file in the RUN JCL was not found. See the Capture for OS/390 program directory for information on installing the VSAM message file.
Problem: Capture for VM or Capture for VSE issued message ASN0000E instead of the proper message number.
Message ASN0000E is a generic message that is issued instead of a proper message if either the default message file, ASNLS001 MSG, or the specified message file in the Capture startup JCL was not found. See the Capture for VM or Capture for VSE program directory for information on installing the message file.
Problem: The Capture program terminates.
The Capture program terminates either because of a severe error, or when you issue the stop command. The Capture program terminates with a return code that indicates successful or unsuccessful completion. Return codes are:
Problem: Capture for OS/390 failed while using the LE for OS/390 environment.
The Capture program runs in the LE for OS/390 environment and the recommended minimum Capture job-step region size when starting Capture is REGION=10M.
Problem: Error message 0509 was issued.
Error message 0509 occurs because multiple versions of DB2, or DB2 and DataJoiner, are installed on the same system:
Ensure that the LIBPATH environment variable is set to the same environment in which the Apply program starts.
Problem: Apply component for DB2 Universal Database stops with an SQLCODE= -330, SQLSTATE=22517, "A string cannot be used, because its characters cannot be translated".
When copying between DB2 for OS/390 and DB2 on another platform, the CCSID translation can cause an INSERT to fail if a translated value is longer than the DB2 column in which it will be inserted.
If you are running in a mixed environment, ensure that you have installed the latest maintenance for the CCSID support of your DB2 for OS/390 program.
For more information on character translation, see the character conversion appendix in the DB2 for OS/390: Installation Guide.
Problem: I received system error 1067 trying to start the Capture or Apply program as a Windows NT Service.
Error code 1067 occurs under the following circumstances:
dbname pathname\asnccp.exe <parameters>
followed by CRLF.
Problem: The ASNSERV.LOG file in ASNPATH tells me that the Apply program was started correctly, but the Apply process terminated.
To find out why the Apply program terminated, change the syntax of NTSERV.ASN to:
...ASNAPPLY APPLYQUAL TRCFILE
The trace output will be written to the Apply trace file:
<ASNPATH pathname>ApplyQual.TRC
Problem: I performed a successful bind, but when running the Apply program, I still get SQLCODE -805, SQLSTATE 51002.
Make sure that the user ID has EXECUTE privilege on the Apply program packages, and make sure to bind both the Apply program packages to the control, source, and target server databases.
Problem: The DB2 log filled to capacity because I copied a very large table.
If the error occurred during a full refresh, you can use alternative methods to load large tables. You can either use the ASNLOAD exit routine, or you can perform your own load, as described in Loading target tables offline using DJRA.
If the error occurred while applying changed data, then you can change the data-blocking parameter to break down large blocks of changed data. See Specifying a data-blocking value.
Problem: Capture was cold started, which caused the Apply program to perform a full refresh, but I don't want a full refresh.
If your target table is very large, and in cases where you decided to use only your own load mechanism, you might want to suppress any future full refreshes of the Apply program. Set the DISABLE_REFRESH flag to 1 in the register table at the source server for the source table. In this case, the Apply program issues message ASN1016E and copies nothing until you perform a full refresh.
If you want to bypass full refresh and also start capturing data as soon as possible, you can use the offline load function of DJRA. In this case, you do not need to unload and reload the tables, but just run the SQL generated for step 1 and step 4. In addition to disabling full refresh, offline load also deactivates the subscription until you complete step 4. See Loading target tables offline using DJRA.
Problem: A gap was detected, so the Apply program won't perform a full refresh of my target table.
Force a full refresh by resetting the LASTSUCCESS, SYNCHTIME, and SYNCHPOINT values in subscription set table to null.
Problem: I unsuccessfully tried to start a second Apply program instance.
You must run each instance with a unique Apply qualifier.
Problem: I received a security violation message, and the Apply program is not authorized to connect to the database.
The control server name, user ID, and password definitions are case sensitive and must match exactly those specified in the password file. Check your definitions again.
Apply for AS/400 does not use a password file, so it attempts to connect to the database using the user ID specified in the user parameter of the STRDPRAPY CL command. Ensure that you correctly set up your DRDA connectivity definitions.
Problem: I received error ASN1003 with SQLCODE = -1032 and SQLSTATE = 57019.
You must start the database manager before invoking the Apply program.
Problem: Apply receives SQLCODE -206 when fetching the source data.
If you use a CCD table to stage replication to multiple target tables, ensure that the CD or CCD table includes all the columns expected in all target tables. Subsetting columns or selecting UOW columns to be replicated to the final targets can cause this problem when a CCD table is added to a replication scenario after the target tables are defined.
This problem should only occur if you manually define target tables and do not include all the columns when defining replication sources. You can avoid this problem by defining all target tables (adding to subscription sets) after defining any CD and internal CCD tables. DJRA does not allow columns in a target table that are not in the predefined CD or internal CCD tables; DJRA displays only a valid subset of the columns for the target table.
Problem: SQL1108 error while creating control table using DJRA
A SQL1108 error results if DB2 DataJoiner is installed on the same Windows NT system as DJRA, and you try to create the control tables in DB2 for OS/390. The problem is caused by a memory collision with the required memory space for ddcs.dll, because this DLL is not loaded during application initialization.
The workaround for this problem is to set the environment variable DB2DBMSADDR (range is between 20000000 to 70000000). This tells DB2 to put its memory in a different virtual address space to avoid collision with the application address space.
If you call IBM Software Support, you will be asked the following types of questions. You can save time, and perhaps diagnose the error yourself, by researching the answers to these questions: