Replication Guide and Reference

Deciding what to replicate

As part of planning for replication, you need to consider how the data will be used at the target site. Often, the source data will need to be subsetted, transformed, or enhanced for decision-support or data-warehousing applications. This section sorts these requirements into those that are easily fulfilled by using the DB2 Control Center and those that require direct manipulation of the control tables.

The Control Center and DJRA support the following data manipulations:

Subsetting columns and rows
Replicating joins using views
Replicating before and after images
Renaming columns
Creating computed columns
Using stored procedures for before and after run-time processing

The following sections describe the data manipulations that you can perform using the Control Center. This chapter also describes replicating large-object (LOB) data, limits for column names for before-image data, and data-type restrictions.

Subsetting columns and rows

IBM Replication supports both column (vertical) and row (horizontal) subsetting of the source table. This means that you can specify that only a subset of the source table columns and rows be replicated to the target table, rather than all of the columns and rows:

Column subsetting

In some replication scenarios, you might not want to replicate all columns to the target table, or the target table might not support all data types defined for the source table. You can define a column subset that has fewer columns than your source table. Column subsetting is available for all tables except replica tables.

You can define column subsetting at one of two times:

When you define a replication source table for differential refresh.
Select only those columns you want to make available for replication to a target table. Because CD tables must contain sufficient key data for point-in-time copies, you must include primary-key columns in your subset. The columns that you do not select will not be available for replication to any target table.
When you define a subscription set.
Use the advanced subscription options to select only those columns that you want to replicate to the target table. The columns that you do not select are still available for other subscription sets, but are not included for the current subscription set.

Recommendation: When you define a replication source, select all columns (that is, do not subset any of them). Create your column subsets when you define subscription sets. By defining your column subsets in the subscriptions rather than in the replication sources, you will not have to redefine your replication sources if your subscription requirements change.

Row subsetting

In some replication scenarios, you might want to replicate different data from a source table to several target tables. You can define a row subset that contains rows matching a certain condition (a WHERE clause), for example, all rows for department "J35".

Use the advanced subscription options to define a WHERE clause when you define the subscription. All target table types support row subsetting.

If the primary key values of the target table will be updated, or the table (or view) contains a logical partitioning column that will be updated, you must specify replication logical-partitioning-key support when you define the replication source. Replication logical-partitioning-key support performs an UPDATE as a DELETE followed by an INSERT. See Enabling replication logical-partitioning-key support for more information.

Replicating joins using views

Join views fill many requirements, both for denormalizing (restructuring) copies in data-warehouse scenarios, which enables simpler queries of copied data, and also for addressing the routing problem, sometimes called the database partitioning problem in distributed computing scenarios. ⁸ Views are also useful when you need to specify predicates for a row subset that exceed 512 bytes (the capacity of the PREDICATES column of the subscription-targets-member control table). Thus, you can choose to manage your subset predicates using views rather than as part of the subscription-set definition.

To define a join view as a replication source, first define all tables that participate in the join as replication sources (you do not need to define subscriptions for them). Then, use the Control Center or DJRA to define the join. If the replication sources defined in the join have CD or CCD tables, the Control Center or DJRA creates a CD view from the replication sources' CD tables.

IBM Replication supports the following types of view definitions:

Simple views over a single table
Only views of tables that reside within DB2 or DB2 DataJoiner databases are supported. Views of tables that are stored on Oracle, Microsoft SQL Server, Sybase, or Informix are not supported.
Simple inner-joins over one or more defined replication sources
Simple inner-joins over one or more CCD staging tables ⁹ that are defined as replication sources and maintained by an Apply program or an application other than an IBM Replication component and an external data source, such as DataPropagator NonRelational with IMS source data

Replicating before and after images

You can define both before and after images in your replication sources and subscriptions. A before-image column is a copy of a column before it is updated, and an after-image column is a copy of a column after it is updated. DB2 logs both the before-image and after-image columns of a table for each change to that table. Replicating before images can be useful for auditing purposes, and is required for update-anywhere scenarios.

The before and after images have different values for different actions, as shown below:

Action

Column Value

Full refresh

All before-image columns have a NULL value.

Insert

The before-image column has a NULL value.

Update

Column values before the change are captured in the before-image columns; values after the change are in the after-image columns.

When you enable logical-partitioning key support, the before-image column appears in the deleted column and the after-image column appears in the inserted column. See Enabling replication logical-partitioning-key support for more information.

Delete

Both the before-image and after-image columns contain the before-image value.

Before images do not make sense for base aggregate target-table types (there is no before image for computed columns). All other target-table types can make use of before-image columns.

Renaming columns

You can rename columns for point-in-time and user-copy target-table types. For other table types, you must define views in order to rename columns.

Creating computed columns

Using SQL, you can derive new columns from existing source columns. For aggregate target-table types, you can define new columns by using aggregate functions such as COUNT or SUM. For other table types, you can define new columns using SQL expressions.

You can also create computed columns by specifying user-defined functions when you create a table using the DB2 Control Center.

Using stored procedures for before and after run-time processing

You can define run-time processing statements using SQL statements or stored procedures that can run before or after the Apply program processes a subscription set. Such statements can be useful for pruning CCD tables and controlling the sequence in which subscription sets are processed. You can run the run-time processing statements at the source server before a subscription set is processed, or at both the source and target servers before or after a subscription set is processed. For example, you can execute SQL statements before retrieving the data, after replicating it to the target tables, or both.

Stored procedures use the SQL CALL statement without parameters. If the source table is in a non-IBM database, DB2 DataJoiner processes the SQL statements. The procedure name must be eight characters or less in length. The run-time procedures of each type are executed together as a single transaction. You can also define acceptable SQLSTATEs for each statement.

Depending on the DB2 platform, the SQL before and after processing statements can perform other processing, such as calling stored procedures.

Replicating large objects

DB2 Universal Database and DB2 for OS/390 V6 support large object (LOB) data types, which include: binary LOB (BLOB), character LOB (CLOB), and double-byte character LOB (DBCLOB). This section refers to all of these types as LOB data.

The Capture program reads the LOB descriptor to determine if any data in the LOB column has changed and thus should be replicated, but does not copy the LOB data to the CD tables. When a LOB column changes, the Capture program sets an indicator in the CD tables. When the Apply program reads this indicator, it then copies the entire LOB column (not just the changed portions of LOB columns) directly from the source table to the target table.

To allow the Capture program to detect changes to LOB data, you must include the DATA CAPTURE CHANGES keywords when you create (or alter) the source table.

Because a LOB column can contain up to two gigabytes of data, you must ensure that you have sufficient network bandwidth for the Apply program. Likewise, your target tables must have sufficient disk space to accommodate LOB data.

Restrictions:

To copy LOB data between DB2 for OS/390 V6 and DB2 Universal Database (for any other operating system), you need DB2 Connect 5.2 or later.
When the source data is in DB2 Universal Database (for any operating system other than OS/390) and the target data is in DB2 for OS/390 V6, you must use the push configuration for replicating LOB data. That is, the Apply program must run on the system with the source data.
You can copy LOB data only to read-only tables. Thus, you cannot replicate LOB data to replica or row-replica tables.
The primary key for the source table and the subscription-set definition must match. Code-page differences that affect key values can inhibit the Apply program's ability to locate the source-table row that contains LOB data.
You cannot refer to LOB data using nicknames.
For DB2 for OS/390, any table that contains LOB columns must also contain a ROWID column.

Limits on column names for capturing before-image data

DB2 DataPropagator limits column names to 17 characters. Because DB2 DataPropagator adds a before-image column identifier (usually X) to target tables and because you must ensure that each column name is unique, you cannot use longer column names for those tables you replicate. For tables you do not plan to replicate, you can use longer column names, but consider using 17-character names in case you might want to replicate these tables in the future. For tables in DB2 for OS/390, you can use 18-character column names, but DB2 DataPropagator will replace the 18th character with the before-image column identifier in target tables, so you must ensure that the first 17 characters of the name are unique.

Data restrictions

Currently, DB2 DataPropagator cannot replicate certain types of data. The major restrictions are:

Archive-log access restrictions with DB2 for VSE & VM
Allocate sufficient disk space for the active log because Capture for VSE and Capture for VM cannot read archived logs.
Data compression restrictions with DB2 for OS/390
DB2 DataPropagator can replicate data that is compressed through DB2 software or hardware compression on DB2 for OS/390 V4 (or higher) if the dictionary used to compress the data is available. Before issuing REORG for compressed replication sources, you must either:
- Ensure that the Capture program completed capturing all existing changes.
- Use the KEEPDICTIONARY option on the REORG command to preserve the existing compression dictionary.
DB2 DataPropagator cannot replicate data that is compressed using EDITPROCs or FIELDPROCs.
Utility program restrictions
DB2 DataPropagator cannot capture updates made by any of the database utilities. DB2 DataPropagator also cannot capture updates to data loaded with the LOAD RESUME LOG YES options.
Data encryption restrictions
DB2 DataPropagator cannot replicate data that is encrypted.
Data type restrictions
DB2 DataPropagator cannot replicate the following data types under any circumstances:
- Any column on which a VALIDPROC is defined
- Binary data types with precision
DB2 DataPropagator can replicate the following data types under certain circumstances:
- Long variable graphic (LONG VARGRAPHIC) data requires that the source and target tables be in DB2 for OS/390, DB2 for VSE, or DB2 for VM.
- Long variable character (LONG VARCHAR) data requires either that the source tables be in DB2 for OS/390 or both the source and target tables be in DB2 Universal Database Version 5.2 or later.
- Binary large objects (BLOBs), character large objects (CLOBs), and double-byte character large objects (DBCLOBs) require DB2 for OS/390 V6 or DB2 Common Server V2 or higher. DB2 DataPropagator can replicate only a full LOB; it cannot replicate parts of a LOB. See Replicating large objects.
- DB2 DataPropagator cannot capture changes to ASCII tables on DB2 for OS/390. DB2 DataPropagator can perform full refresh using ASCII tables.
User-defined data types (distinct data types in DB2 Universal Database) are converted to the base data type before replication.
DB2 DataJoiner restrictions
You must have one DB2 DataJoiner database for each non-IBM source server. Although you can use one DB2 DataJoiner database for replicating data to multiple non-IBM target servers, you need a unique DB2 DataJoiner database for each non-IBM source server. The reason for this restriction follows:
For every replication scenario, there is a set of control tables, each with names that cannot be changed. When replicating to non-IBM target servers, none of these control tables needs to be located in the non-IBM database because a DB2 DataJoiner database is the target for the Apply program. The nicknames used here refer to the target table and not to any of the control tables.
For non-IBM source servers, however, some of the control tables must be located in the non-IBM database so the Capture triggers can update them. Because of this location requirement, the DB2 DataJoiner nicknames associated with those control tables must be the actual control table names, and their schema must be ASN. Because a DB2 DataJoiner database cannot contain more than two identical nicknames with identical schemas, one DB2 DataJoiner database must be used for each non-IBM source server. You can, however, support multiple non-IBM source servers within one DB2 DataJoiner instance by creating multiple DB2 DataJoiner databases within that one DB2 DataJoiner instance.
For DataPropagator for Microsoft Jet, row-replica tables are considered sources rather than targets, and so you do not need to define nicknames for Microsoft Jet or Microsoft Access tables in DB2 DataJoiner. See Mobile replication using DB2 DataPropagator for Microsoft Jet.

Footnotes:

⁸: For example, knowing where to send a bank account update may require a join of the account table with the customer table to determine which branch of the bank the customer deals with. Typically, production databases are normalized so that the geographic details, such as branch-number, are not stored redundantly throughout the database.
⁹: The CCD tables for simple inner-joins must be complete and condensed. See Staging data.

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]