As part of planning for replication, you need to consider how the data will be used at the target site. Often, the source data will need to be subsetted, transformed, or enhanced for decision-support or data-warehousing applications. This section sorts these requirements into those that are easily fulfilled by using the DB2 Control Center and those that require direct manipulation of the control tables.
The Control Center and DJRA support the following data manipulations:
The following sections describe the data manipulations that you can perform using the Control Center. This chapter also describes replicating large-object (LOB) data, limits for column names for before-image data, and data-type restrictions.
IBM Replication supports both column (vertical) and row (horizontal) subsetting of the source table. This means that you can specify that only a subset of the source table columns and rows be replicated to the target table, rather than all of the columns and rows:
You can define column subsetting at one of two times:
Select only those columns you want to make available for replication to a target table. Because CD tables must contain sufficient key data for point-in-time copies, you must include primary-key columns in your subset. The columns that you do not select will not be available for replication to any target table.
Use the advanced subscription options to select only those columns that you want to replicate to the target table. The columns that you do not select are still available for other subscription sets, but are not included for the current subscription set.
Recommendation: When you define a replication source, select all columns (that is, do not subset any of them). Create your column subsets when you define subscription sets. By defining your column subsets in the subscriptions rather than in the replication sources, you will not have to redefine your replication sources if your subscription requirements change.
Use the advanced subscription options to define a WHERE clause when you define the subscription. All target table types support row subsetting.
If the primary key values of the target table will be updated, or the table (or view) contains a logical partitioning column that will be updated, you must specify replication logical-partitioning-key support when you define the replication source. Replication logical-partitioning-key support performs an UPDATE as a DELETE followed by an INSERT. See Enabling replication logical-partitioning-key support for more information.
Join views fill many requirements, both for denormalizing (restructuring) copies in data-warehouse scenarios, which enables simpler queries of copied data, and also for addressing the routing problem, sometimes called the database partitioning problem in distributed computing scenarios. 8 Views are also useful when you need to specify predicates for a row subset that exceed 512 bytes (the capacity of the PREDICATES column of the subscription-targets-member control table). Thus, you can choose to manage your subset predicates using views rather than as part of the subscription-set definition.
To define a join view as a replication source, first define all tables that participate in the join as replication sources (you do not need to define subscriptions for them). Then, use the Control Center or DJRA to define the join. If the replication sources defined in the join have CD or CCD tables, the Control Center or DJRA creates a CD view from the replication sources' CD tables.
IBM Replication supports the following types of view definitions:
Only views of tables that reside within DB2 or DB2 DataJoiner databases are supported. Views of tables that are stored on Oracle, Microsoft SQL Server, Sybase, or Informix are not supported.
You can define both before and after images in your replication sources and subscriptions. A before-image column is a copy of a column before it is updated, and an after-image column is a copy of a column after it is updated. DB2 logs both the before-image and after-image columns of a table for each change to that table. Replicating before images can be useful for auditing purposes, and is required for update-anywhere scenarios.
The before and after images have different values for different actions, as shown below:
When you enable logical-partitioning key support, the before-image column appears in the deleted column and the after-image column appears in the inserted column. See Enabling replication logical-partitioning-key support for more information.
Before images do not make sense for base aggregate target-table types (there is no before image for computed columns). All other target-table types can make use of before-image columns.
You can rename columns for point-in-time and user-copy target-table types. For other table types, you must define views in order to rename columns.
Using SQL, you can derive new columns from existing source columns. For aggregate target-table types, you can define new columns by using aggregate functions such as COUNT or SUM. For other table types, you can define new columns using SQL expressions.
You can also create computed columns by specifying user-defined functions when you create a table using the DB2 Control Center.
You can define run-time processing statements using SQL statements or stored procedures that can run before or after the Apply program processes a subscription set. Such statements can be useful for pruning CCD tables and controlling the sequence in which subscription sets are processed. You can run the run-time processing statements at the source server before a subscription set is processed, or at both the source and target servers before or after a subscription set is processed. For example, you can execute SQL statements before retrieving the data, after replicating it to the target tables, or both.
Stored procedures use the SQL CALL statement without parameters. If the source table is in a non-IBM database, DB2 DataJoiner processes the SQL statements. The procedure name must be eight characters or less in length. The run-time procedures of each type are executed together as a single transaction. You can also define acceptable SQLSTATEs for each statement.
Depending on the DB2 platform, the SQL before and after processing statements can perform other processing, such as calling stored procedures.
DB2 Universal Database and DB2 for OS/390 V6 support large object (LOB) data types, which include: binary LOB (BLOB), character LOB (CLOB), and double-byte character LOB (DBCLOB). This section refers to all of these types as LOB data.
The Capture program reads the LOB descriptor to determine if any data in the LOB column has changed and thus should be replicated, but does not copy the LOB data to the CD tables. When a LOB column changes, the Capture program sets an indicator in the CD tables. When the Apply program reads this indicator, it then copies the entire LOB column (not just the changed portions of LOB columns) directly from the source table to the target table.
To allow the Capture program to detect changes to LOB data, you must include the DATA CAPTURE CHANGES keywords when you create (or alter) the source table.
Because a LOB column can contain up to two gigabytes of data, you must ensure that you have sufficient network bandwidth for the Apply program. Likewise, your target tables must have sufficient disk space to accommodate LOB data.
Restrictions:
DB2 DataPropagator limits column names to 17 characters. Because DB2 DataPropagator adds a before-image column identifier (usually X) to target tables and because you must ensure that each column name is unique, you cannot use longer column names for those tables you replicate. For tables you do not plan to replicate, you can use longer column names, but consider using 17-character names in case you might want to replicate these tables in the future. For tables in DB2 for OS/390, you can use 18-character column names, but DB2 DataPropagator will replace the 18th character with the before-image column identifier in target tables, so you must ensure that the first 17 characters of the name are unique.
Currently, DB2 DataPropagator cannot replicate certain types of data. The major restrictions are:
Allocate sufficient disk space for the active log because Capture for VSE and Capture for VM cannot read archived logs.
DB2 DataPropagator can replicate data that is compressed through DB2 software or hardware compression on DB2 for OS/390 V4 (or higher) if the dictionary used to compress the data is available. Before issuing REORG for compressed replication sources, you must either:
DB2 DataPropagator cannot replicate data that is compressed using EDITPROCs or FIELDPROCs.
DB2 DataPropagator cannot capture updates made by any of the database utilities. DB2 DataPropagator also cannot capture updates to data loaded with the LOAD RESUME LOG YES options.
DB2 DataPropagator cannot replicate data that is encrypted.
DB2 DataPropagator cannot replicate the following data types under any circumstances:
DB2 DataPropagator can replicate the following data types under certain circumstances:
User-defined data types (distinct data types in DB2 Universal Database) are converted to the base data type before replication.
You must have one DB2 DataJoiner database for each non-IBM source server. Although you can use one DB2 DataJoiner database for replicating data to multiple non-IBM target servers, you need a unique DB2 DataJoiner database for each non-IBM source server. The reason for this restriction follows:
For every replication scenario, there is a set of control tables, each with names that cannot be changed. When replicating to non-IBM target servers, none of these control tables needs to be located in the non-IBM database because a DB2 DataJoiner database is the target for the Apply program. The nicknames used here refer to the target table and not to any of the control tables.
For non-IBM source servers, however, some of the control tables must be located in the non-IBM database so the Capture triggers can update them. Because of this location requirement, the DB2 DataJoiner nicknames associated with those control tables must be the actual control table names, and their schema must be ASN. Because a DB2 DataJoiner database cannot contain more than two identical nicknames with identical schemas, one DB2 DataJoiner database must be used for each non-IBM source server. You can, however, support multiple non-IBM source servers within one DB2 DataJoiner instance by creating multiple DB2 DataJoiner databases within that one DB2 DataJoiner instance.
For DataPropagator for Microsoft Jet, row-replica tables are considered sources rather than targets, and so you do not need to define nicknames for Microsoft Jet or Microsoft Access tables in DB2 DataJoiner. See Mobile replication using DB2 DataPropagator for Microsoft Jet.