Replication Guide and Reference

Data replication configurations

This chapter describes the typical data replication configurations and provides examples of replication solutions for common business needs. Some of these configurations show how other products can be used with DB2 DataPropagator to create a unique replication solution. The replication configurations covered here are not exhaustive because users are continually developing new and creative implementations.

Important: DB2 replication is designed for asynchronous replication and is not suitable for the following situations:

Performing real-time replication: In real-time replication, sometimes called synchronous replication, changes to the source system are made immediately to the target tables. If synchronous data delivery is essential to your application, code your application to update the application table and all of its copies with a single transaction to ensure that both the source and target are changed at the same time.
Maintaining a backup server (hot-site backup): Do not use asynchronous replication to maintain a backup server that can be accessed when your primary server is down. If the source (primary) server becomes unavailable, there is no way to guarantee that all updates were made to the target (backup) server. If you must maintain a backup server, consider using other tools or features. For example, for System/390 consider Peer-to-peer remote copy (PPRC) hardware feature or the extended recovery component (XRC).

Overview of replication configurations

You can combine configurations to suit your business needs. The following sections describe these typical configurations, including some variations of each:

Data distribution
Data consolidation
Update anywhere
Occasionally connected

Data distribution and data consolidation are easier to set up and maintain than the other configurations.

Data distribution

In data distribution configurations, a primary data source resides on a source server (see Figure 2). Changes made to the data source are replicated to one or more target tables that reside anywhere in a distributed network. The target tables are read-only; therefore, you don't need to set up conflict detection because no update conflicts occur during replication. Applications can use the target tables, which are local copies, so that they don't overload the network or central server. This configuration is useful if you need to share data among several sites but you don't want to reduce the performance of your applications.

Figure 2. Data distribution. Changes made to a source table are replicated to read-only target tables.

Data consolidation

In data consolidation configurations, a central data server is used as a repository for data from many data sources (see Figure 3). Therefore, this configuration consists of many source tables or views and one target table with multiple subset views. Changes made to each data source are replicated to the central data server, which is read-only.

Restriction: If you consolidate data from more than one server into a CCD target table, you must not use that CCD target table as a replication source for other target tables. The original servers use separate log sequences that cannot be distinguished in further replication.

Data consolidation configurations are useful for maintaining a local decision support system (DSS) so that you can analyze data without competing for production database resources. To ensure that there are no update conflicts, you must design the replication environment so that there is only one source for each data item. If each source updates a unique set of rows, you will never encounter update conflicts.

Figure 3. Data consolidation. Each source table can update a unique set of rows in a read-only target table.

Update anywhere

In update-anywhere configurations, a replication source has target tables that are read/write copies. Changes made to a target table are applied to the source table, which maintains the most up-to-date data. If a conflict occurs between a source and target, the source wins. The source table then applies the changes to all of its target tables. Unless you design your application correctly, update conflicts can occur when the data is replicated (see Figure 4). It is best to design your application so that a conflict can never occur when data is replicated from the source to all the target tables (see Figure 5). You have the option of ignoring conflicts and rejecting any conflicting updates. By rejecting conflicting updates, you risk losing some information.

Figure 4. Update-anywhere replication with risk of conflicts between target tables. This configuration requires conflict detection because all rows can be updated at the source table or at any target table.

Figure 5. Update-anywhere replication with no risk of conflicts between target tables. Each read/write target table has a unique set of rows that can be updated locally; the source table at the source server maintains the most up-to-date data.

Occasionally connected

In occasionally connected configurations, you have the flexibility to connect to and transfer data to and from a primary source on demand. These types of configurations allow users to connect to the primary data source only long enough to synchronize their local database. The data source doesn't require a continuous connection for replication administration (see Figure 6).

Occasionally connected configurations are well-suited for synchronizing data on laptop computers or at computers in home offices. These configurations minimize the frequency and duration of communication-line connections and reduce telecommunication costs, yet the data is available for update. This type of configuration also works well for replicating data to onsite computers that are not constantly connected to the network (for example, if employees are in the office only three days a week).

Figure 6. Occasionally connected configuration. The target servers are not continuously connected to the source server; changes that are made to the tables are replicated when the target server is connected to the source server.

You can use DB2 Universal Database Satellite Edition (or any other DB2 server that participates in a satellite environment) to administer satellites, which are occasionally connected DB2 servers. DB2 data replication enables you to synchronize data between a central control site and many satellites. At the home office you set up the replication environment, test it, and when it is ready to be rolled out to the occasionally connected systems, you store it in the Satellite Administration Center database. You don't access any of the occasionally connected systems and only need to set up the environment once.

For information about setting up data replication for satellites, enabling the satellite environment for replication, and testing replication on a satellite, see the DB2 Universal Database Administering Satellites Guide and Reference.

Examples of replication configurations

You can build on the typical replication configurations to come up with replication models that meet your specific needs. This section discusses examples of some common business needs and the DB2 replication solutions that address those needs. Design issues that are unique to each replication solution are also described.

Archiving audit information

Requirements: A customer in a DB2-IMS Transaction Manager (TM) environment generates audit data by writing audit information to the IMS log. New applications access DB2 through DRDA, bypassing IMS TM completely. The customer needs to track all changes to relational tables for auditing purposes to determine which users made particular changes to the data.

Replication solution: The Capture and Apply programs for DB2 DataPropagator are used to capture and store the DB2 for OS/390 changes in target tables (see Figure 7).

Figure 7. Audit information. Audit data is replicated to a target table that can be read by the customer's application.

Design highlights: Both the before-image and the after-image values of each row are captured and stored. The authorization ID of the user who changed the data is also stored in the audit tables. All of this information is captured from the DB2 for OS/390 log.

Consolidating data from distributed databases

Requirements: A large retail chain has almost 500 stores around the country, each of which gathers purchase details through an electronic point of sale (EPOS) system. Each store keeps its data in local databases on DB2 for AIX. The data is transferred nightly to a central DB2 for OS/390 site using a pre-existing file-transfer process from the EPOS terminals. The company wants to enhance the data at the central site.

Replication solution: Data changes from each retail store are captured and saved by the Capture program on DB2 for AIX (see Figure 8). The Apply program on DB2 for OS/390 consolidates the data from all stores and summarizes it.

Figure 8. Consolidating data from distributed databases. Data from three source servers is replicated to two target tables on a target server.

Design highlights: The Apply program uses base aggregate and change aggregate tables to summarize the consolidated store data. The base aggregate tables summarize the contents of the source files. The change aggregate tables summarize the results of the changes made between each refresh of the target that is performed by the Apply program.

Distributing data to remote sites

Requirements: A small bank installed several new Windows NT client/server applications in its 85 branches. A major source of data for the new applications is the customer and financial reference data, which is derived and held at a host site in two operational systems, one on DB2 for OS/390 and the other on DB2 for AIX. If branches accessed the data directly from the host site, network traffic would be congested and the availability of the production data could be affected.

Replication solution: To minimize the network traffic, a local copy of the database is maintained at each branch (see Figure 9). Therefore, each branch is a target server. Changes are captured from DB2 for OS/390 and DB2 for AIX, condensed in control tables on DB2 for AIX, and replicated to the branches overnight.

Figure 9. Distributing data to remote sites. Source data is consolidated on an AIX server and replicated to the branches. Each branch gets all of the financial data and some of the customer data. WHERE clauses are used to ensure that each branch gets the records that pertain to their own customers only.

Design highlights: One Apply program resides on AIX and replicates from DB2 for OS/390 and DB2 for AIX. There is one subscription set for replicating from DB2 for OS/390 to DB2 for AIX and one for replicating from DB2 for AIX to DB2 for AIX.

An Apply program also resides on the target servers at each branch. The Apply program on the source server runs separately from the Apply programs at the target servers. The Apply program at each of the branches replicates from the control tables on DB2 for AIX at the host site. Each of the Apply programs on the target servers has a subscription set for replicating from the host site to its local database. Each branch gets all of the financial data but only some of the customer data. WHERE clauses are used to ensure that each branch gets the records that pertain to their own customers only.

The Capture and Apply programs maintain complete, condensed CCD tables in DB2 for AIX. The administrator chose a condensed CCD table because that type of staging table contains only the most recent change made to a row, so network traffic is reduced during replication.

When the subscription sets were created for each branch, the administrator put the control server on the Windows NT server. If the administrator had put the control server on DB2 for AIX, the Apply program from each Windows NT server would need to connect to the host site over the network to read and update the control information about the subscription set, and to detect changes to its control information.

Distributing IMS data to remote sites

Requirements: A large financial institution wants to improve the flow of information from two legacy operational systems to its OS/2-based branches. It wants to provide more accurate and timely data to help loan-application research and to detect credit-card fraud. The data for loan applications is in DB2 for OS/390, and the credit card details are in an IMS system. Previous attempts to copy the legacy data consisted of an unworkable mixture of ad-hoc reports and file transfer techniques.

Replication solution: IMS DataPropagator is used to capture and save the changes to IMS data into CCD tables in DB2 for OS/390 (see Figure 10). The Capture program is used to capture and save the changes to DB2 for OS/390 data. The data that is saved is historical--it records every change made. The Apply program runs at the branches and uses the historical data from IMS and DB2 for OS/390 to maintain DB2 for OS/2 tables.

Figure 10. Distributing IMS data to relational databases. IMS DataPropagator replicates IMS data to target tables on an OS/390 source server. DB2 DataPropagator captures data from the OS/390 source server and replicates it to OS/2 servers.

Design highlights: IMS DataPropagator captures changes from the IMS log and creates a noncondensed CCD table in DB2 DataPropagator format on the OS/390 source server. DB2 DataPropagator uses this CCD table as a replication source. The Capture program on the OS/390 server captures information from the local tables that contain the credit-card and loan-application data. The Apply program on the OS/2 target server pulls the change data to the target tables.

Accessing data continuously

Requirements: An international bank wants to keep its system online 24 hours a day. Currently the system is online 23 hours 45 minutes a day. Every day the bank stops the system to quiesce it for a batch application, which requires exactly one day's worth of data. During the 15 minutes when the system is down, the required tables are extracted. After the extraction, the system is made available for the next financial day.

Replication solution: Data changes made during the day are captured and replicated to CCD tables (see Figure 11). The batch application was modified to process the changes in the CCD tables instead of the table extracts. The online system does not need to be stopped to provide consistent data for the batch application.

Figure 11. Batch application using replicated data. The source data is replicated to a CCD table. The batch application extracts data from the CCD table when the source table is unavailable.

Design highlights: The CCD table includes a timestamp that is used to identify the changes made during a time period (in this case, one day).

Replicating operational data to decision support systems

Requirements: A financial institution needs to replicate updates from its customer information database on DB2 for AS/400 to a decision support system that is also on DB2 for AS/400. Historical data about updates must be saved and stored with no code changes to production applications and no impact to the performance of those applications.

Replication solution: Updates are captured from the key operational tables and, on an hourly basis, replicated to CCD tables in the decision support system (see Figure 12).

Figure 12. Replicating operational data to decision support systems. The noncondensed CCD target table is used to record all changes made to the source database.

Design highlights: The Capture and Apply programs maintain noncomplete, noncondensed CCD tables. Noncondensed CCD tables are used because they record all changes that are made to the customer information database. Furthermore, noncomplete CCD tables are used because the financial institution does not want to record the original contents of the source, it wants only the changes.

The Capture and Apply programs are given job priorities such that replication does not impact production CPU resources. The decision support system could be implemented just as easily on any of the supported target platforms and could still be ported to other platforms if required.

Using target tables as sources of updates (update anywhere)

Requirements: A financial institution has hundreds of agents at several branches who must fill in online forms to set up and modify client accounts. The agents base the quotation rates on information that was generated at the head office and sent to the branch. The agents send reports back to the head office, and the accounts are finalized only after the information is verified at the head office. The agents would be more productive if they had access to up-to-date data without the network problems of accessing the central database directly.

Replication solution: A special type of target table, called a replica, is used to set up circular subscriptions (see Figure 13). Changes to the replica are replicated back to the primary replication source, which is a user table. An update that is made at one location is reflected in the databases at other locations. Agents have the current information that they need to finalize accounts while meeting with the client, and the head office has the new business data generated that day.

Figure 13. Update-anywhere replication. The primary data source, or parent replica, is on an OS/390 server; and the dependent replicas are on Windows NT client systems.

Design highlights: The primary replication source is a user table. It contains the most up-to-date information.

This type of replication works best when transaction conflicts between the central database and the updatable copies can be avoided, such as when copies can update only key ranges at specific sites, or when sites can make updates only during certain time periods.

DB2 DataPropagator detects conflicts that occur when the same row is updated on the host system and on an agent's system and neither change has been replicated. If an agent made updates that are in conflict, these updates are discarded during replication to ensure data integrity. The transaction containing the conflict and all captured transactions that are found that are dependent on the conflicting transaction are backed out.

Updating data on occasionally connected systems

Requirements: An insurance company wants to equip its sales agents, who rarely visit the company's home office, with a set of offers to attract both new and existing customers--special introductory offers and personalized packages. Much of the time, the agents' computers will not be connected to the home office. When they connect to the home office, they need to get any updated information from the central database. Managing the potential backlog of changes can be an issue.

Replication solution: The sales force is supplied with laptop computers running DB2 Universal Database Satellite Edition. As a sales campaign is launched, each agent downloads the customer profiles and history, as well as the latest product offers. DB2 replication also solves the problem of keeping the information up to date. Only new and changed data rows are copied across the network.

Design highlights: DB2 Universal Database Satellite Edition is used because it meets the replication requirements and can be administered by a central administrator. An administrator at the home office sets up the replication environment, tests it, and copies it to the occasionally connected systems. The administrator also provides user IDs and passwords to the agents in the field so that they can connect to the server at the home office from their laptop computers. While they are logged on, the agents can synchronize the information on their laptop computer with the information at the source server by pressing a button.

Retrieving data from a non-DB2 distributed data store

Requirements: A manufacturing company uses an Oracle application to process customer orders, and DB2 on OS/390 for its central operational data store. The new order information is uploaded into DB2 in overnight batch extracts. The company wants the data to be replicated sooner because customers want their orders to be processed faster.

Replication solution: Triggers on the Oracle tables simulate the Capture program by placing the changed records into CCD tables at the Oracle server. Nicknames in DataJoiner make the Oracle source tables and CCD tables appear to be tables in a DB2 database so that the Apply program on OS/390 can replicate them to the DB2 for OS/390 tables. The Apply program is set to replicate every hour during the business day.

Figure 14. Retrieving data from a non-DB2 distributed data store. Triggers are used to capture changes made to source tables in Oracle, and DataJoiner is used to replicate to target tables on DB2 for OS/390.

Design highlights: DataJoiner Replication Administration (DJRA) is used to define the Capture triggers and the CCD tables in the Oracle database. DJRA also generates SQL statements to create all database objects and data type mappings. DataJoiner lets the Apply program access non-IBM data as if it were in DB2. The Apply program can also be run on DB2 for OS/390 to pull data.

Replicating operational data to a non-DB2 reports and query database

Requirements: A large retail chain has their business operations applications on a mainframe that uses a DB2 for OS/390 subsystem. Staff and headquarters personnel need to query the operational data to create reports. The retail chain wants to replicate the data needed for the queries and reports to an Informix database management system on a UNIX server. The retail chain needs the reports and query results to be based on data that is less than four hours old.

Replication solution: The Capture program places the updates that are made to the operational data in DB2 for OS/390 tables. The subscription timing interval is set to four hours to make sure that the query results and reports are based on current operational data. Using DataJoiner nicknames, the Apply program replicates updates from the DB2 tables to the query and reports tables in the Informix database.

Figure 15. Replicating operational data to a non-DB2 reports and query database example. Changes made to source tables on DB2 for OS/390 are captured and replicated to target tables on Informix using nicknames defined in DataJoiner.

Design highlights: DataJoiner Replication Administration (DJRA) is used to create the target tables in Informix with the correct Informix data types. The Apply program replicates data to Informix using DataJoiner nicknames and any data type transformations that are needed.

[ Top of Page | Previous Page | Next Page ]