***************************************************
* Data Replication Solution Questions and Answers *
                  October 4, 1994
***************************************************

Q: CAN YOU TELL ME WHAT YOU MEAN WHEN YOU USE THE TERM 'DATA REPLICATION'?

A: I mean controlled, planned replication (or copying) of enterprise data and management of the data copies. Typically (though not necessarily), Data Replication brings the data to where applications/users access it locally.

Data Replication is contrasted with 'Direct Data Access', where the application/user accesses the data directly, not a copy of that data. Typically in this case, at least part of the data being accessed is remote.

Data Replication is a function normally provided by a company's IS organization. The term encompasses all types of replication, be it synchronous or asynchronous, continuous or at defined intervals, complete refresh or update of changes only, for operational or informational purposes, from a source to a target. It often involves some form of enhancement of the data that is being replicated.

Q: WHAT, IN BRIEF, ARE THE COMPONENTS OF DATA REPLICATION?

A: The components of Data Replication are:

   o The replication tools themselves
   o A workstation-based control point for data administration and replication invocation
   o A work-flow manager for execution of multi-step replication processes
   o Open product architecture and interfaces to support interoperability between tools as well as integration with customers' own applications.

Q: WHAT IS THE IBM DATA REPLICATION SOLUTION? WHICH PRODUCTS FORM THE DATA REPLICATION SOLUTION? HOW DO THEY CONTRIBUTE?

A: Before talking about the individual products, let me first cover what the whole solution delivers.

IBM's Data Replication Solution provides an end-to-end solution for replicating data from legacy sources to host-based and client/server relational database targets. The ability to replicate data from non-relational legacy sources is a key strength of ours.
When you consider that industry consultants estimate that 80% or more of the world's data is in non-relational formats, you realize just how important this is.

IBM's replication architecture is built to support both operational and informational requirements. While the customization of data for its eventual use is important both in operational and decision support environments, the data currency and data enhancement requirements vary. To address data currency requirements, we provide both refresh and update processing, with the ability to keep data synchronized at time intervals varying from seconds to weeks. To address the data enhancement requirements, IBM's Data Replication Solution provides the ability to use SQL to subset, derive and aggregate data.

IBM's complete Data Replication Solution can be implemented with no changes or recompilation of existing applications.

The solution is also built with ease of data administration in mind. We provide a single graphical user interface as the control point for replication administration. The same control point can also be used for complete database systems management. Easy to use and intuitive, the interface guides the user or DBA through the process of mapping sources to targets, creating SQL selections, and defining data enhancement. Finally, in carrying out the replication, target databases are automatically created and loaded as required.

Now to the individual products. The key Data Replication components delivered are as follows:

   o The replication tools themselves
     - DataRefresher
     - DataPropagator NonRelational
     - DataPropagator Relational
     - The 'Copy Data' function of the DataHub Tools Feature
   o Workstation-based control point: DataHub/2
   o Work-flow manager: FlowMark for OS/2
   o Open architecture and interfaces: DataPropagator Relational has an open architecture that facilitates tool interoperability. Its DRDA-based design allows it to interoperate with any DRDA application server.
     DataRefresher has added generic programming interfaces that allow addition of sources and targets beyond those supported directly by DataRefresher.

Here are some highlights of our October 4, 1994 announcements:

   o DataHub participates in Data Replication in two ways. Firstly, it is IBM's strategic platform for the administration of databases, including administration of replication functions -- not only those provided by IBM but also those provided by other vendors. Secondly, the DataHub Tools feature includes a 'Copy Data' function that copies data between any of the supported databases, namely, all IBM's relational databases (with the exception of DB2 for VSE). In addition to replicating data, the user authorizations for the data can also be copied.

     DataHub's Copy Data function supports refreshes only. It has the unique and powerful capability of identifying important differences between databases, reporting on the implications of a replication request where, for example, a data type in the source DBMS is not supported in the target DBMS. Copy Data is an ideal migration tool for setting up client/server databases from centralized databases.

     In addition to its suitability for setting up and testing new client/server databases, DataHub's Copy Data function is appropriate for ad hoc replication processing or infrequent replication. Where updates/change history are required, or for high frequency replication, DataPropagator Relational is recommended for the databases it supports.

   o DataRefresher is the replacement product for Data Extract (DXT). In addition to new programming interfaces, DataRefresher has a new graphical user interface conforming to CUA-91. This interface dramatically reduces the complexity of administering DataRefresher, and addresses the highest priority customer requirement for the product. In keeping with our goal of providing an integrated Data Replication Solution, this new graphical user interface is enabled to DataHub.
     As with DataHub's Copy Data, DataRefresher supports only refresh replication. Although it supports DB2 for MVS and VM as replication sources, its main focus is to build informational databases for the DB2 family from legacy sources, including IMS, VSAM and sequential files. Its generic data input and output interfaces permit the extension of the sources and targets beyond those directly supported by the product itself. DataRefresher can deliver target data in a load-utility-ready form for any DB2 family database. DataRefresher also supports the DataPropagator Relational data staging area, and can therefore make legacy data available to DataPropagator Relational supported targets.

   o DataPropagator NonRelational (formerly known as Data Propagator MVS/ESA) supports the update of DB2 from IMS and of IMS from DB2, both in synchronous mode, supporting application migration and coexistence in these two environments. It also supports asynchronous batched update from IMS (under IMS/TM and CICS DBCTL) to DB2 for MVS.

     The DataRefresher graphical user interface can be used for specifying DataPropagator NonRelational propagation requests. With DataRefresher's enablement to DataHub, this brings DataPropagator NonRelational administration to the same control point as the other replication products.

     DataPropagator NonRelational and DataPropagator Relational together support the update of relational databases at the desktop from IMS databases -- a powerful client/server enabler for customers with IMS.

   o DataPropagator Relational is a new product that is a vital part of our Data Replication Solution within the relational environment. It provides a powerful replication and data enhancement capability between DB2 family databases and DRDA application servers. It provides both refresh and update replication from DB2 sources on MVS, OS/400, AIX, and OS/2 to any DB2 database at user-specified intervals -- it can even choose whether to refresh or update for performance and efficiency.
     Full refresh is available between any DB2 database or DRDA application server. DataPropagator Relational's administrative interface is built on the DataHub/2 platform.

     DataPropagator Relational captures changes to source tables in one or more staging tables for subsequent propagation to target tables. This not only enables maintenance of complete histories in support of informational applications, but also provides flexible and efficient distribution of consistent copies across heterogeneous platforms.

     The product's open architecture and interfaces support customer tailoring and tool interoperability. DataRefresher, DataPropagator NonRelational and DataPropagator Relational work together through these interfaces to provide updates of IMS data to the desktop.

IBM DataJoiner, a new multi-server database product, extends the reach of IBM's Data Replication Solution by replicating data to Sybase and Oracle targets. Providing powerful heterogeneous, distributed join and update capabilities, DataJoiner provides a common SQL interface to mixed environments.

These products complement each other to support customer demands for open database solutions.

Q: WHAT IS THE VALUE OF FOCUSING ON DATA REPLICATION AS A SOLUTION OVER AND ABOVE THE INDIVIDUAL PRODUCTS?

A: Data Replication is the key to changing data into information and delivering that information to the business people who need it. It can also be fundamental to the success of companies' downsizing activities and client/server strategies.

Individually, each Data Replication product IBM is delivering provides valuable replication function in its own right. However, it is in combination that our solution really has power:

Working together, DataRefresher, DataPropagator NonRelational and DataPropagator Relational allow you to bring DB2 for MVS or VM, IMS, VSAM, flat file and other source data to the DB2 desktop databases in refresh mode. Furthermore, updates to DB2 and IMS data can also be delivered to the desktop.
All this is achieved transparently to the applications updating or accessing the data.

DataHub/2 provides a single control point from where the replication processes can be administered in a consistent manner.

Q: HOW DOES DATA REPLICATION SUPPORT CUSTOMERS WHO WISH TO IMPLEMENT CLIENT/SERVER APPLICATIONS AND SOLUTIONS?

A: One of the biggest inhibitors to implementing successful client/server applications is the increased data management demands it places on an organization. It is not practical from any perspective -- performance, security, availability, you name it -- to retain only a single centralized repository of data as the applications and their users become more and more distributed across diverse platforms and locations with different operating environments and software.

While direct data access is a powerful enabler for some client/server applications, the organization can run into capacity, performance and availability problems as the number of client/server applications and end user demand grows. IBM's Data Replication Solution has the ability to remove this roadblock by efficiently managing the delivery of information to the increased numbers of servers and the applications which run on them.

Q: WHAT IS THE RELATIONSHIP OF DATA REPLICATION TO INFORMATION WAREHOUSE?

A: Information Warehouse Architecture I describes the initial emphasis of IBM's Information Warehouse family. The copy management architecture presented in this document is implemented via the IBM Data Replication Solution. The data staging area described in the document is implemented by DataPropagator Relational's data staging area.

In Information Warehouse Architecture I, copy management is positioned as the delivery of operational data to business professionals by replicating and enhancing the operational data. We call this enhanced data 'informational data'. Business professionals access this informational data for decision support purposes using informational applications.
Data Replication is an alternative to decision support applications accessing operational data directly.

The replication tools which make up our Data Replication Solution, however, go beyond the uses described in Information Warehouse Architecture I and address customer requirements for the replication of data for both operational and informational applications.

Q: WHY WOULD I WANT TO USE DATA REPLICATION FUNCTIONS WHEN I CAN GET DIRECT ACCESS TO THE DATA I NEED EVEN IF THAT DATA IS REMOTE?

A: Data Replication itself almost certainly uses direct access. It does so, however, in a controlled, predictable manner, protecting operational systems from unpredictable query workloads. Data Replication can also provide stable views of constantly changing data, allowing a consistent base for query/analysis of database information.

In addition, Data Replication allows enhancement of data, and accumulation of historical data. Data enhancement is important to transform encoded data into usable business information. Historical data is important for supporting trend analysis.

By downloading the data to where it is needed, Data Replication supports local database access. This improves access performance. It also increases site autonomy. In other words, data can be accessed on local servers even where access to the central databases is not possible.

Q: WHAT ARE YOUR PLANS TO SUPPORT REPLICATION FROM AND TO NON-IBM DATABASES?

A: We plan to extend our Data Replication Solution to support replication to and from non-IBM databases. This is a major customer requirement. We are addressing it in two ways:

   1. We are delivering an open, extensible solution, with key product interfaces published to allow other tools and other DBMSes to participate:

      DataPropagator Relational's architecture is based on SQL staging tables and open SQL access, so it is designed for easy portability across SQL databases.
      DataRefresher's Generic Data Input exit and Generic Output Interface support addition of sources and targets beyond those directly supported by DataRefresher.

   2. IBM anticipates extending our products to support non-IBM sources and/or targets. Clearly this support will be delivered over time, and speed of delivery will depend on the underlying infrastructure (such as the vendor DBMS's implementation of DRDA) as well as the critical nature of our customers' requirements. IBM intends to provide replication support for its own databases first.

Q: DOES YOUR SOLUTION PROVIDE TRUE REPLICATION?

A: Yes. Replication is a much-used term. It is used loosely in the trade press to describe asynchronous propagation of changes to read-only copies. The more accurate term for this is 'continuous asynchronous change propagation'.

Replication is defined very precisely by the database research community as a method of keeping copies of a table at multiple locations synchronized, where any of the copies (or replicas) may be updated, and all other copies are automatically synchronized with the updated copy. True replication can only be guaranteed in a synchronous environment.

DataPropagator NonRelational supports true synchronous replication of data between IMS and DB2. While not part of Data Replication, DB2 Version 3's implementation of Distributed Unit of Work allows applications to maintain synchronized DB2 databases across MVS systems.

DataPropagator Relational supports asynchronous propagation of changes to read-only copies. DataPropagator Relational is designed to support 'point-in-time' snapshots and histories of data at customer-defined intervals.

Q: ARE VENDORS WHO HAVE REPLICATION TOOLS PARTICIPATING IN THIS DATA REPLICATION SOLUTION?

A: Yes. We have already run a Data Replication workshop at which a number of vendors participated. Vendors were encouraged to enable their replication tools to DataHub and already several of them have indicated their intention to participate.
Q: WHAT MUST VENDORS DO TO PARTICIPATE?

A: There are many ways in which vendors can participate, including: enabling their replication tools to DataHub, working with FlowMark for OS/2 and producing products that interoperate with our replication tools.

For information on participation please contact Joe Varuola at IBM Santa Teresa Laboratory, telephone 408/463-3228.

Q: IS DATAHUB A PREREQUISITE FOR DATA REPLICATION?

A: Data Replication's administration is built on the DataHub/2 control point. The DataHub Support components are required only for DataHub's own 'Copy Data' function.

The DataHub/2 product is required as an administrative platform for DataPropagator Relational and DataHub's 'Copy Data'. Although DataRefresher does not require the DataHub/2 product, it is enabled to the DataHub/2 platform. The same is true of DataPropagator NonRelational, which can leverage the DataRefresher graphical user interface.

Using DataHub as a control point for all your data replication activities will help make your DBAs more productive.

Q: IS FLOWMARK FOR OS/2 REQUIRED FOR DATA REPLICATION?

A: No, FlowMark for OS/2 is optional. FlowMark for OS/2 is valuable to allow a complex replication process (involving potentially multiple replication tools and multiple systems) to be defined once and scheduled to run periodically in unattended mode. It is not, however, a prerequisite for any of the Data Replication products.

Today, only the DataHub Tools feature's 'Copy Data' function can interact with FlowMark for OS/2. FlowMark for OS/2 is an exciting product and we will be evaluating which components of Data Replication will be most suited for enablement.

Q: ISN'T THIS REPLICATION STUFF JUST ANOTHER NICE WAY FOR IBM TO SELL DASD?

A: This solution supports the management and control of data replication.

The only way to save on DASD is to have one copy of the data. In other words, to make the operational systems data available for end user access.
Providing end users with access to operational data is no real solution:

   o Operational systems are open to unpredictable query workloads
   o The data which is being accessed is constantly being updated
   o Users can access only raw operational data
   o No access to historical data, no ability to perform trend analysis
   o Users need multiple gateways and tools to access data
   o No consistency in data from one operational application to another

The only realistic way to support end user access to data is to provide a copy containing consistent, enhanced and stable data. This will require additional disk space. However, if the replication is controlled and managed, the benefit to the business will more than offset the costs involved. In a client/server environment, this disk space is at a greatly reduced cost.

The IBM Data Replication Solution provides an integrated solution for managing this requirement from a single control point. The solution provides the ability to limit the data replicated to that which is required by the business.

Q: ISN'T THIS PRODUCT SET OVERLY COMPLEX? WHY DO YOU NEED FOUR PRODUCTS?

A: We believe we are offering a flexible, modular solution. Each component provides significant function on its own and together they complement each other to provide industry-leading heterogeneous replication.

Q: HOW DOES THE IBM DATA REPLICATION SOLUTION INTEGRATE WITH DATAGUIDE? CAN I ACCESS THE REPLICATION MAPPING AND ENHANCEMENT INFORMATION FROM DATAGUIDE?

A: DataGuide is designed as an information catalog for end-users. The information in DataGuide is defined by the DataGuide Administrator. The Administrator can extract data from the replication meta-data or define queries which would access the meta-data for the end-user. However, the replication meta-data is designed for use by Database Administrators and would most likely need to be enhanced by the DataGuide Administrator to be of significant value to an end-user.

IBM (c) International Business Machines Corporation 1994
All Rights Reserved

References in this publication to IBM products or services do not imply that IBM intends to make them available outside the United States.

IBM, DataRefresher, DataPropagator, DataHub, FlowMark, OS/2, DRDA, DB2, DXT, CUA, MVS/ESA, CICS, OS/400, AIX, DataJoiner, Information Warehouse and DataGuide are trademarks or registered trademarks of the International Business Machines Corporation.

All other products are trademarks or registered trademarks of their respective companies.