Data Warehouse Center Administration Guide


Chapter 6. Moving data

The Data Warehouse Center provides several different ways to move data, depending on your needs:

In addition to the above methods, the DB2 Control Center has an import and export utility that you can use to move data. For more information, see the DB2 Universal Database Data Movement Utilities Guide and Reference.


Selecting and inserting data

You can use an SQL step to select source columns and insert the data from those columns into a target table. You can have the Data Warehouse Center generate the target table from the source data, or you can use the source data to update an existing table.

You can use a warehouse source or a warehouse target as a source for an SQL step. The Parameters page of the Step notebook will not be available for this step until you link it to a source in the Process Model window. You can also link this step to a target in the Process Model window. If you do not link the step to a target, you can specify that a table is created when the step runs.

You cannot change an SQL step that is in production mode.

Defining the step

To select and insert data:

  1. Open the step notebook.
  2. Specify general information about the program. For more information, see Providing general information about a step.
  3. On the SQL Statement page, create an SQL statement either by using SQL Assist or manually:

    To erase the text in the SQL statement field, click Clear. If you want to create new SQL, you must click Edit again.

  4. Optional: When you finish generating or editing your SQL statement, click Test to test the SQL query. The Data Warehouse Center returns sample results of your query.
  5. Optional: On the Column Mapping page, view or modify your column mappings. For more information, see Defining column mapping information.
  6. On the Processing Options page, in the Population type list, specify whether the step appends or replaces data in the target. For more information about population types, see Defining processing options.
  7. In the Agent Site list, select an agent site where you want your step to run. The selections in this list are agent sites that are common to the source tables and the target table.
  8. If you want to have the option to run your step at any time, select the Run on demand check box. Your step must be in test or production mode before you can run it.
  9. Optional: Select the Populate externally check box if the step is populated externally, meaning that it is started in some way other than by the Data Warehouse Center. If you select this check box, you can change the step to production mode without defining any other means of running it in the Data Warehouse Center.

    If the Populate externally check box is not selected, then the step must either have a schedule, be linked to a transient table that is input to another step, or be started by another program in order for you to change the mode to production.

  10. In the Retry area, specify how many times you want the step to run again if it must be retried, and how much time should pass before the next run of the step. For more information about the Retry area, see Defining processing options.
  11. In the Return Codes area, select the return code level that you want to receive if no rows are returned, or if an SQL warning occurs when your step is run.
  12. Use the Incremental commit check box and the Every x rows field to specify whether you want the step to perform an incremental commit and, if so, how often you want the commit to occur. For more information about incremental commit, see Incremental commit.
  13. Click OK to save your changes and close the step notebook.

Incremental commit

Incremental commit, an option that is available for all SQL steps, controls the commit scope of the data that is managed by the Data Warehouse Center. Use incremental commit when the volume of data that the agent moves is large enough that the DB2 log files might fill up before the work of the step is complete, or when you want to save partial data. An SQL step completes with an error if the amount of data that it moves exceeds the DB2 log space that is allocated.

The incremental commit option allows you to specify the number of rows (rounded to the nearest multiple of 16) to process before a commit is performed. The agent selects and inserts data, committing incrementally, until it completes the data movement successfully. When the data movement completes successfully, outdated editions are removed (if the target has editions).
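The rounding of the commit interval can be sketched as follows. This is a hedged illustration only: the requested value of 1000 is a hypothetical example, and the tie-breaking direction of the rounding is an assumption, since the text specifies only rounding to the nearest multiple of 16.

```shell
#!/bin/sh
# Sketch: round a requested commit interval to the nearest multiple of 16,
# as the incremental commit option does. Adding 8 before the integer
# division rounds to the nearest multiple (ties round up).
requested=1000
interval=$(( (requested + 8) / 16 * 16 ))
echo "$interval"
```

For a requested value of 1000, the computed interval is 1008, the nearest multiple of 16.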

You should consider the following concerning incremental commit:


Loading and exporting data

You can use the supplied export programs, such as DB2 data export, to extract data from a DB2 database and write it to a flat file. You can use the supplied load programs, such as DB2 load replace, to load data from a file into a DB2 database.

Recommendation: Use these programs instead of the Data Warehouse Center's SQL processing when there is a substantial amount of data to load. Experiment to see which one is more efficient for you.

The bulk load and export programs operate on a delimited data file and a DB2 database. The database server does not need to reside on the agent site, but the source or target file must reside on the agent site. The number and order of the input and output fields must match.
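As a hedged illustration of the flat-file round trip that these programs automate, the following script prints the equivalent DB2 command-line processor sequence. The syntax is assumed from the DB2 CLP, not generated by the Data Warehouse Center, and the database, table, and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: export a table from one DB2 database to a delimited file, then
# load that file into a table in another database. The script only prints
# the command sequence; it does not run the DB2 utilities.
FILE=/tmp/orders.del
cat <<EOF
db2 CONNECT TO SRCDB
db2 "EXPORT TO $FILE OF DEL SELECT * FROM DW.ORDERS"
db2 CONNECT TO TGTDB
db2 "LOAD FROM $FILE OF DEL REPLACE INTO DW.ORDERS"
EOF
```

Note that the number and order of columns in the delimited file must match the target table, as stated above.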

These programs write log files in the directory that is specified by the VWS_LOGGING environment variable. The default value of VWS_LOGGING is x:\vwswin\logging\ on Windows NT and OS/2, and /var/IWH on UNIX, where x is the drive on which you installed the warehouse agent.

The value of the VWS_LOGGING environment variable is the default value of the Trace Log Directory field in the Configuration notebook. If you change the value of the Trace Log Directory field, the Data Warehouse Center writes the log files to the new directory that you specified, but the value of VWS_LOGGING does not change.
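A minimal sketch of how an agent-side script might resolve the log directory, falling back to the documented UNIX default when VWS_LOGGING is not set:

```shell
#!/bin/sh
# Sketch: use VWS_LOGGING if it is set in the environment; otherwise fall
# back to the UNIX default log directory named in the text.
LOG_DIR="${VWS_LOGGING:-/var/IWH}"
echo "Log files are written to: $LOG_DIR"
```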

For a complete list of parameters, restrictions, and return codes for these programs, see "Steps and tasks" in the online help.

The sections that follow describe how to define the different step subtypes for the DB2 Universal Database warehouse programs.

The sections about the DB2 UDB export and DB2 UDB load warehouse programs describe how to define the basic values for these programs. Information about defining the advanced features for the DB2 Universal Database Insert and DB2 Universal Database load warehouse programs is in the online help.

Exporting data

You can use the supplied warehouse programs to export data from a DB2 UDB database or a database that is defined in ODBC.

Defining values for a DB2 UDB export (VWPEXPT1) warehouse program

Use the Step Properties notebook for DB2 UDB export to create a step that can be used to export data from a DB2 Universal Database table or view to a file located at the agent site.

The database server does not need to be on the agent site. However, the target file must be on the agent site.

Specify the name of the target file as it is used on the agent site. DB2 UDB export creates the target file if it does not exist, and replaces it if it exists.

Requirement: The source table, or view, must be linked to the step in the Process Model window. The step must be linked to the warehouse target.

DB2 UDB export steps do not use the Column Mapping page.

To define values for a step that runs a DB2 Universal Database export warehouse program:

  1. Open the step notebook.
  2. Specify general information about the warehouse program. For more information, see Providing general information about a step.
  3. Optional: On the Parameters page, specify information for the export step:
  4. On the Processing Options page, provide information about how your step processes. For more information, see Defining processing options.
  5. Click OK to save your changes and close the step notebook.

Defining values for the Data export with ODBC to file (VWPEXPT2) warehouse program

Use the Data export with ODBC to file (VWPEXPT2) warehouse program to select data in a table that is contained in a database registered in ODBC, and write the data to a delimited file. To run this program on AIX or UNIX, use the ODBC version of the warehouse agent.

This step subtype uses a warehouse source or target file as a source. You connect a source to the step in the Process Model window. The output file is generated on the agent site.

This warehouse program is available for the following operating systems:

The Column Mapping page is not available for this step.

To define values for a step that runs this warehouse program:

  1. Open the step notebook.
  2. Specify general information about the warehouse program. For more information, see Providing general information about a step.
  3. On the Parameters page, in the Column delimiter field, click or type the character that you want to use as a column delimiter.
  4. Create or generate SQL either using SQL Assist or manually:

    Optional: If you want to erase the text in the SELECT statement field, click Clear. If you want to create new SQL, you must click Edit again.

  5. Optional: When you finish generating or editing your SQL, click Test to test the SQL query. The Data Warehouse Center returns sample results of your query.
  6. On the Processing Options page, provide information about how your step processes. For more information, see Defining processing options.
  7. Click OK to save your changes and close the step notebook.

Loading data

You can use the supplied warehouse programs to load data into DB2 Universal Database, DB2 for AS/400, and DB2 for OS/390 databases.

Defining values for a DB2 Universal Database load program

Use the DB2 Universal Database Load Step Properties notebook to create a step that loads data from a source or target file into a DB2 Universal Database table.

You can use a warehouse source or target as a source for this step subtype. Link the source to the step subtype in the Process Model window. Then, link the step subtype to a warehouse target.

The Column Mapping page is not available for this step.

To define values for a step that runs a DB2 Universal Database load warehouse program:

  1. Open the step notebook.
  2. Specify general information about the warehouse program. For more information, see Providing general information about a step.
  3. Optional: On the Parameters page, do the following:
  4. On the Processing Options page, provide information about how your step processes. For more information, see Defining processing options.
  5. Click OK to save your changes and close the step notebook.

Defining values for a DB2 UDB for AS/400 Data Load Insert (VWPLOADI) program

Use the DB2 UDB for AS/400 Data Load Insert (VWPLOADI) program to load data from a flat file to a DB2 UDB for AS/400 table. The load operation appends new data to the end of existing data in the table.

Before you define this step subtype, you must connect the step to a warehouse source and a warehouse target in the Process Modeler.

Acceptable source files for the AS/400 implementation of VWPLOADI are AS/400 QSYS source file members or stream files in the root file system of the Integrated File System (IFS).

Tip: You can improve both performance and storage use by using QSYS file members instead of stream files. CPYFRMIMPF makes a copy of the entire stream file to QRESTORE and then loads the copy into your table. See the online help for CPYFRMIMPF for more information.

Target tables can reside only in the QSYS file system.

You can make changes to the step only when the step is in development mode.

Before the program loads new data into the table, it exports the table to a backup file, which you can use for recovery.

The Column Mapping page is not available for this step.

Prerequisites

To use this program, you must have the following PTFs applied to the AS/400 agent site:

Command PTF:


5769SS100 VRM420 SF46911


Code PTF:


5769SS100 VRM420 SF46976


Maintenance PTF:


5769SS100 VRM420 SF49466

These PTFs provide the AS/400 CPYFRMIMPF and CPYTOIMPF commands (LOAD and EXPORT), which are required to run the VWPLOADI program. These PTFs also install the online help for these commands.

Security

The user profile under which this program and the warehouse agent run must have at least read/write authority to the table that is to be loaded.

Requirements

The following requirements apply to the VWPLOADI program. For information about the limitations of the CPYFRMIMPF command, see the restrictions section of the online help for the CPYFRMIMPF command. To view the online help for this command, type CPYFRMIMPF on the AS/400 command prompt, and press F1.

  1. The Data Warehouse Center definition for the agent site that is running the program must include a user ID and password. The database server does not need to be on the agent site. However, the source file must be on the database server. Specify the fully qualified name of the source files as defined on the DB2 server system.
  2. If the program detects a failure during processing, the table is emptied. If the load process generates warnings, the program returns as successfully completed.
  3. The default behavior for VWPLOADI is to tolerate all recoverable data errors during LOAD (ERRLVL(*NOMAX)).

    To override this behavior, include the ERRLVL(n) keyword in the fileMod string parameter, where n = the number of permitted recoverable errors.

    You can find more information about the ERRLVL keyword in the online help for the CPYFRMIMPF command.
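For illustration, a fileMod string that caps recoverable errors at 100 instead of the default ERRLVL(*NOMAX) might be passed as follows. The library, file, and table names are hypothetical, and the script only prints the CALL statement that would result:

```shell
#!/bin/sh
# Sketch: override the default error tolerance by placing ERRLVL(100) in
# the fileMod string parameter of the generated CALL statement.
FILEMOD="ERRLVL(100)"
echo "CALL PGM(QIWH/VWPLOADI) PARM('MYLIB/SRCFILE' 'MYLIB/TGTTABLE' '$FILEMOD')"
```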

To define values for a step that uses this program:

  1. Open the step notebook.
  2. Specify general information about the program. For more information, see Providing general information about a step.
  3. Click the Parameters tab.
  4. Optional: To specify the delimiters of your load program, click Specify only the delimiters of the MODSTRING. Otherwise, the default entries in the Record, Column, Character strings, and Decimal point fields are used.

    Specify the delimiters for your load program:

  5. Optional: To add parameters to the program, click Type Modstring parameters. Type your parameters in the field under the radio button.

    This field is used to modify the file characteristics that the CPYFRMIMPF command expects the input file to have. If this parameter is omitted, all of the default values that the CPYFRMIMPF command expects are assumed to be correct.

    Some of the default characteristics of the input file are:

    For more information on the default values for the CPYFRMIMPF command, see the AS/400 online help for the CPYFRMIMPF command.

    The format for the FileMod string is:

    1. The string must contain valid CPYFRMIMPF command keywords. All of the valid keywords for the CPYFRMIMPF command are described in the online help for the command.
    2. Each keyword must be followed immediately by its value. The value must be enclosed in parentheses.
    3. Each keyword must be separated from the next keyword by a space.

    Requirement: Certain parameters require that you enclose values in two single quotation marks. For example, the FLDDLM keyword must have its value enclosed by two single quotation marks. The Data Warehouse Center generates the AS/400 CALL statement for VWPLOADI in the form:

    CALL PGM(QIWH/VWPLOADI) PARM('fromfile' 'totable' 'filemodstring')
    

    Two single quotation marks together tell the AS/400 command prompt processor that your parameter value contains a single quotation mark. This prevents the command line processor from confusing a single quotation mark with the regular end-of-parameter marker.

  6. On the Processing Options page, provide information about how your step processes. For more information, see Defining processing options.
  7. Click OK to save your changes and close the step notebook.
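The quotation-mark doubling described above can be sketched as follows. The FLDDLM delimiter value is a hypothetical example; the point is that each quotation mark inside the parameter value is written twice:

```shell
#!/bin/sh
# Sketch: a fileMod value that itself contains single quotation marks.
# Doubling each embedded quotation mark ('') keeps the AS/400 command
# prompt processor from reading it as the end-of-parameter marker.
FILEMOD="FLDDLM('';'')"
echo "CALL PGM(QIWH/VWPLOADI) PARM('fromfile' 'totable' '$FILEMOD')"
```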

Traces and diagnostic information

The VWPLOADI program provides two kinds of diagnostic information:

Important: Successful completion of this program does not guarantee that the data was transferred correctly. For stricter error handling, use the ERRLVL parameter.

Reading the VWPLOADI trace file

The VWPLOADI trace files are located in the Integrated File System in the /QIBM/UserData/IWH directory.

The VWPLOADI trace file has the following name format:

VWxxxxxxxx.VWPLOADI

where xxxxxxxx is the process ID of the VWPLOADI run that produced the file.
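For example, a script could derive the trace-file path from a process ID like this (the PID shown is hypothetical):

```shell
#!/bin/sh
# Sketch: build the VWPLOADI trace-file path for a given process ID,
# following the VWxxxxxxxx.VWPLOADI naming format described above.
PID=12345678
TRACE="/QIBM/UserData/IWH/VW${PID}.VWPLOADI"
echo "$TRACE"
```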

To view trace files from a workstation:

  1. Use Client Access/400 to map your AS/400 root file system to a logical drive, or use FTP to copy the file to the workstation.
  2. Open the trace file with a text editor to view the information.

Viewing the VWPLOADI trace with Client Access/400

To use Client Access/400 to map an AS/400 system to a logical drive on an NT workstation:

  1. Set up a Client Access/400 connection to your AS/400 system over TCP/IP.
  2. Open the Windows NT File Explorer.
  3. From the Explorer menu, click Tools --> Map Network Drive.
  4. Type the path name:
    \\hostname\.
    

    where hostname is the fully qualified TCP/IP host name of your AS/400 system.

  5. Click OK.

Requirement: If you use Client Access/400 to access the trace file, you must define the file extension .VWPLOADI to Client Access/400. Defining this extension allows Client Access/400 to translate the contents of files with this extension from EBCDIC to ASCII.

To define a file extension to Client Access/400:

  1. From Windows NT, select Start --> Programs --> IBM AS400 Client Access --> Client Access Properties.

    The Client Access notebook opens.

  2. Click the Network Drives tab.
  3. In the File extension: field, type .VWPLOADI.
  4. Click Add.
  5. Click Apply.
  6. Click OK.

You should now be able to load the file into any ASCII text editor or word processor.

AS/400 exceptions

If any of the system commands that VWPLOADI issues fails, an exception code is recorded in the VWPLOADI trace file. To get an explanation for the exception:

  1. At an AS/400 command prompt, enter DSPMSGD RANGE(xxxxxxx), where xxxxxxx is the exception code. For example, you might enter DSPMSGD RANGE(CPF2817).

    The Display Formatted Message Text panel is displayed.

  2. Select option 30 to display all information. A message similar to the following message is displayed:
    Message ID . . . . . . . . . :   CPF2817
    Message file . . . . . . . . :   QCPFMSG
    Library  . . . . . . . . . :     QSYS
    Message . . . . :   Copy command ended because of error.
    Cause . . . . . :   An error occurred while the file was 
                        being copied.
    Recovery  . . . :   See the messages previously listed.  
                        Correct the errors, and then try the 
                        request again.
    

The second line in the VWPLOADI trace file contains the information that you need to issue the WRKJOB command.

To view the spool file, you can copy and paste the name of the message file to an AS/400 command prompt after the WRKJOB command and press Enter. View the spool file for the job to get additional information about any errors that you may have encountered.

Defining values for a DB2 UDB for AS/400 Data Load Replace (VWPLOADR) program

Use the DB2 UDB for AS/400 Data Load Replace (VWPLOADR) program to load data from a flat file to a DB2 UDB for AS/400 table. The load operation completely replaces existing data in the table.

Before you define this step subtype, you must connect the step to a warehouse source and a warehouse target in the Process Modeler.

Acceptable source files for the AS/400 implementation of the VWPLOADR program are AS/400 QSYS source file members or stream files in the root file system of the Integrated File System (IFS).

Tip: You can improve both performance and storage use by using QSYS file members instead of stream files. CPYFRMIMPF copies the entire stream file to QRESTORE and then loads the copy into your table. See the online help for CPYFRMIMPF for more information.

Target tables can reside only in the QSYS file system.

You can make changes to the step only when the step is in development mode.

The Column Mapping page is not available for this step.

Prerequisites

To use this program, you must have the following PTFs applied to the AS/400 agent site:

Command PTF:


5769SS100 VRM420 SF46911


Code PTF:


5769SS100 VRM420 SF46976


Maintenance PTF:


5769SS100 VRM420 SF49466

These PTFs provide the AS/400 CPYFRMIMPF and CPYTOIMPF commands (LOAD and EXPORT), which are required to run the VWPLOADR program. These PTFs also install the online help for these commands.

Security

The user profile under which this program and the warehouse agent run must have at least read/write authority on the table that is to be loaded.

Requirements

The following requirements apply to the VWPLOADR program. For information about the limitations of the CPYFRMIMPF command, see the restrictions section of the online help for the CPYFRMIMPF command. To view the online help for this command, type CPYFRMIMPF on the AS/400 command prompt and press F1.

To define values for a step that uses this program:

  1. Open the step notebook.
  2. Specify general information about the program. For more information, see Providing general information about a step.
  3. Click the Parameters tab.
  4. Optional: If you want to specify the delimiters of your load program, click Specify only the delimiters of the MODSTRING. If you do not click this radio button, the default entries in the Record, Column, Character strings, and Decimal point fields are assumed to be correct.
  5. If you clicked Specify only the delimiters of the MODSTRING, specify the delimiters for your load program:
  6. Optional: To add additional parameters to the program, click Type Modstring parameters. Type your parameters in the field under the radio button.

    This field is used to modify the file characteristics that the CPYFRMIMPF command expects the input file to have. If this parameter is omitted, all of the default values that the CPYFRMIMPF command expects are assumed to be correct.

    Some of the default characteristics of the input file are:

    For more information on the default values for the CPYFRMIMPF command, see the AS/400 online help for the CPYFRMIMPF command.

    The format for the FileMod string is:

    1. The string must contain valid CPYFRMIMPF command keywords. All of the valid keywords for the CPYFRMIMPF command are described in the online help for the command.
    2. Each keyword must be followed immediately by its value. The value must be enclosed in parentheses.
    3. Each keyword must be separated from the next keyword by a space.

    Attention: Certain parameters require that you enclose values in two single quotation marks. For example, the FLDDLM keyword must have its value enclosed by two single quotation marks. You must do this because the Data Warehouse Center generates the AS/400 CALL statement for VWPLOADR in the form:

    CALL PGM(QIWH/VWPLOADR) PARM('fromfile' 'totable' 'filemodstring')
    

    Two single quotation marks together tell the AS/400 command prompt processor that your parameter value contains a single quotation mark character. This prevents the command line processor from confusing your single quotation mark character with the regular end-of-parameter marker.

  7. On the Processing Options page, provide information about how your step processes. For more information, see Defining processing options.
  8. Click OK to save your changes and close the step notebook.

Traces and diagnostic information

The VWPLOADR program provides two kinds of diagnostic information:

Important: Successful completion of this program does not guarantee that the data was transferred correctly. For stricter error handling, use the ERRLVL parameter.

Reading the VWPLOADR trace file

The VWPLOADR trace files are located in the Integrated File System in the /QIBM/UserData/IWH directory.

The VWPLOADR trace file has the following name format:

VWxxxxxxxx.VWPLOADR

where xxxxxxxx is the process ID of the VWPLOADR run that produced the file.

To view trace files from a workstation:

  1. Use Client Access/400 to map your AS/400 root file system to a logical drive, or use FTP to copy the file to the workstation.

    For information about using Client Access/400, see "Viewing the VWPLOADR trace via Client Access/400".

  2. Open the trace file with a text editor to view the information.

Viewing the VWPLOADR trace via Client Access/400

To use Client Access/400 to map an AS/400 system to a logical drive on an NT workstation:

  1. Set up a Client Access/400 connection to your AS/400 system over TCP/IP.
  2. Open the Windows NT File Explorer.
  3. From the Explorer menu, select Tools --> Map Network Drive.
  4. Type the pathname:
    \\hostname\.
    

    where hostname is the fully qualified TCP/IP host name of your AS/400 system.

  5. Click OK.

Requirement: If you use Client Access/400 to access the trace file, you must define the file extension .VWPLOADR to Client Access/400. Defining this extension allows Client Access/400 to translate the contents of files with this extension from EBCDIC to ASCII.

To define a file extension to Client Access/400:

  1. From Windows NT, click Start --> Programs --> IBM AS400 Client Access --> Client Access Properties.

    The Client Access notebook opens.

  2. Click the Network Drives tab.
  3. Type .VWPLOADR in the File extension: field.
  4. Click Add.
  5. Click Apply.
  6. Click OK.

You should now be able to load the file into any ASCII text editor or word processor.

AS/400 exceptions

If any of the system commands that VWPLOADR issues fails, an exception code is recorded in the VWPLOADR trace file. To get an explanation for the exception:

  1. At an AS/400 command prompt, enter DSPMSGD RANGE(xxxxxxx), where xxxxxxx is the exception code. For example, you might enter DSPMSGD RANGE(CPF2817).

    The Display Formatted Message Text panel is displayed.

  2. Select option 30 to display all information. A message similar to the following message is displayed:
    Message ID . . . . . . . . . :   CPF2817
    Message file . . . . . . . . :   QCPFMSG
    Library  . . . . . . . . . :     QSYS
    Message . . . . :   Copy command ended because of error.
    Cause . . . . . :   An error occurred while the file was 
                        being copied.
    Recovery  . . . :   See the messages previously listed.  
                        Correct the errors, and then try the 
                        request again.
    

The second line in the VWPLOADR trace file contains the information that you need to issue the WRKJOB command.

To view the spool file, you can copy and paste the name of the message file to an AS/400 command prompt after the WRKJOB command, and press Enter. View the spool file for the job to get additional information about any errors that might have occurred.

Defining values for a DB2 for OS/390 Load program

Use the DB2 for OS/390 Load warehouse program to load records into one or more tables in a table space.

To define values for a step that uses this warehouse program:

  1. Open the step notebook.
  2. Specify general information about the warehouse program. For more information, see Providing general information about a step.
  3. On the Parameters page, to preformat the free pages between the high-used RBA (or page) and the high-allocated RBA in the table space and index spaces that are associated with the table, select the Entire table space and index spaces are preformatted check box. The preformatting occurs after the data is loaded and the indexes are built.
  4. To specify whether records are to be loaded into an empty or non-empty table space, select the Specify RESUME option at table space level check box.

    If the table space is not empty, and you are not replacing contents (the Reset table space and indexes to empty before load check box is not checked), a DB2 UDB for OS/390 message is issued, and the utility job step terminates with a job step condition code of 8. For nonsegmented table spaces, space occupied by rows that are marked as deleted or by rows of dropped tables is not used.

  5. If you want the newly loaded rows to replace all existing rows of all tables in the table space, and not just those of the table that you are loading, select the Reset table space and indexes to empty before load check box (LOAD REPLACE).
  6. Click one of the Input data file type radio buttons to select the code page type for the input data. If you specify ASCII, the numeric, date, time, and time stamp internal formats are not affected.
  7. Select the Format of the input records check box to identify the format of the input record. The format must be compatible with DB2 unload format (UNLOAD) or with SQL/DS(TM) unload format. This action uniquely determines the format of the input. No field specifications are allowed in an INTO TABLE option.

    The DB2 unload format specifies that the input record format is compatible with the DB2 unload format. The DB2 unload format is the result of REORG with the UNLOAD ONLY option. Input records that were unloaded by the REORG utility are loaded into the tables from which they were unloaded. Do not add or change column specifications between the REORG UNLOAD ONLY and the LOAD FORMAT UNLOAD. DB2 reloads the records into the same tables from which they were unloaded.

    The SQL/DS unload format specifies that the input record format is compatible with the SQL/DS unload format. The data type of a column in the table to be loaded must be the same as the data type of the corresponding column in the SQL/DS table. SQL/DS strings that are longer than the DB2 limit cannot be loaded.

  8. Select the Allow no substitution characters in a string check box to prohibit substitution characters from being placed in a string as a result of a conversion. When this check box is selected and DB2 UDB for OS/390 determines that a substitution character has been placed in a string as a result of a conversion, it performs one of the following actions:
  9. Select the CCSIDs for the input file check box to specify up to three coded character set identifiers (CCSIDs) for the input data. Type a valid integer value in one or all of the fields. The default CCSID is the one that you chose when you installed DB2 Universal Database for OS/390. If any of the fields relating to the CCSID are left blank, the field will use the installation default. You can specify SBCS, DBCS (double byte character set) data, or mixed DBCS (graphics and double byte character sets).
  10. Select the Enforce check and referential constraints check boxes to have DB2 notify you when check constraints and referential constraints are violated. When constraints are enforced, and the loading operation detects a violation, DB2 deletes the errant row and issues a message to identify it. If you choose to enforce constraints, and referential constraints exist, sort input and sort output data sets are required.
  11. In the Maximum number of records to be written on discard data set field, specify a maximum number of source records to be written on the discard data set. The value can range from 0 to 2,147,483,647. If the maximum number is reached, LOAD processing abnormally ends, the discard data set is empty, and you cannot see which records were discarded. The default is 0, which means that there is no maximum. In this case, the entire input data set can be discarded.
  12. Type the name of the discard data set in the Discard data set field. This is a work data set that is required if you specify a number of records to be discarded. This data set holds copies of records that did not get loaded. It also holds copies of records that were loaded, then removed. It must be a sequential data set that is readable by BSAM services. SYSDISC is the data definition name that is associated with the data set name.
  13. Type the name of a work data set for error processing in the Error data set field. This data set stores information about errors that occur in the load processing. This field is required if you specified a number of records to be discarded during load processing. SYSERR is the data definition name that is associated with the data set name.
  14. Click Advanced to open the DB2 for OS/390 Load Table Space notebook. From this notebook you can specify additional options for loading data into a table. You can also select to gather statistics for a table space, index, or both. The statistics are stored in the DB2 for OS/390 catalog.
  15. On the Processing Options page, provide information about how your step processes. For more information, see Defining processing options.
  16. Click OK to save your changes and close the step notebook.

Manipulating files

The sections that follow describe how to define the different step subtypes for the warehouse file programs.

Defining values for the Copy File using FTP (VWPRCPY) program

Use the warehouse program Copy file using FTP (VWPRCPY) to copy files on the agent site to and from a remote host.

Before you copy files to OS/390, you must allocate their data sets. You cannot transfer VSAM data sets. When you define a step that uses this warehouse program, select a source file and select a target file. One file must be stored on the agent site, and the other must be stored on the OS/390 system.

This warehouse program is available for the following operating systems:

The Column Mapping page is not available for this step subtype.

To define a step that runs this warehouse program:

  1. Open the step notebook.
  2. Specify general information about the warehouse program. For more information, see Providing general information about a step.
  3. On the Parameters page, click either Copy files from the remote host to the agent site (GET method) or Copy files from the agent site to the remote host (PUT method).
  4. In the Transfer type area, click either ASCII or Binary.
  5. In the Remote system name field, type the name of the remote system from which or to which to copy files.
  6. In the Remote user ID field, type the user ID to use to connect to the remote host.
  7. In the Remote password field, type the password for the user ID.
  8. In the Verify remote password field, type the password again.
  9. On the Processing Options page, provide information about how your step processes. For more information, see Defining processing options.
  10. Click OK to save your changes and close the step notebook.
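The GET and PUT methods that this step selects between are ordinary FTP transfers. As a rough illustration only (not the VWPRCPY implementation; the host, credentials, and file names in any call would be your own), the equivalent transfer in Python's standard ftplib might look like this:

```python
# Illustrative sketch of the GET and PUT transfer modes that the
# Copy file using FTP (VWPRCPY) step performs. This is not the
# warehouse program's code; any host names and credentials passed
# in are hypothetical.
from ftplib import FTP

def transfer_command(method, remote):
    """RETR downloads (GET method); STOR uploads (PUT method)."""
    return ("RETR " if method == "GET" else "STOR ") + remote

def copy_file(host, user, password, remote, local,
              method="GET", mode="binary"):
    """Copy one file between the agent site and a remote host."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        if method == "GET":                       # remote host -> agent site
            with open(local, "wb") as f:
                if mode == "binary":
                    ftp.retrbinary(transfer_command("GET", remote), f.write)
                else:                             # ASCII transfer type
                    ftp.retrlines(
                        transfer_command("GET", remote),
                        lambda line: f.write((line + "\n").encode()))
        else:                                     # PUT: agent site -> remote host
            with open(local, "rb") as f:
                if mode == "binary":
                    ftp.storbinary(transfer_command("PUT", remote), f)
                else:
                    ftp.storlines(transfer_command("PUT", remote), f)
```

The ASCII and binary branches mirror the Transfer type choice in step 4; ASCII transfers translate line endings, binary transfers copy bytes unchanged.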

If you are having problems accessing a remote file on a secure UNIX system, verify that the home directory of the user ID contains a .netrc file. The .netrc file must contain an entry that includes the host name of the agent site and the remote user ID that you want to use.

For example, the host name of the agent site is glacier.stl.ibm.com. You want to transfer a file using FTP from the remote site kingkong.stl.ibm.com to the agent site, using the remote user ID vwinst2. The ~vwinst2/.netrc file must contain the following entry:

machine glacier.stl.ibm.com login vwinst2

Defining values for a Run FTP Command File program (VWPFTP)

Use the warehouse program Run FTP command file (VWPFTP) to transfer files from a remote host using FTP. When you define a step that uses this warehouse program, do not specify a source or target table for the step.

This warehouse program is available for the following operating systems:

The Column Mapping page is not available for this step.

To define values for a step that runs this warehouse program:

  1. Open the step notebook.
  2. Specify general information about the warehouse program. For more information, see Providing general information about a step.
  3. On the Parameters page, in the Remote System Name field, type the name of the remote system to access.
  4. In the FTP command file (full path name) field, type the path and file name of the FTP command file.

    In the FTP command file, you must supply the following information, in the order listed, and on separate lines:

    1. User ID
    2. Password
    3. One or more FTP commands, each on a separate line

    In the following example, you use FTP to log on to the remote host using the user ID and password, get the remote file, and put it in the specified local directory:
    nst1
    password
    get /etc/services d:/udprcpy.out
    quit

  5. On the Processing Options page, provide information about how your step processes. For more information, see Defining processing options.
  6. Click OK to save your changes and close the step notebook.
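The FTP command file layout described above (user ID on the first line, password on the second, then one FTP command per line) is simple enough to sketch a parser for. This is only an illustration of the expected file format, not VWPFTP's own logic; the helper name is invented:

```python
# Parse the three-part FTP command file layout that VWPFTP expects:
# line 1 = user ID, line 2 = password, remaining lines = FTP commands.
# Illustrative only; not the warehouse program's implementation.
def parse_ftp_command_file(text):
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3:
        raise ValueError("need user ID, password, and at least one command")
    user, password, commands = lines[0], lines[1], lines[2:]
    return user, password, commands

# The example command file from the text above:
sample = """nst1
password
get /etc/services d:/udprcpy.out
quit
"""
user, pw, cmds = parse_ftp_command_file(sample)
```

Parsing the file in this order matches the requirement that the entries appear on separate lines in the order listed.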

If you are having problems accessing a remote file on a secure UNIX system, verify that the home directory of the user ID contains a .netrc file. The .netrc file must contain an entry that includes the host name of the agent site and the remote user ID that you want to use.

For example, the host name of the agent site is glacier.stl.ibm.com. You want to transfer a file using FTP from the remote site kingkong.stl.ibm.com to the agent site, using the remote user ID vwinst2. The ~vwinst2/.netrc file must contain the following entry:

machine glacier.stl.ibm.com login vwinst2

Defining values for a Submit OS/390 JCL jobstream (VWPMVS) program

Use the Submit OS/390 JCL jobstream (VWPMVS) warehouse program to submit a JCL jobstream that resides on OS/390 to an OS/390 system for execution.

The job's MSGCLASS and SYSOUT output must be routed to a held output class.

Before you use the Submit OS/390 JCL Jobstream warehouse program, test your JCL file by running it from TSO under the same user ID that you plan to use with the program.

This warehouse program runs successfully if the OS/390 host name, user ID, and password are correct. If you want to test the validity of the results generated by the JCL, you must write your own testing logic. If the FTP session times out, this program returns an FTP error, even if the JCL ultimately runs successfully on the OS/390 system.

The Submit OS/390 JCL Jobstream warehouse program also receives the JES log file on the agent site. It erases the copy of the JES log from any previous jobs on the agent site before submitting a new job for processing. It also verifies that the JES log file is downloaded to the agent site after the job completes.

The Submit OS/390 JCL Jobstream warehouse program requires TCP/IP 3.2 or later installed on OS/390. Verify that the FTP service is enabled before using the program.

When you define a step that uses this warehouse program, do not specify a source or target table for the step.

This warehouse program is available for the following operating systems:

The Column Mapping page is not available for this step.

To define values for a step that runs this warehouse program:

  1. Open the step notebook.
  2. Specify general information about the warehouse program. For more information, see Providing general information about a step.
  3. On the Parameters page, in the MVS system name field, type the name of the MVS host on which to run the job.
  4. In the MVS User ID field, type the user ID to use to connect to the MVS host.
  5. In the MVS password field, type the password for the user ID.
  6. In the Verify MVS password field, type the password again.
  7. In the MVS JCL File field, type the name of the JCL file to submit. The name must consist of the user ID plus one character.
  8. In the Local spool file full path name field, type the path and file name of the file on the agent site that is to receive the JES log file. You must define a .netrc file in the same directory as the JES file.
  9. On the Processing Options page, provide information about how your step processes. For more information, see Defining processing options.
  10. Click OK to save your changes and close the step notebook.
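The naming rule in step 7 (the JCL file name must be the user ID plus one character) can be checked mechanically. A minimal sketch of that validation, with a hypothetical helper name; this check is not part of VWPMVS itself:

```python
# Validate the MVS JCL file-name rule from step 7: the name must be
# the submitting user ID followed by exactly one character.
# Hypothetical helper for illustration; not part of VWPMVS.
def valid_jcl_name(user_id, jcl_name):
    return (len(jcl_name) == len(user_id) + 1
            and jcl_name.upper().startswith(user_id.upper()))
```

For example, for the user ID VWINST2, a JCL file named VWINST2A satisfies the rule, while VWINST2 (no suffix character) or VWINST2AB (two characters) does not.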

If you are having problems accessing a remote file on a secure UNIX system, verify that the home directory of the user ID contains a .netrc file. The .netrc file must contain an entry that includes the host name of the agent site and the remote user ID that you want to use.

For example, the host name of the agent site is glacier.stl.ibm.com. You want to transfer a file using FTP from the remote site kingkong.stl.ibm.com to the agent site, using the remote user ID vwinst2. The ~vwinst2/.netrc file must contain the following entry:

machine glacier.stl.ibm.com login vwinst2

Replicating tables

Replication is a process of maintaining a defined set of data in more than one location. It involves copying designated changes from one location (a source) to another (a target), and synchronizing the data in both locations. The source and target can be in logical servers (such as a DB2 database or a DB2 for OS/390 subsystem or data-sharing group) that are on the same machine or on different machines in a distributed network.

You can use the replication capabilities of the Data Warehouse Center when you want to keep a warehouse table synchronized with an operational table without having to fully load the table each time the operational table is updated. With replication, you can use incremental updates to keep your data current.

You can use the Data Warehouse Center to define a replication step, which will replicate changes between any DB2 relational databases. You can also use other IBM products (such as DB2 DataJoiner and DataPropagator(TM) NonRelational) or non-IBM products (such as Microsoft SQL Server and Sybase SQL Server) to replicate data between many database products -- both relational and nonrelational. The replication environment that you need depends on when you want data updated and how you want transactions handled.

To define a replication step with the Data Warehouse Center, you must belong to a warehouse group that has access to the process in which the step will be used.

The Data Warehouse Center supports five types of replication:

User copy
Produces target tables that are read-only copies of the replication source with no replication control columns added. These tables look like regular source tables and are a good starting point for replication. They are the most common type of target table.

Point-in-time
Produces target tables that are read-only copies of the replication source with a timestamp column added. The timestamp column is originally null. When changes are replicated, values are added to indicate the time when updates are made. Use this type of table if you want to keep track of the time of changes.

Base aggregate
Produces read-only tables that summarize the contents of a source table. Base aggregate replication tables are useful for tracking the state of a source table on a regular basis. Aggregate tables use SQL column functions (such as SUM and AVG) to compute summaries of the entire contents of the source tables or of the recent changes made to source table data.

Change aggregate
Produces tables that work with the change data in the control tables, not with the contents of the source table. This type of replication is useful for tracking the changes made between each Apply program cycle.

Staging table
Produces read-only tables that contain data from committed transactions. Also called consistent change-data tables (CCD tables), these tables contain different data if they are condensed, noncondensed, complete, or noncomplete.
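To make the base aggregate idea concrete (each cycle summarizes the source table with SQL column functions such as SUM and AVG), here is a minimal sketch using SQLite in place of DB2; the table and column names are invented, and this is not how the Apply program itself works:

```python
# Minimal illustration of a base aggregate target table: each cycle
# summarizes the entire source table with SQL column functions.
# SQLite stands in for DB2 here; table and column names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 100.0), ("east", 50.0), ("west", 200.0)])

# The "base aggregate" snapshot: SUM and AVG over the whole source.
cur.execute("""CREATE TABLE sales_summary AS
               SELECT region, SUM(amount) AS total, AVG(amount) AS average
               FROM sales GROUP BY region""")
rows = cur.execute(
    "SELECT region, total, average FROM sales_summary ORDER BY region"
).fetchall()
```

A change aggregate step would apply the same column functions to the captured change data rather than to the full source table.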

For a replication step, promoting to test mode will create the target table and generate the subscription set. The first time a replication step is run, a full refresh copy is made. Promoting a replication step to production mode will enable the schedules that have been defined. You can make changes to a step only when it is in development mode.

Defining a replication source in the Data Warehouse Center

You define replication sources in the same way that you define other relational sources with the Data Warehouse Center. In addition to the other database objects that can be returned from a database (tables, views, and system tables), you can choose to return replication-enabled tables and views. A table or view must be defined for replication using the DB2 Control Center before it can be used as a replication source in the Data Warehouse Center. For instructions on defining a table or view as a replication source, see the DB2 Replication Guide and Reference.

When you define a replication-enabled table as a warehouse source table, the before-image and after-image columns are identified by the words before or after following the column name.


[Figure db2db028, showing before-image and after-image column indicators, is not displayed.]

If you choose to retrieve only replication-enabled tables from the source database, only the columns within the table that are enabled for replication will be retrieved.

For instructions on defining a replication source in the Data Warehouse Center, see Defining a DB2 warehouse source.

Defining a user copy, point-in-time, or base aggregate replication step

A source table used for a user copy or point-in-time replication step must have a primary key. Use the DB2 Control Center to define a primary key for each table that you want to include in a user copy or point-in-time replication step.

To define a user copy, point-in-time, or base aggregate replication step:

  1. Define a process object.
  2. Open the process object.
  3. Add one or more warehouse sources.
  4. Add one or more warehouse targets.
  5. Open the step notebook.
  6. Specify information for your step:
  7. On the Parameters page, select the columns that you want to replicate from the Available columns list, and click >. The columns that you select move to the Selected columns list. The Available columns list shows only the columns that have been enabled for change capture.

    To include all of the items in the Available columns list, click >>.

  8. Optional: Click Add Calculated Column to open a window in which you can create derived columns. The derived columns that you create are displayed in the Selected columns list.
  9. Optional: To select which rows are to be replicated, write a WHERE statement to subselect rows.
  10. On the Column Mapping page, map the output columns that result from the SQL statement that you defined on the Parameters page to columns in your target table. On this page, output columns from the Parameters page are referred to as source columns. Source columns are listed on the left side of the page. Target columns from the output table linked to the step are listed on the right side of the page. Use the Column Mapping page to perform the following tasks:

    If the Parameters page produces no output columns, or if this step is not linked to a target table and you did not specify automatic generation of a default table in the Parameters page, you cannot use this page to map your columns. Some steps will not allow you to change the column mapping.

  11. On the Processing Options page, select an agent site where you want your step to run from the Agent Site drop-down list. The selections in this list are agent sites that are common to the source tables and the target table.
  12. The Population type for replication steps can have only one value, Replicate.
  13. If you want the option to run your step at any time, select the Run on demand check box. Your step must be in test or production mode before you can run it.
  14. Optional: Select the Populate externally check box if the step is populated externally, meaning that it is started in some way other than by the Data Warehouse Center. The step does not require any other means of running in the Data Warehouse Center in order for you to change the mode to production.

    If the Populate externally check box is not selected, then the step must have a schedule, be linked to a transient table that is input to another step, or be started by another program in order for you to change the mode to production.

  15. In the Retry area, specify how many times you want the step to run again if it needs to be retried, and the amount of time that you want to pass before the next run of the step.
  16. In the Replication control database field, select the control database or subsystem that contains the replication control tables for the Apply program.
  17. In the Database type list, select the database type for the replication control database.
  18. In the User ID field, type the user ID to access the replication control database.
  19. In the Password field, type the password for the user ID that will access the database.
  20. In the Verify password field, type the password again.
  21. In the Subscription set name field, type the name of the subscription set. This name can contain a maximum of 18 characters and can be an ordinary or delimited qualifier.
  22. Optional: In the Apply qualifier field, type the name of the apply qualifier. The apply qualifier name must be unique for every replication step that is defined. If you do not specify an apply qualifier, the Data Warehouse Center generates one for you.
  23. Optional: In the Event name field, type the event name. The event name is the name of the event that is posted in the event table that the Apply program reads. The event name must be unique for every replication step that is defined. If you do not specify an event name, the Data Warehouse Center generates one for you.
  24. In the Blocking factor field, specify the number of minutes' worth of change data that can be replicated during a subscription cycle.
  25. Click OK to save your changes and close the notebook.
  26. Link the step to the warehouse sources.
  27. Link the step to the warehouse targets.
  28. Promote the step to test mode.
  29. Run the step to test it.
  30. Schedule the step.
  31. Promote the step to production mode.

Defining a change aggregate replication step

A change aggregate replication step produces tables that work with the change data in the control tables, but not with the contents of the source table.

To define a change aggregate replication step:

  1. Define a process object.
  2. Open the process object.
  3. Add one or more warehouse sources.
  4. Add one or more warehouse targets.
  5. Open the step notebook.
  6. Specify information for your step:
  7. On the Parameters page, select the columns that you want to replicate from the Available columns list, and click >. The columns that you select move to the Selected columns list. The Available columns list shows only the columns that are enabled for change capture.

    If you want to include all of the items in the Available columns list, click >>.

  8. Optional: Click Add Calculated Column to open a window in which you can create derived columns. The derived columns that you create are displayed in the Selected columns list.
  9. Optional: To select which rows are to be replicated, write a WHERE statement to subselect rows.
  10. Optional: To add calculated columns, add a GROUP BY statement. You can group rows according to the group defined in the GROUP BY statement.
  11. On the Column Mapping page, map the output columns that result from the SQL statement that you defined on the Parameters page to columns on your target table. On this page, output columns from the Parameters page are referred to as source columns. Source columns are listed on the left side of the page. Target columns from the output table that are linked to the step are listed on the right side of the page. Use the Column Mapping page to perform the following tasks:

    If the Parameters page produces no output columns, or if this step is not linked to a target table and you did not specify automatic generation of a default table in the Parameters page, you cannot use this page to map your columns. Some steps will not allow you to change the column mapping.

  12. On the Processing Options page, select an agent site where you want your step to run from the Agent Site drop-down list. The selections in this list are agent sites that are common to the source tables and the target table.
  13. The Population type for replication steps can have only one value, Replicate.
  14. If you want the option to run your step at any time, select the Run on demand check box. Your step must be in test or production mode before you can run it.
  15. Optional: Select the Populate externally check box if the step is populated externally, meaning that it is started in some way other than by the Data Warehouse Center. The step does not require any other means of running in the Data Warehouse Center in order for you to change the mode to production.

    If the Populate externally check box is not selected, then the step must have a schedule, be linked to a transient table that is input to another step, or be started by another program in order for you to change the mode to production.

  16. In the Retry area, specify how many times you want the step to run again if it needs to be retried, and the amount of time that you want to pass before the next run of the step.
  17. In the Replication control database field, select the control database or subsystem that contains the replication control tables for the Apply program.
  18. In the Database type list, select the database type for the replication control database.
  19. In the User ID field, type the user ID to access the replication control database.
  20. In the Password field, type the password for the user ID that will access the database.
  21. In the Verify password field, type the password again.
  22. In the Subscription set name field, type the name of the subscription set. This name can contain a maximum of 18 characters and can be an ordinary or delimited qualifier.
  23. Optional: In the Apply qualifier field, type the name of the apply qualifier. The apply qualifier name must be unique for every replication step that is defined. If you do not specify an apply qualifier, the Data Warehouse Center generates one for you.
  24. Optional: In the Event name field, type the event name. The event name is the name of the event that is posted in the event table that the Apply program reads. The event name must be unique for every replication step that is defined. If you do not specify an event name, the Data Warehouse Center generates one for you.
  25. In the Blocking factor field, specify the number of minutes' worth of change data that can be replicated during a subscription cycle.
  26. Click OK to save your changes and close the notebook.
  27. Link the step to the warehouse sources.
  28. Link the step to the warehouse targets.
  29. Promote the step to test mode.
  30. Run the step to test it.
  31. Schedule the step.
  32. Promote the step to production mode.

Defining a staging table replication step

A staging table replication step produces read-only tables that contain data from committed transactions. A source table that is used for a staging table replication step must have a primary key. Use the DB2 Control Center to define a primary key for each table that you want to include in a staging table replication step.

To define a staging table replication step:

  1. Define a process object.
  2. Open the process object.
  3. Add one or more warehouse sources.
  4. Add one or more warehouse targets.
  5. Open the step notebook.
  6. Specify information for your step:
  7. On the Parameters page, select the columns that you want to replicate from the Available columns list, and click >. The columns that you select move to the Selected columns list. The Available columns list shows only the columns that are enabled for change capture.

    To include all of the items in the Available columns list, click >>.

  8. Optional: Click Add Calculated Column to open a window in which you can create derived columns. The derived columns that you create are displayed in the Selected columns list.
  9. Optional: To select which rows are to be replicated, write a WHERE statement to subselect rows.
  10. Optional: Click Staging table options to change any properties for your table. This option is available if you choose a replication type of Staging Table.
  11. On the Column Mapping page, map the output columns that result from the SQL statement that you defined on the Parameters page to columns on your target table. On this page, output columns from the Parameters page are referred to as source columns. Source columns are listed on the left side of the page. Target columns from the output table linked to the step are listed on the right side of the page. Use the Column Mapping page to perform the following tasks:

    If the Parameters page produces no output columns, or if this step is not linked to a target table and you did not specify automatic generation of a default table in the Parameters page, you cannot use this page to map your columns. Some steps will not allow you to change the column mapping.

  12. On the Processing Options page, select an agent site where you want your step to run from the Agent Site drop-down list. The selections in this list are agent sites that are common to the source tables and the target table.
  13. The Population type for replication steps can have only one value, Replicate.
  14. If you want the option to run your step at any time, select the Run on demand check box. Your step must be in test or production mode before you can run it.
  15. Optional: Select the Populate externally check box if the step is populated externally, meaning that it is started in some way other than by the Data Warehouse Center. The step does not require any other means of running in the Data Warehouse Center in order for you to change the mode to production.

    If the Populate externally check box is not selected, then the step must have a schedule, be linked to a transient table that is input to another step, or be started by another program in order for you to change the mode to production.

  16. In the Retry area, specify how many times you want the step to run again if it needs to be retried, and the amount of time that you want to pass before the next run of the step.
  17. In the Replication control database field, select the control database or subsystem that contains the replication control tables for the Apply program.
  18. In the Database type list, select the database type for the replication control database.
  19. In the User ID field, type the user ID to access the replication control database.
  20. In the Password field, type the password for the user ID that will access the database.
  21. In the Verify password field, type the password again.
  22. In the Subscription set name field, type the name of the subscription set. This name can contain a maximum of 18 characters and can be an ordinary or delimited qualifier.
  23. Optional: In the Apply qualifier field, type the name of the apply qualifier. The apply qualifier name must be unique for every replication step that is defined. If you do not specify an apply qualifier, the Data Warehouse Center generates one for you.
  24. Optional: In the Event name field, type the event name. The event name is the name of the event that is posted in the event table that the Apply program reads. The event name must be unique for every replication step that is defined. If you do not specify an event name, the Data Warehouse Center generates one for you.
  25. In the Blocking factor field, specify the number of minutes' worth of change data that can be replicated during a subscription cycle.
  26. Click OK to save your changes and close the notebook.
  27. Link the step to the warehouse sources.
  28. Link the step to the warehouse targets.
  29. Promote the step to test mode.
  30. Run the step to test it.
  31. Schedule the step.
  32. Promote the step to production mode.


[ Top of Page | Previous Page | Next Page ]