Data Warehouse Center Administration Guide


Chapter 11. Extending the Data Warehouse Center with user-defined programs

You can use user-defined programs to use the best data warehouse software for your needs, while providing a single point-of-control to administer the warehouse. The Data Warehouse Center will start an application that you define as a user-defined program at a scheduled time.

For example, if you have a data cleansing program that you want to use on your warehouse tables, you can define the data cleansing program as a user-defined program, and run a step for that program that starts after a step that populates the warehouse tables.

This chapter explains the following tasks:


Defining a user-defined program

A user-defined program is a program that you create or a program that is created by a vendor. You define a user-defined program to the Data Warehouse Center so that one or more steps can use the program for processing. If you are defining a stored procedure, you must ensure that the stored procedure exists in the target database. If you are defining any other type of program, you must install the program on one or more Data Warehouse Center agent sites before you define the program to the Data Warehouse Center.

After you define a user-defined program to the Data Warehouse Center, the program definition is available for use as a step in the Process Model window.

Defining a program group

A user program group is a logical group that contains related user-defined programs. You must create a user program group before you can define a user-defined program to the Data Warehouse Center.

To define a user program group:

  1. From the Data Warehouse Center, expand the Administration folder.
  2. Expand the Programs and Transformers folder.
  3. Right-click the User-Defined Program and Transformers folder and click Define Group.
  4. Specify information for your program group. You can enter information into any of these fields.
  5. Click OK to save your changes and close the notebook.

Specifying the program to use

To define a user-defined program to the Data Warehouse Center:

  1. From the Data Warehouse Center, expand the Administration folder.
  2. Expand the Programs and Transformers folder.
  3. Expand the User-Defined Program and Transformers folder.
  4. If you are defining a new program, right-click a program group, and click Define Program.
  5. If you are editing an existing program definition, expand the program group folder. Then, right-click the program, and click Properties.
  6. Specify information for your program:

Specifying the agent site to use

On the Agent Sites page of the Program notebook, you must select the agent site on which the program is installed.

If you specified a user ID and password when you defined the agent site, the program will run as a user process. If you did not specify a user ID and password, the program will run however the warehouse agent was defined.

You can run some programs as user processes and other programs as system processes on the same workstation. To do this, define two agent sites on the workstation: one that has a user ID and password, and one that does not.

If you are defining a transformer, you must ensure that the warehouse agent that you associate with this program is associated with the database where your transformer exists. If you assign a warehouse agent to this program definition that is not associated with the target database specified on this page, you will receive a warning when you create a step that uses this program.

For example, you have the following warehouse agent, source, and targets:
Agent Source Target
Agent A Source 1 Target 1


Target 2

Suppose Agent A is associated with Source 1 and Target 1. Now, suppose that your transformer exists on Target 2. If you select Agent A in your transformer program definition, you will receive a warning when you try to create a step. This warning is issued because Agent A is not associated with the target that contains the transformer.

To use Agent A, you must associate it with Target 2, or you must store the transformer on Target 1 and add Target 1 to the target resources definition:

  1. Depending on the type of program that you are defining, click the Agent sites tab or the Targets tab.
  2. If you are defining a program, from the Available Agent Sites list, select the agent site on which the program is installed, and click Add. The name of the agent site moves to the Selected Agent Sites list. Repeat this step for each agent site on which the program is installed.

    If you are defining a transformer, from the Available Targets list, select the target on which the stored procedure exists and click Add. The name of the target moves to the Selected Targets list.

  3. Click OK to save your changes and close the notebook.

For more information about defining agent sites, see Defining agent sites and the online help.

Specifying parameters

You can use predefined tokens for some parameters. The Data Warehouse Center substitutes values for the tokens at run time. For example, there is a token for the database name of the target resource for a step, &TDB. If you include this token in your parameter list, the Data Warehouse Center provides the name of the database defined in the notebook of the warehouse target that contains the target table that is linked to the step. Tokens allow you to change the values that are passed depending on which step uses the program.

If you want your user-defined program to extract data from the sources that are linked to the step that uses the program, you can include the &STMT token in your parameter list. This token directs the Data Warehouse Center to pass the SELECT statement that is defined for the step to the program. The SELECT statement is passed to the program as a one command-line parameter, enclosed in double quotation marks. Any embedded quotation marks are enclosed in escape characters. If the SELECT statement is exceptionally long, it might conflict with operating system guidelines for the maximum number of parameters.

For a complete list of parameters with predefined tokens, see "Select Parameters fields and controls" in the online help.

To add a system parameter or system token to a program definition:

  1. Open the Define User Program notebook.
  2. Click the Parameters tab.
  3. Click Add.
  4. In the Type area, click System parameters or System tokens.
  5. In the Available parameters field, select a parameter or token. You can select multiple parameters or tokens by pressing the CTRL key while you make your selections.
  6. Click OK.

    The parameters or tokens that you select are listed on the Parameters page of the Define User Program notebook.

Example: You are defining a user-defined program that checks for a file at regular intervals on a Windows NT workstation. You intend to use this program to find a file that another step will load into a database.

The program uses the following parameters:

You use the Warehouse target file name system parameter (&TTBN) to represent the file name. You define your own parameters for the polling interval and timeout interval.


Using a user-defined program step

After you define your user-defined program, you define steps that use the program for processing.

When you define a step that runs a user-defined program, you can change the parameter values that are defined for the program. If you change the parameter values for the program, the changes affect only the instance of the program that is being used in the step. The changes do not affect the original program definition.

To permanently change the parameter values of a user-defined program, you must edit the program definition.

If your user-defined program uses tokens for a source or target, you must link this step to the source or target.

To define a step that runs a user-defined program:

  1. Define a process.
  2. Right-click the process. Click Open.. The Process Model window opens.
  3. Click the user-defined program step icon in the left pane of the Process Model window. To view descriptions of the step icons, click View --> Legend.
  4. Select a step subtype (if applicable). Then, move your mouse into the Process Model window and click.
  5. Link the step to any needed sources or targets.
  6. Right-click the step, and click Properties.

    To edit an existing step within the Process Model window, click a process, and click Open. The Process Model window opens. Right-click the step, and click Properties.

    To edit an existing step outside of the Process Model window, click a process in the Warehouse tree view. A list of objects that are contained in the process is displayed in the right frame. Right-click a step, and click Properties.

  7. Specify information for your step:
  8. On the Parameters page, view or edit the parameters and the parameter values that the program uses. To edit a parameter value, double-click the Parameter value field for a parameter, and type the new parameter value. If you change a parameter value, the change affects only the instance of the program that is run by this step. The original program parameter values are not changed.
  9. On the Processing Options page, in the Agent Sites list, select an agent site where you want your step to run. The selections in this list are agent sites that are associated with the source tables, the target table, and the transformer or program that you are defining.
  10. If you want the option to run your step at any time, select the Run on demand check box. Your step must be in test or production mode before you can run it.
  11. Optional: Select the Populate externally check box if the step is populated externally, meaning that it is started in some way other than by the Data Warehouse Center. The step does require any other means of running in the Data Warehouse Center in order for you to change the mode to production.

    If the Populate externally check box is not selected, then the step must have a schedule, be linked to a transient table that is input to another step, or be started by another program in order for you to change the mode to production.

  12. In the Retry area, specify how many times you want the step to run again if it needs to be retried, and the amount of time that you want to pass before the next run of the step.
  13. Click OK to save your changes and close the step notebook.

Example: You define a step that uses the user-defined program that you defined in the previous section. The step has no source. Because you are using the file to be found as a source for the next step in the sequence, you define the file as a target for this step. You then define a load step that uses the file as a source. The load step loads the file into a database.


Figure db2db053 not displayed.

Testing steps that use user-defined programs

After you define a step that uses a user-defined program, promote the step to test mode and run it.

To promote a step to test mode:

  1. Open the Process Model window.
  2. Right-click the step to promote.
  3. Click Mode --> Test.

    After the step is promoted, Test is selected when you right-click Mode.

To test a step:

  1. Open the Process Model window.
  2. Right-click the step.
  3. Click Test.
  4. Click OK.

    The step is displayed on the Work in Progress window with a status of Populating. The processing should complete successfully. However, if the processing fails, see Testing a step for information about debugging what went wrong.

After you test the step, you can schedule it in the same way as any other step. For more information about scheduling steps, see Scheduling warehouse processes


Writing your own program for use with the Data Warehouse Center

You can write programs in any language that supports one of the following program types: executable, batch program, dynamic link library, or stored procedure.

If the user-defined program has a program type of executable, batch program, batch command file, or dynamic link library, it must reside on the agent site. The warehouse agent starts the program at the scheduled time. On Windows NT, the agent runs as a system process by default, which means that the program cannot access resources or programs that require a user ID. Also, any environment variables that the program needs to access must be system variables.

If you write user-defined programs using Object REXX for Windows, complete the following procedure to enable these programs to run under Windows NT:

  1. Define the warehouse agent or server service as a system process that can interact with the Windows NT desktop:
    1. Select the agent or server service from the Service list.
    2. Click Startup.
    3. Click System Account.
    4. Select the Allow Service to Interact with Desktop check box.
  2. Initialize the Object REXX environment before the warehouse agent or server starts the program. You can initialize the environment by running any Object REXX program from the command line.
  3. If your Object REXX program issues a DB2 CONNECT statement, verify that the statement includes the user ID and password, as in the following example:
    DB2 CONNECT TO testdb USER db2admin USING db2pass
    

If the user-defined program has a program type of stored procedure, the program must reside in the warehouse database that is associated with the step that runs the program.

Passing parameters

At run time, the Data Warehouse Center generates a command-line parameter list that it passes as input to your user-defined program. Whenever possible, test your program from the command line before using it in a step.

Example: You write a user-defined program that checks for a file at regular intervals on a Windows NT workstation. It uses the following parameters:

The program gets the parameters as shown in Figure 15:

Figure 15. Reading parameters from the command line

char *remoteFile    = argv[1];
int  interval = 5;	// check every 5 minutes
int  timeout = 60; // time-out period in minutes
 
if( (argc==3) || (argc==4) )  
{
        if( NumStrValidation( argv[2] ) ) // get polling interval
        {
                interval = atoi( argv[2] );
                if( interval == 0 )
                        interval = 5;  // used the original one
        }
        else
        {
                trace << "Invalid value for polling interval\n";
                flush(trace);
                return PARAMETER_ERROR;
        }
 
        if( argc==4 )		// get timeout period
        {
                if( NumStrValidation( argv[3] ) )
                {
                        timeout= atoi( argv[3] );
                        if( timeout == 0 )
                                timeout = 480;  // used the original one
                }
                else
                {
                        trace << "Invalid value for timeout period\n";
                        flush(trace);
                        return PARAMETER_ERROR;
                }
        }
}

Returning status information

After your program runs, it must return a return code to the step that uses the program. The return code must be a positive integer. If your program does not return a return code, the step using the program might fail. The Data Warehouse Center displays the return code in the Error RC2 field of the Log Details window when the value of Error RC1 is 8410. If the value of Error RC2 is 0, then the program ran successfully without errors.

Your program can return additional status information to the Data Warehouse Center:

The warehouse agent transfers the additional status information to the warehouse server.

Transferring the information to the Data Warehouse Center

To transfer the additional status information to the warehouse agent, your program must create a file, called a feedback file, that contains the additional status information. The path and file name for the feedback file is specified in the VWP_LOG environment variable, which the warehouse agent sets before it calls the warehouse program. Your program must read the value of this variable, using the appropriate system call, to create the feedback file. After the user-defined program finishes running, the warehouse agent checks whether the feedback file exists. If it exists, the agent processes the file. Otherwise, the warehouse agent will do nothing. If the user-defined program cannot create the file, it should continue to run.

Format of the feedback file

Your user-defined program can write the additional status information to the feedback file in any order, but must use the following format to identify information. Enclose each item returned within the begin tag <tag> and end tag </tag> in the following list. Each begin tag must be followed by its end tag; you cannot include two begin tags in a row. For example, the following tag format is valid:
<RC>...</RC>...<MSG>...</MSG>
The following embedded tag format is not valid:
<RC>...<MSG>...</RC>...</MSG>

You can specify the following information in the feedback file:

Return code
<RC>return code</RC>, where return code is a positive integer.

Return code warning flag
<WARNING>1</WARNING> sets the return code warning flag to on. If this flag is set on, the step's processing status in the Work in Progress window will be Warning.

Data Warehouse Center system message
<MSG>message text\n</MSG>

message text
The text of one or more messages

\n
The new line character. Include this character at the end of each message if there are multiple messages.

Comment
<COMMENT>comment text</COMMENT>, where comment text is the text of the comment.

Number of rows of data processed
<ROWS>number of rows</ROWS>, where number of rows is any positive integer.

Number of bytes processed
<BYTES>number of bytes</BYTES>, where number of bytes is any positive integer.

SQLSTATE
<SQLSTATE>sqlstate string</SQLSTATE>, where sqlstate string is any string whose length is greater than 0 and less than or equal to 5 digits.

Figure 16 shows an example of the feedback file.

Figure 16. Example of the feedback file

<RC> 20</RC>
<ROWS>2345</ROWS>
<MSG>The parameter type is not correct</MSG>
<COMMENT> Please supply the correct parameter type (PASSWORD
     NOTREQUIRED, GETPASSWORD, ENTERPASSWORD)</COMMENT> 
<BYTES> 123456</BYTES>
<WARNING> 1</WARNING>
<SQLSTATE>12345</SQLSTATE>

How the feedback determines the step status

The return codes and step status for the user-defined program that are displayed in the log viewer vary, depending on the following values set by the program:

Table 17 lists the possible combinations of these values and the results that they produce.

Table 17. Feedback file conditions and results
Conditions Results
Step status1 Values of Error RC1 and RC2
The user-defined program return code is 0 No feedback file exists2 Successful

RC1 = 0;

RC2 = 0

A feedback file exists2 The value of <RC> in the feedback file is 03 <WARNING> is not set in the feedback file Successful

RC1 = 0;

RC2 = 0

The value of <WARNING> in the feedback file is 1 Warning

RC1 = 0;

RC2 = 0

The value of <RC> in the feedback file is non-03 <WARNING> is not set in the feedback file Failed

RC1 = 8410

(the user-defined program failed);

RC2 = the value of <RC> in the feedback file

The value of <WARNING> in the feedback file is 1 Warning

RC1 = 0;

RC2 = the value of <RC> in the feedback file

The user-defined program return code is nonzero No feedback file exists2 Failed

RC1 = 8410 (the user-defined program failed);

RC2 = the code returned by the user-defined program

A feedback file exists2 The value of <RC> in the feedback file is 03 <WARNING> is not set in the feedback file Successful

RC1 = 0;

RC2 = 0

The value of <WARNING> in the feedback file is 1 Warning

RC1 = 0;

RC2 = 0

The value of <RC> in the feedback file is non-0 <WARNING> is not set in the feedback file Failed

RC1 = 8410 (the user-defined program failed);

RC2 = the code returned by the user-defined program

The value of <WARNING> in the feedback file is 1 Warning

RC1 = 0;

RC2 = the value of <RC> in the feedback file

Notes:

  1. The step processing status, as displayed in the Work in Progress window.

  2. The Data Warehouse Center checks for the existence of the feedback file, regardless of whether the return code for the user-defined program is 0 or nonzero.

  3. The value of <RC> in the feedback file is always displayed as the value of the RC2 field of the Log Details window.


[ Top of Page | Previous Page | Next Page ]