Business Intelligence Tutorial

Lesson 8. Defining data transformation and movement

In this lesson, you will define how the Data Warehouse Center is to move and transform data into its format for the data warehouse. First, you will define a process, which contains a series of steps in the transformation and movement process. Next, you will specify the source tables that are to be transformed for the warehouse. Finally, you will define data transformation steps that use two different methods of transformation:

Specifically, you will define the Tutorial Market process, which performs the following processing:

  1. Loading the Demographics file into the warehouse database.
  2. Selecting data from the GEOGRAPHIES table and creating a target table.
  3. Joining the data in the Demographics table and the GEOGRAPHIES target table.

The end result is the LOOKUP_MARKET target table.


Defining a process

In this exercise, you will define the process object for the Tutorial Market process.

To define the process object:

  1. From the Data Warehouse Center window, expand the Subject Areas tree.
  2. Expand the TBC Tutorial subject area, which you defined in Lesson 5, Defining a subject area.
  3. Right-click the Processes folder, and click Define.

    The Define Process notebook opens.

  4. In the Name field, type the name of the process:
    Tutorial Market
    

    The name can be up to 80 characters long and is case-sensitive. The first character of the name must be alphanumeric. You cannot use & as the first character in English. This field is required.

  5. In the Administrator field, type your name as the contact for the process definition.
  6. In the Description field, type the description of the process:
    Process to create the LOOKUP_MARKET table
    
  7. Click the Security tab.
  8. In the Available Security Groups list, click the Tutorial Warehouse Group, which you defined in Defining the warehouse group.

    Adding the process to the security group authorizes the users in the group (in this case, you) to open and add objects to the process.

  9. Click >.

    The Tutorial Warehouse Group moves to the Selected Security Groups list.

  10. Click OK.

    The Define Process notebook closes.


Opening the process

In this exercise, you will open the process so that you can graphically define the flow of data in the process in the next exercise.

To open the Tutorial Market process:

  1. Right-click the Tutorial Market process.
  2. Click Open.

Adding tables to a process

To define the flow of data, you need to add each source that the steps transform and the target tables that result from the transformation.

In the Tutorial Market process, you are to load the Demographics file into the target database, so you need to add the source file and the DEMOGRAPHICS_TARGET target table for the step to the process. The Demographics source file is part of the Tutorial File Source warehouse source, which you defined in Lesson 6, Defining warehouse sources. The DEMOGRAPHICS_TARGET target table is part of the Tutorial Targets warehouse target, which you defined in Lesson 7, Defining warehouse targets.

To add the Demographics file:

  1. Click the Add Data icon:
    Figure alldata not displayed.

    Click the canvas at the spot that you want to place the table. The Add Data window opens.

  2. In the Available source and target tables list, expand the Warehouse Sources tree.

    A list of the warehouse sources defined in the warehouse is displayed.

  3. Expand the tree for the Tutorial File Source warehouse source.
  4. Expand the Files tree.

    In the tree, you should see X:\sqllib\samples\db2sampl\dwc\demographics.txt, where X is the drive on which you installed the sample.

  5. Select the Demographics file.
  6. Click > to add the Demographics file to the Selected source and target tables list.

To add the DEMOGRAPHICS_TARGET target table:

  1. In the Available source and target tables list, expand the Warehouse Targets tree.

    A list of the warehouse targets defined in the warehouse is displayed.

  2. Expand the Tutorial Targets warehouse target tree.
  3. Expand the Tables tree.

    You should see the DEMOGRAPHICS_TARGET target table in the list.

  4. Select the DEMOGRAPHICS_TARGET target table.
  5. Click > to add the DEMOGRAPHICS_TARGET target table to the Selected source and target tables list.

In the next part of this exercise, you need to add the GEOGRAPHIES source table. When you define a step that selects data from the GEOGRAPHIES table, you can specify that the Data Warehouse Center automatically generate a target table, so you do not need to add a target table.

To add the GEOGRAPHIES source table:

  1. From the Add Data window, select the GEOGRAPHIES table.
  2. Click > to add the GEOGRAPHIES table to the Selected source and target tables list.
  3. Click OK to add the file and tables that you selected to the process.

    The file and tables that you selected are displayed on the Process Model window.

The last step will use the Demographics table and the Geographies table as sources, so you do not need to specify sources for the step. You can specify that the Data Warehouse Center automatically generate the resulting target table, LOOKUP_MARKET, so you do not need to specify a target table for the step.


Adding steps to the process

Now, you need to add the steps that define how the source data is to be transformed into the target data. You must define three steps:

Load Demographics Data
A DB2 program step that loads data from the Demographics file into a table in the TBC Warehouse database, which you created in Lesson 2, Creating a warehouse database.

Select Geographies Data
An SQL step that selects columns from the Geographies source table.

Join Market Data
An SQL step that joins the Geographies and Demographics tables and writes the joined table into the Tutorial Warehouse database.

Defining the Load Demographics step

In this exercise, you will define the Load Demographics step:

  1. From the palette (the toolbar on the left side of the window), click the DB2 Programs icon, which is the second icon from the bottom:
    Figure db2prog not displayed.

    Each program icon represents a program group, which is a grouping of similar programs.

  2. Click DB2 UDB --> Load.

    The programs in a program group cascade from the program icon.

  3. Click the spot on the canvas (the empty area on the right side of the window) where you want to place the step.

    An icon for the step is added to the window.

  4. Right-click the step.
  5. Click Properties.

    The Step notebook opens.

  6. In the Name field, type the name of the step:
    Load Demographics Data
    
  7. In the Administrator field, type your name as the contact for this step.
  8. In the Description field, type the description of the step:
    Loads demographics data into the warehouse.
    
  9. Click OK.

    The Step Properties notebook closes.

  10. Click the Task Flow icon:
    Figure arrowcol not displayed.
  11. Click the Data Link icon:
    Figure arrowgry not displayed.

    You will use the Data Link icon to define the flow of data from the source file, through transformation by a step, to the target table.

  12. Click the middle of the Demographics source file and drag the mouse to the Load Demographics Data step.

    The Data Warehouse Center draws a line between the file and the step.

    This indicates that the Demographics source file contains the source data for the step.

  13. Click the middle of the Load Demographics Data step and drag the mouse to the Demographics Table target table.


    A link between the Demographics source file and the Load Demographics Step

    This indicates that the Demographics Table target table contains the target data for the step.

  14. Right-click the step.
  15. Click Properties.

    The Step notebook opens.

  16. Click the Parameters tab.
    The Parameters page of the Step Properties notebook
  17. From the Load mode list, click REPLACE.

    Accept the rest of the default values on the page.

    You can skip the Column Mapping page because the Load program does not use the column mapping to generate the target table. It generates the target table based on the source file.

  18. Accept the default values on the Processing Options page.
    Figure db2tu021 not displayed.

    For more information about the values on this page, see "Loading data into a table" the online help.

  19. Click OK.

    The Step notebook closes.

Defining the Select Geographies step

In this exercise, you will define the Select Geographies step:

  1. From the palette, click the SQL icon:
    Figure sql not displayed.
  2. Click the spot on the canvas where you want to place the step.

    An icon for the step is added to the window.

  3. Right-click the step.
  4. Click Properties.

    The Step notebook opens.

  5. In the Name field, type the name of the step:
    Select Geographies Data
    
  6. In the Administrator field, type your name as the name of the contact for the step.
  7. In the Description field, type the description of the step:
    Selects Geographies data from the warehouse source
    
  8. Click OK.

    The Step notebook closes.

  9. Click the Task Flow icon:
    Figure arrowcol not displayed.
  10. Click the Data Link icon:
    Figure arrowgry not displayed.
  11. Click the middle of the Geographies source table and drag the mouse to the middle of the Select Geographies Data step.

    The Data Warehouse Center draws a line that indicates that the Geographies source table contains the source data for the step.

    Because you will specify that the Data Warehouse Center is to create the target table, you do not need to link a target table to the step.

  12. Right-click the Select Geographies Data step.
  13. Click Properties.

    The Step notebook opens.

  14. Click the SQL Statement tab.
  15. Click Build SQL to build SQL using the SQL Assist wizard, which generates SQL based on the selections that you make in the wizard.

    SQL Assist opens.

  16. Click the Tables tab.
  17. Verify that GEOGRAPHIES is listed.
    Figure db2tu016 not displayed.
  18. Click the Columns tab.
  19. Click >> to add all the columns from the Geographies table to the Output list.
    The Columns page of the Build SQL wizard
  20. Click the Review tab to view the SQL statement that you just built.
  21. Click OK.

    SQL Assist closes. The SQL statement that you just built is displayed on the SQL Statement page.
    The GEOGRAPHIES SELECT statement on the SQL Statement page.

  22. Click Test to test the SQL that you just generated.

    The Data Warehouse Center returns sample results of your SELECT statement. These results should be the same as the results that you got in Lesson 3, Browsing the source data when you browsed the sample data for the Geographies source table.

  23. Click Close to close the window.
  24. Select the Create Warehouse Target Table based on Parameters check box.

    Selecting this check box specifies that the Data Warehouse Center is to create the target table, based on the values that are specified on the Column Mapping page.

  25. From the Warehouse Target list, click Tutorial Targets.

    The warehouse target is the database or file system in which the target table is to be created.

  26. Click the Column Mapping tab.
  27. Verify that the source columns map to the correct target columns.
    Figure db2tu019 not displayed.
  28. Click the Processing Options tab.
  29. In the Population Type list, click Replace.
  30. Verify that the Run on demand check box is selected.

    Accept the rest of the default values on this page. For more information about the values on this page, see the online help.

  31. Click OK.

    The Step notebook closes. The Data Warehouse Center generates a target table called "Select_Geog_T". The target table name is in quotation marks because it is in mixed case.

  32. Change the name of the target table:

    1. Right-click the "Select_Geog_T" target table.
    2. Click Properties.

      The Table notebook opens.

    3. In the Table schema field, type IWH.
    4. In the Table name field, type the new name for the table:
      GEOGRAPHIES_TARGET
      
    5. In the Description field, type a description of the table:
      Autogenerated target table for the Select Geographies Data step.
      
    6. In the Business Name field, type a descriptive name for the table:
      Geographies Target
      
    7. Clear the Part of an OLAP schema check box.
    8. Click the Columns tab to view the column definitions.
    9. Click OK. The Table notebook closes.

Defining the Join Market Data step

To define the Join Market Data step:

  1. From the palette, click the SQL icon.
  2. Click the spot on the canvas where you want to place the step.

    An icon for the step is added to the window.

  3. Right-click the step.
  4. Click Properties.

    The Step notebook opens.

  5. In the Name field, type the name of the step:
    Join Market Data
    
  6. In the Administrator field, type your name as the contact for the step.
  7. In the Description field, type the description of the step:
    Joins the Geographies table with the Demographics table
    
  8. Click OK.

    The Step notebook closes.

  9. Click the Task Flow icon:
    Figure arrowcol not displayed.
  10. Click the Data Link icon:
    Figure arrowgry not displayed.
  11. Click the middle of the Geographies Target table and drag the mouse to the Join Market Data step. Repeat this step with the Demographics Target table and the Join Market Data Step.

    The Data Warehouse Center draws lines that indicate that the Geographies Target table and Demographics Target tables contain the source data for the step.

    Because you will specify that the Data Warehouse Center is to create the target table, you do not need to link a target table to the step.

  12. Right-click the Join Market Data step.
  13. Click Properties.

    The Step notebook opens.

  14. Click the SQL Statement tab.
  15. Click Build SQL to have the Data Warehouse Center generate SQL. (Otherwise, you can type in your own SQL.)

    SQL Assist opens.

  16. Click the Tables tab.
  17. Verify that the DEMOGRAPHICS_TARGET and GEOGRAPHIES_TARGET tables are selected.
  18. Click the Columns tab.
    The GEOGRAPHIES SELECT statement on the SQL Statement page.
  19. Click >> to add all the columns from the Geographies table and the Demographics table to the Output columns list.
  20. From the Output columns list, click DEMOGRAPHICS_TARGET.STATE.
  21. Click <.

    The DEMOGRAPHICS_TARGET.STATE column moves to the Available columns list.

  22. Click DEMOGRAPHICS_TARGET.CITY.
  23. Click <.

    The DEMOGRAPHICS_TARGET.CITY column moves to the Available columns list.

  24. You do not need the DEMOGRAPHICS_TARGET.STATE and DEMOGRAPHICS_TARGET.CITY columns because it is already defined in the GEOGRAPHIES_TARGET table. You will join the tables using the CITY column in both tables, which is unique.

  25. Click Add.

    The Expression Builder window opens.

  26. In the Expression field, type the following CASE statement:
    case 
    when POPULATION < 9000000   then 1
    when POPULATION < 15000000  then 2
    Else				 3
    end
    

    Figure db2tu062 not displayed.
  27. Click OK.

    A new column is added to the Selected Columns list.

  28. Click the Name field of the new column and type the name of the column:
    SIZE_ID
    
  29. Press Enter.
  30. Click Move Up to move the SIZE_ID column up above the POPULATION column.
  31. Click the Joins tab.
  32. In the GEOGRAPHIES table, select the CITY column.
  33. In the DEMOGRAPHICS table, select the CITY column.
  34. Click Join.

    SQL Assist draws a line between the CITY columns, which indicates that the tables are joined on that column.


    The GEOGRAPHIES SELECT statement on the SQL Statement page.

  35. Click the Review tab to view the SQL statement that you just built.
  36. Click OK.

    SQL Assist closes.

  37. Select the Create Warehouse Target Table based on Parameters check box.

    Selecting this check box specifies that the Data Warehouse Center is to create the target table, based on the values that are specified on the SQL Statement and Column Mapping pages.

  38. From the Warehouse Target list, click Tutorial Targets.
  39. Click the Column Mapping tab.
  40. Verify that the source columns map to the correct target columns.
  41. Click the Processing Options tab.
  42. In the Population Type list, click Replace.
  43. Select the Run on demand check box to run the step on demand.

    Accept the rest of the default values on this page. For more information about the values on this page, see the online help.

  44. Click OK.

    The Step notebook closes.

  45. Change the name of the target table:

    1. Right-click the "Join_Market_T" target table.
    2. Click Properties.

      The Table notebook opens.

    3. In the Table schema field, type IWH.
    4. In the Table name field, type the new name for the table:
      LOOKUP_MARKET
      
    5. In the Description field, type a description of the table:
      Market dimension data
      
    6. Verify that the Part of an OLAP schema check box and Dimension table radio button are selected.

      This table is one of the dimension tables that you will include in a star schema in Lesson 16, Creating a star schema from within the Data Warehouse Center.

    7. Click the Columns tab to view the column definitions.
    8. Clear the Nullable check box for the CITY ID column because you will define this column as the primary key for the table in Lesson 11, Defining keys on target tables
    9. Click OK. The Table notebook closes.

    Figure db2tu042 not displayed.

What you just did

You defined the LOOKUP_MARKET dimension table in the Tutorial Market process, which contains three steps:

The sources and targets for each of the steps are:
Step Sources Target
Load Demographics Data Demographics file Demographics Target table
Select Geographies Data Geographies table Geographies Target table
Join Market Data Demographics Target table and Geographies Target table Lookup Market table

For this tutorial, you added the data links for each step while you were defining the properties of each step. Another way you can accomplish this task is to add all the steps in the process at one time, link the steps to their sources and targets, and then define the properties of each step. The Data Warehouse Center assigns default names to the steps that you can change in the Step notebook.


Defining the rest of the star schema (optional)

The table that you created in the previous section, LOOKUP_MARKET, is one of the dimension tables in the example in Tutorial business problem. This section covers the steps for building the rest of the dimension tables and the fact table for the star schema.

This section is optional, but if you do not complete the steps in this section, you will not be able to do the following lessons:

If you want to skip this section, go to Lesson 9, Testing warehouse steps.

When you define each table, you must define a new process for the table. Instead of defining your own step for the process, however, you will copy the step that is defined in the sample. The definition of the step is in the Data Warehouse Center that you are using. When you copy the step, the Data Warehouse Center copies the sources that the step uses and generates a target table.

To define the Product dimension table:

  1. Define a new process by following the steps in Defining a process. Name the process Tutorial Product.
  2. From the main Data Warehouse Center window, expand the tree until you see the Sample Product process.
  3. Right-click the Sample Product process.
  4. Click Open.

    The Process Model window opens.

  5. Right-click the Select Product step.
  6. Click Copy.

    The Copy Step window opens.

  7. In the Name field, type the name for the copy of the step:
    Tutorial Select Product
    
  8. In the New process field, select or type the name of the process to which you want to copy the step:
    Tutorial Product
    
  9. Verify that the Copy Target Table check box is selected.
  10. In the Warehouse target field, select the Tutorial Targets warehouse target from the list.
    The GEOGRAPHIES SELECT statement on the SQL Statement page.
  11. Click OK.

    The step and its sources are copied to the Tutorial Product process. The Data Warehouse Center generates the corresponding target table.

  12. Open the Tutorial Product process by following the steps in Opening the process.
  13. Verify that the process includes the following objects:
  14. Rename the TARGET_PRODUCT target table to LOOKUP_PRODUCT.

Repeat this procedure for the rest of the dimension tables and the fact table.
Dimension Tutorial Process Sample Process Sample Step Tutorial Step Warehouse Target Source tables Target Table New Target Table Name
Time Tutorial Time Sample Time Select Time Tutorial Select Time Tutorial Targets TIME
TARGET_
TIME


LOOKUP_
TIME

Scenario Tutorial Scenario Sample Scenario Select Scenario Tutorial Select Scenario Tutorial Targets SCENARIO
TARGET
_SCENARIO


LOOKUP_
SCENARIO

Fact Table Tutorial Fact Table Sample Fact Table Fact Table Join Tutorial Fact Table Join Tutorial Targets
SALES,
INVENTORY,
and
PRODUCT
_COSTS


TARGET_
FACT_
TABLE


FACT_
TABLE


What you just did

You now have five processes, and their associated sources, targets, and steps, defined in the Data Warehouse Center:


[ Top of Page | Previous Page | Next Page ]