Business Intelligence Tutorial

Defining data transformation and movement

In this lesson, you will define how the Data Warehouse Center is to move and transform data into a format for the data warehouse. First, you will define a process, which contains a series of steps in the data transformation and movement process. Next, you will specify the source tables that are to be transformed for the warehouse. Finally, you will define data transformation steps that use two different methods of transformation:

Specifically, you will define the Build Tutorial Market Dimension process, which performs the following processing:

  1. Loading the Demographics file into the warehouse database.
  2. Selecting data from the GEOGRAPHIES table and creating a target table.
  3. Joining the data in the Demographics table and in the GEOGRAPHIES target table.

The end result is the LOOKUP_MARKET target table.


Defining a process

In this exercise, you will define the process object for the Build Tutorial Market Dimension process.

To define the process object:

  1. From the Data Warehouse Center window, expand the Subject Areas tree.
  2. Expand the TBC Tutorial subject area, which you defined in Defining a subject area.
  3. Right-click the Processes folder, and click Define.

    The Define Process notebook opens.

  4. In the Name field, type the name of the process:
    Build Tutorial Market Dimension
    

    The name can be up to 80 characters long and is case-sensitive. The first character of the name must be alphanumeric. You cannot use & as the first character in English. This field is required.

  5. In the Administrator field, type your name as the contact for the process definition.
  6. In the Description field, type the description of the process:
    Process to create the LOOKUP_MARKET table
    
  7. Click the Security tab.
  8. In the Available warehouse groups list, click the Tutorial Warehouse Group, which you defined in Defining the warehouse group.
  9. Click > to move the Tutorial Warehouse Group to the Selected warehouse groups list.

    Adding the process to the warehouse group authorizes the users in the group (in this case, you) to open and add objects to the process.

  10. Click OK.

    The Define Process notebook closes.


Opening the process

In this exercise, you will open the process so that you can graphically define the flow of data.

To open the Build Tutorial Market Dimension process:

  1. Expand the TBC Tutorial tree until you see the Build Tutorial Market Dimension process.
  2. Right-click the Build Tutorial Market Dimension process.
  3. Click Open.

Adding tables to a process

To define the flow of data, you need to add each source that the steps transform and the target tables that result from the transformation.

In the Build Tutorial Market Dimension process, you will load the Demographics file into the target database. You need to add to the process the source file and the DEMOGRAPHICS_TARGET table for the step. The Demographics source file is part of the Tutorial File Source warehouse source, which you defined in Defining warehouse sources. The DEMOGRAPHICS_TARGET table is part of the Tutorial Targets warehouse target, which you defined in Defining warehouse targets.

To add the Demographics file:

  1. Click the Add Data icon.
    Figure alldata not displayed.
  2. Click the canvas (the area on the right side of the window) at the spot where you want to place the table. The Add Data window opens.
  3. In the Available source and target tables list, expand the Warehouse Sources tree.

    A list of the available warehouse sources is displayed.

  4. Expand the tree for the Tutorial File Source warehouse source.
  5. Expand the Files tree.

    In the tree, you should see the following file:

    X:\program files\sqllib\samples\db2sampl\dwc\demographics.txt, where X is the drive on which you installed the sample.

  6. Select the Demographics file.
  7. Click > to add the Demographics file to the Selected source and target tables list.

To add the DEMOGRAPHICS_TARGET table:

  1. In the Available source and target tables list, expand the Warehouse Targets tree.

    A list of the available warehouse targets is displayed.

  2. Expand the Tutorial Targets warehouse target tree.
  3. Expand the Tables tree.

    The DEMOGRAPHICS_TARGET table is displayed in the list.

  4. Select the DEMOGRAPHICS_TARGET table.
  5. Click > to add the DEMOGRAPHICS_TARGET table to the Selected source and target tables list.

Adding the SAMPLETBC.GEOGRAPHIES table to the process

In the next part of this exercise, you need to add the source table. When you define a step that selects data from the SAMPLTBC.GEOGRAPHIES table, you can specify that the Data Warehouse Center automatically generate a target table, so you do not need to add a target table.

To add the SAMPLTBC.GEOGRAPHIES source table:

  1. Expand the Warehouse Sources tree until you see the Tutorial Relational Source warehouse source.
  2. Expand the Tutorial Relational Source tree until you see the SAMPLTBC.GEOGRAPHIES table.
  3. Select the SAMPLTBC.GEOGRAPHIES table.
  4. Click > to add the SAMPLTBC.GEOGRAPHIES table to the Selected source and target tables list.
  5. Click OK to add the file and tables that you selected to the process.

    The file and tables that you selected are displayed in the Process Model window.

  6. Click the demographics.txt icon and move it up until you see the DEMOGRAPHICS_TARGET table.
  7. Arrange the icons in the following order from top to bottom: demographics.txt, DEMOGRAPHICS_TARGET. Leave room for one icon between demographics.txt and DEMOGRAPHICS_TARGET. Place the Geographies icon to the right of the demographics.txt icon.

Figure db2tu063 not displayed.

Adding steps to the process

Now, you need to add the steps that define how the source data is to be transformed into the target data. You must define three steps:

Load Demographics Data
A DB2 program step that loads data from the Demographics file into a table in the TBC Warehouse database, which you created in Creating a warehouse database.

Select Geographies Data
An SQL step that selects columns from the SAMPLTBC.GEOGRAPHIES source table.

Join Market Data
An SQL step that joins the SAMPLTBC.GEOGRAPHIES and Demographics tables and writes the joined table into the Tutorial Warehouse database.

Defining the Load Demographics Data step

To define the Load Demographics Data step:

  1. From the palette (the toolbar on the left side of the window), click the DB2 Programs icon
    Figure db2prog not displayed.

    Each program option in the menu represents a program group, which is a grouping of similar programs.

  2. Click DB2 UDB --> Load.
  3. Click the spot on the canvas between the demographics.txt file and the DEMOGRAPHICS_TARGET table.

    An icon for the step is added to the window between the demographics.txt file and the DEMOGRAPHICS_TARGET table.

  4. Right-click the new step.
  5. Click Properties.

    The Properties notebook for the step opens.

  6. In the Name field, type the name of the step:
    Load Demographics Data
    
  7. In the Administrator field, type your name as the contact for this step.
  8. In the Description field, type the description of the step:
    Loads demographics data into the warehouse.
    
  9. Click OK.

    The Properties notebook for the step closes.

  10. Click the Task Flow icon
    The task flow arrow.
  11. Click the Data Link icon
    The data link arrow.

    The Data Link icon defines the flow of data from the source file, through transformation by a step, to the target table.

  12. Click the middle of the demographics.txt file, hold down the mouse button, and drag to the Load Demographics Data step.

    The Data Warehouse Center draws a line between the file and the step. The line indicates that the demographics.txt file contains the source data for the step.

  13. Click the middle of the Load Demographics Data step, hold down the mouse button, and drag the pointer to the DEMOGRAPHICS_TARGET table.


    A link between the Demographics source file and the Load Demographics Step

    This line indicates that the DEMOGRAPHICS_TARGET table contains the target data for the step.

  14. Right-click the Load Demographics Data step.
  15. Click Properties.

    The Properties notebook for the step opens.

  16. Click the Parameters tab.
    The Parameters page of the Properties notebook for the step.
  17. From the Load mode list, click REPLACE.

    For this exercise, you will use the default values on the page.

    You can skip the Column Mapping page because the Load program does not use the column mapping to generate the target table. It generates the target table based on the source file.

  18. Click OK.

    The Properties notebook for the step closes.

Defining the Select Geographies step

In this exercise, you will define the Select Geographies step:

  1. From the palette, click the SQL icon
    Figure sql not displayed.
  2. Click a spot on the canvas below the GEOGRAPHIES table to add a step icon to the window.
  3. Right-click the step.
  4. Click Properties.

    The Properties notebook for the step opens.

  5. In the Name field, type the name of the step:
    Select Geographies Data
    
  6. In the Administrator field, type your name as the name of the contact for the step.
  7. In the Description field, type the description of the step:
    Selects Geographies data from the warehouse source
    
  8. Click OK.

    The Properties notebook for the step closes.

  9. Click the Task Flow icon
    Figure arrowcol not displayed.
  10. Click the Data Link icon
    Figure arrowgry not displayed.
  11. Click the middle of the Geographies source table, hold down the mouse button, and drag it to the middle of the Select Geographies Data step.

    The Data Warehouse Center draws a line that indicates that the Geographies source table contains the source data for the step.

    You do not need to link a target table to the step because you will use the Data Warehouse Center to create a target table.

Selecting columns from the Geographies source table

To select columns from the Geographies source table:

  1. Right-click the Select Geographies Data step.
  2. Click Properties.

    The Properties notebook for the step opens.

  3. Click the SQL Statement tab.
  4. Click the Build SQL push button.

    The Build SQL wizard opens. The Tables page is displayed.

  5. Verify that SAMPLTBC.GEOGRAPHIES is listed.
    Figure db2tu016 not displayed.
  6. Click the Columns tab.
  7. Click >> to add all the columns from the SAMPLTBC.GEOGRAPHIES table to the Selected columns list.
    The Columns page of the Build SQL wizard
  8. Click the Review tab to look at the SQL statement that you just created.
  9. Click OK.

    The Build SQL wizard closes. The SQL statement that you created is displayed on the SQL Statement page.
    The GEOGRAPHIES SELECT statement on the SQL Statement page.

  10. Click Test to test the SQL that you just created.

    The Data Warehouse Center returns sample results of your SELECT statement. Compare your results to the results that you got in the sample shown in the topic, Viewing table data.

  11. Click Close to close the window.

    Now that you have specified which columns the SQL step will access in the source table, you are ready to create the target table. Leave the Properties notebook for the step open and go to the exercise, Creating the GEOGRAPHIES_TARGET table.

Creating the GEOGRAPHIES_TARGET table

In this exercise, you will create the GEOGRAPHIES_TARGET table using the Column Mapping page in the Select Geographies Data step.

To create the GEOGRAPHIES_TARGET table:

  1. Click the Column Mapping tab.

    The Source columns are displayed on the left side of the page, and the target columns list is on the right side of the page. There are no target columns in the list because you did not specify the parameters for creating the target table.

  2. Click the Generate Default Table push button.

    The Generate Default Table window opens.

  3. In the Warehouse target list, click Tutorial Targets.

    The warehouse target is the database or file system in which the target table is created.

  4. In the Table schema list, type or click the table schema for the target table. Use the same schema under which you created the other tables.

    Because you are creating the table in the default table space, you can skip the Table space list.

  5. In the Table name field, type:
    GEOGRAPHIES_TARGET
    
  6. Click OK to close the Generate Default Table window.

    The columns of the GEOGRAPHIES_TARGET table are displayed in the target columns list on the right side of the Column Mapping page.

  7. Verify that the source columns map to the correct target columns.
    Figure db2tu019 not displayed.
  8. Click the Processing Options tab.
  9. Verify that Replace is selected in the Population type list.
  10. Verify that the Run on demand check box is selected.

    For this exercise, you will use the default values on this page.

  11. Click OK.

    The Properties notebook for the step closes. The Data Warehouse Center creates a target table called GEOGRAPHIES_TARGET and links it to the step.

Specifying properties for the GEOGRAPHIES_TARGET table

In this exercise, you will specify the properties for the GEOGRAPHIES_TARGET table that you created in the exercise, Creating the GEOGRAPHIES_TARGET table.

To specify the properties of the GEOGRAPHIES_TARGET table:

  1. In the Process Model window, right-click the GEOGRAPHIES_TARGET table, and click Properties.
  2. In the Business name field, type a descriptive name for the table:
    Geographies Target
    
  3. Clear the Part of an OLAP schema check box.
  4. Click OK. The Properties notebook for the table closes.

Defining the Join Market Data step

To define the Join Market Data step:

  1. From the palette, click the SQL icon.
  2. Click the spot on the canvas below the two columns of steps and tables.

    An icon for the step is added to the window.

  3. Right-click the new step.
  4. Click Properties.

    The Properties notebook for the step opens.

  5. In the Name field, type the name of the step:
    Join Market Data
    
  6. In the Administrator field, type your name as the contact for the step.
  7. In the Description field, type the description of the step:
    Joins the Geographies table with the Demographics table
    
  8. Click OK.

    The Properties notebook for the step closes.

  9. Click the Task Flow icon
    Figure arrowcol not displayed.
  10. Click the Data Link icon
    Figure arrowgry not displayed.
  11. Click the middle of the GEOGRAPHIES_TARGET table, hold the mouse button down, and drag it to the Join Market Data step.

    The Data Warehouse Center draws a line that indicates that the GEOGRAPHIES_TARGET table contains source data for the step.

    Because you will specify that the Data Warehouse Center is to create the target table, you do not need to link a target table to the step.

  12. Repeat step 11 with the DEMOGRAPHICS_TARGET table and the Join Market Data step.

    The Data Warehouse Center draws a line that indicates that the DEMOGRAPHICS_TARGET table contains source data for the step.

Joining the DEMOGRAPHICS_TARGET and GEOGRAPHIES_TARGET tables

To join the DEMOGRAPHICS_TARGET and GEOGRAPHIES_TARGET tables:

  1. Right-click the Join Market Data step.
  2. Click Properties.

    The Properties notebook for the step opens.

  3. Click the SQL Statement tab.
  4. Click the Build SQL push button to have the Data Warehouse Center create SQL. (Otherwise, you can create your own SQL.)

    The Build SQL window opens.

  5. On the Tables page, verify that the DEMOGRAPHICS_TARGET and GEOGRAPHIES_TARGET tables are listed.
  6. Click the Columns tab.
    The GEOGRAPHIES SELECT statement on the SQL Statement page.
  7. Click >> to add all the columns from the GEOGRAPHIES_TARGET table and the DEMOGRAPHICS_TARGET table to the Selected columns list.
  8. From the Selected columns list, click DEMOGRAPHICS_TARGET.STATE.
  9. Click < to move the DEMOGRAPHICS_TARGET.STATE column to the Available columns list.
  10. Click DEMOGRAPHICS_TARGET.CITY.
  11. Click < to move the DEMOGRAPHICS_TARGET.CITY column to the Available columns list.

    You do not need the DEMOGRAPHICS_TARGET.STATE and DEMOGRAPHICS_TARGET.CITY columns because they are already defined in the GEOGRAPHIES_TARGET table. You will join the tables using the CITY column because it is unique to both tables.

  12. Click the Add push button.

    The Expression Builder window opens.

  13. In the Expression field, type the following CASE statement:
    case 
    when POPULATION < 9000000   then 1
    when POPULATION < 15000000  then 2
    Else				 3
    end
    

    Figure db2tu062 not displayed.
  14. Click OK.

    A new column is added to the Selected columns list.

  15. Click the Name field of the new column, and type the name of the column:
    SIZE_ID
    
  16. Press Enter.
  17. Click Move Up until the SIZE_ID column is above the POPULATION column.
  18. Click the Joins tab.
  19. In the GEOGRAPHIES_TARGET table, click the CITY column.
  20. In the DEMOGRAPHICS_TARGET table, click the CITY column.
  21. Click Join.

    The Build SQL wizard draws a line between the CITY columns, which indicates that the tables are joined on that column.


    The GEOGRAPHIES SELECT statement on the SQL Statement page.

  22. Click the Review tab to see the SQL statement that you just created.
  23. Click OK.

    The Build SQL wizard closes.

Creating the LOOKUP_MARKET table

To create the LOOKUP_MARKET table:

  1. Click the Column Mapping tab.
  2. Click the Generate Default Table push button.

    The Generate Default Table opens.

  3. In the Warehouse target list, click Tutorial Targets.
  4. In the Table schema list, type or click IWH. Skip the Table space list.
  5. In the Table name field, type:
    LOOKUP_MARKET
    
  6. Click OK.

    The Generate Default Table window closes. The target columns are displayed in the target columns list on the right side of the Column Mapping page.

  7. Click the Processing Options tab.
  8. In the Population type list, verify that Replace is selected.
  9. Select the Run on demand check box to run the step on demand.

    For this exercise, you will use the default values for the rest of this page.

  10. Click OK.

    The Properties notebook for the step closes, and the LOOKUP_MARKET table is displayed in the Process Model window.

Defining the properties of the LOOKUP_MARKET table

To define the properties of the LOOKUP_MARKET table:

  1. Right-click the LOOKUP_MARKET table, and click Properties.
  2. Click Properties.

    The Properties notebook for the table opens.

  3. In the Description field, type a description of the table:
    Market dimension data
    
  4. Verify that the Part of an OLAP schema check box and Dimension table radio button are selected.

    The LOOKUP_MARKET table is one of the dimension tables that you will include in a star schema in Creating a star schema from within the Data Warehouse Center.

  5. Click the Columns tab.
  6. Clear the Nullable check box for the CITY_ID column. You will define this column as the primary key for the table in Defining keys on target tables.
  7. Click the Warehouse Primary Key tab.
  8. Click CITY_ID in the Available columns list.
  9. Click > to move the CITY_ID column to the Warehouse primary key columns list.

  10. In the Constraint name field, type:
    "Whse Market PK"
    
  11. Click OK. The Properties notebook for the table closes.

    Your process now looks like this:


    Figure db2tu042 not displayed.

  12. Click the Save icon in the toolbar to save the process.
    Figure wc_save not displayed.
  13. Close the Process Model window.

What you just did

You defined the LOOKUP_MARKET dimension table in the Build Tutorial Market Dimension process, which contains three steps:

The sources and targets for each of the steps are shown in the following table:


Step Sources Target
Load Demographics Data Demographics file DEMOGRAPHICS_TARGET table
Select Geographies Data Geographies table GEOGRAPHIES_TARGET table
Join Market Data Demographics Target table and Geographies Target table LOOKUP_MARKET table

You added the data links for each step when you defined the properties of each step. Another way you can accomplish this task is to add all the steps in the process at one time, link the steps to their sources and targets, and then define the properties of each step. The Data Warehouse Center assigns default names to the steps, which you can change in the Properties notebook for the step.


Defining the rest of the tables for the star schema (optional)

The table that you created in the previous exercise, LOOKUP_MARKET, is one of the dimension tables described in the example in Tutorial business problem. This exercise covers the steps for building the rest of the dimension tables and the fact table in the star schema.

You need to complete the steps in this exercise before you can do the following lessons:

If you want to skip this exercise, go to Testing warehouse steps.

When you define each table, you must define a new process for the table. Instead of defining a step for the process, you will copy the step that is defined in the sample. When you copy the step, the Data Warehouse Center copies the sources that the step uses and generates a target table.

To define the Product dimension table:

  1. Define a new process by following the steps in Defining a process. Name the process:
    Build Tutorial Product Dimension
    
  2. Under the Subject Areas folder in the Data Warehouse Center window, expand the TBC Sample tree until you see the Build Sample Product Dimension process.
  3. Right-click the Build Sample Product Dimension process.
  4. Click Open.

    The Process Model window opens.

  5. Right-click the Select Product step.
  6. Click Copy.

    The Copy Step window opens.

  7. In the Name field, type the name for the copy of the step:
    Tutorial Select Product
    
  8. In the New process field, click the name of the process to which you want to copy the step:
    Build Tutorial Product Dimension
    
  9. Verify that the Copy Target Table check box is selected.
  10. In the Warehouse target field, click the Tutorial Targets warehouse target in the list. The warehouse target will be the same for the fact table and each dimension table that you define in this tutorial.
    The GEOGRAPHIES SELECT statement on the SQL Statement page.
  11. Click OK.

    The step and its sources are copied to the Build Tutorial Product Dimension process. The Data Warehouse Center creates the corresponding target table.

  12. Close the Process Model window.
  13. Open the Build Tutorial Product Dimension process by following the steps in Opening the process.
  14. Verify that the process includes the following objects:
  15. Rename the "SelectProd_T" target table to:
    LOOKUP_PRODUCT
    

    To rename the target table:

    1. In the Process Model window, right-click the "SelectProd_T" target table.
    2. Click Properties.

      The Properties notebook for the table opens.

    3. In the Table name field, type:

      LOOKUP_PRODUCT
      
    4. Verify that the Part of an OLAP schema check box and Dimension table radio button are selected.
    5. Click OK to save your changes and close the Properties notebook for the table.

Repeat this procedure for the Time and Scenario dimension tables, substituting the values in the following table where appropriate.
Dimension Tutorial process Sample process Sample step to copy New tutorial step name Source tables Target table New target table name
Time Build Tutorial Time Dimension Build Sample Time Dimension Select Time Tutorial Select Time TIME "SelectTime
_T"
LOOKUP_
TIME
Scenario Build Tutorial Scenario Dimension Build Sample Scenario Dimension Select Scenario Tutorial Select Scenario SCENARIO "SelectScen
_T"
LOOKUP_
SCENARIO

In the previous exercise, you defined the dimension tables in the star schema. In this exercise, you will define the fact table in the star schema.

To define the fact table:

  1. Define a new process by following the steps in Defining a process. Name the process:
    Build Tutorial Fact Table
    
  2. Under the Subject Areas folder in the Data Warehouse Center window, expand the TBC Sample tree until you see the Build Sample Fact Table process.
  3. Right-click the Build Sample Fact Table process.
  4. Click Open.

    The Process Model window opens.

  5. Right-click the Fact Table Join step.
  6. Click Copy.

    The Copy Step window opens.

  7. In the Name field, type the name for the copy of the step:
    Tutorial Fact Table Join
    
  8. In the New process field, select the name of the process to which you want to copy the step:
    Build Tutorial Fact Table
    
  9. Verify that the Copy Target Table check box is selected.
  10. In the Warehouse target field, select the Tutorial Targets warehouse target from the list.
  11. Click OK.

    The step and its sources are copied to the Build Tutorial Fact Table process. The Data Warehouse Center generates the corresponding target table.

  12. Close the Process Model window.
  13. Open the Build Tutorial Fact Table process by following the steps in Opening the process. If it is already open, close it, then open it again.
  14. Verify that the process includes the following objects:
  15. Rename the "FactTable_T" target table:
    FACT_TABLE
    

    To rename the target table:

    1. In the Process Model window, right-click the "FactTable_T" target table.
    2. Click Properties.

      The Properties notebook for the table opens.

    3. In the Table name field, type:

      FACT_TABLE
      
    4. Verify that the Part of an OLAP schema check box and the Fact table radio button are selected.
    5. Click OK to save your changes and close the Properties notebook for the table.

What you just did

You now have five processes, with their associated sources, targets, and steps, defined in the Data Warehouse Center:


[ Top of Page | Previous Page | Next Page | Table of Contents ]