Use the Clean Data transformer to perform rules-based find and replace operations on a table. The transformer finds values that you specify in the data columns of the source table that your step accesses. Then the transformer updates corresponding columns with replacement values that you specify in the table to which your step writes. You can select multiple columns from the input table to carry over to the output table. The Clean Data transformer does not define rules or parameters for the carry over columns.
You can use the Clean Data transformer only if your source table and target table are in the same database. The source table must be a single warehouse table. The target table is the default step table.
Use the Clean Data transformer to clean and standardize data values after load or import, as part of a process.
Prerequisite: Before you can use the Clean Data transformer, you must create a rules table for your clean type. A rules table designates the values that the Clean Data transformer will use during the find and replace process. The rules table must be in the same database as the source table and target table.
You can make changes to the step only when the step is in Development mode.
To define a Clean Data transformer step:
Specify information for your step:
In the Name field, you can type a new name for the step. Otherwise, you can keep the name that the Data Warehouse Center automatically supplied for the step. This field is required.
In the Administrator field, type the name of the person who is responsible for the maintenance of this step. This field is optional.
In the Description field, type a business description for your step. This description can be a maximum of 255 characters. This field is optional.
In the Notes field, type detailed information that might be helpful to users who can access this step. This field is optional.
In the Update Security Group drop-down list, you can authorize access to the step. This list contains only security groups that your current user ID belongs to. This field is optional.
On the Parameters page, define the clean transformations that you want performed when the step runs.
On the Column Mapping page, map the output columns that result from the transformations that you defined on the Parameters page to columns on your target table. On this page, output columns from the Parameters page are referred to as source columns. Source columns are listed on the left side of the page. Target columns from the output table linked to the step are listed on the right side of the page. Use the Column Mapping page to perform the following tasks:
To create a mapping, click a source column and drag the mouse to a target column. Then, release the mouse button. An arrow is drawn between the source column and the target column.
To delete a mapping, right-click an arrow and select Delete.
If the output table is not used by any steps that are in Test or Production, you can change the attributes of the target column. To rename a target column, double-click the column name and type the new name. You can also modify any other attributes of the target column by double-clicking the attribute.
To move a target column up or down the list, select the column. Then, click the Up arrow or Down arrow buttons. If the target column is mapped to a source column, the mapping remains intact.
If the Parameters page produces no output columns, or if this step is not linked to a target table and you have not specified automatic generation of a default table in the Parameters page, you will not be able to use this page to map your columns. Some steps will not allow you to change the column mapping.
On the Processing Options page, in the Population type list, specify whether the step replaces or appends data in the target. The default is replace.
In the Agent Site list, select an agent site where you want your step to run. The selections in this list are agent sites that are common to the source tables, the target table, and the transformer or program that you are defining.
If you want to have the option to to run your step at any time, select the Run on demand checkbox. Your step must be in test or production mode before you can run it.
Optional: Select the Populate externally check box if the step is populated externally, meaning that it is invoked in some way other than by the Data Warehouse Center. The step does not have to have any other means of running in the Data Warehouse Center in order to change the mode to production.
If Populate externally is not selected, then the step must either have a schedule, be linked to a transient table that is input to another step, or be started by another program in order to change the mode to production.
In the Retry area, specify how many times you want the step to run again if it needs to be retried and the amount of time that you want to pass before the next run of the step.
In the Log table field, specify a log table.
Optional: In the Trace level field, specify a trace level.
Click OK to save your changes and close the step notebook.
Related information