Use the Correlation transformer to determine the extent to which changes in the value of an attribute (such as length of employment) are associated with changes in another attribute (such as salary). The data for a correlation analysis consists of two input columns. Each column contains values for one of the attributes of interest. The Correlation transformer can calculate various measures of association between the two input columns. You can select more than one statistic to calculate for a given pair of input columns.
The data in the input columns can also be treated as a sample obtained from a larger population, and the Correlation transformer can be used to test whether the attributes are correlated in the population. In this context, the null hypothesis asserts that the two attributes are not correlated, and the alternative hypothesis asserts that the attributes are correlated.
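As a rough illustration of the kind of statistics involved, the following Python sketch computes the sample covariance, the Pearson correlation coefficient, and the t-value that is commonly used to test the null hypothesis of no correlation. It uses standard textbook formulas and is not taken from the transformer itself; the function name `correlation_stats` and the sample data are hypothetical.

```python
import math


def correlation_stats(x, y):
    """Covariance, Pearson r, and t-value for two equal-length numeric columns.

    Textbook formulas only; an assumption about what the transformer reports,
    not its actual implementation.
    """
    if len(x) != len(y):
        raise ValueError("both columns must have the same number of rows")
    n = len(x)
    if n < 3:
        raise ValueError("at least three rows are needed for the t-value")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")

    covariance = sxy / (n - 1)            # sample covariance
    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation coefficient
    r = max(-1.0, min(1.0, r))            # guard against rounding past +/-1
    if abs(r) == 1.0:
        t_value = math.copysign(math.inf, r)
    else:
        # t statistic for H0: no correlation, with n - 2 degrees of freedom.
        t_value = r * math.sqrt((n - 2) / (1.0 - r * r))
    return {
        "covariance": covariance,
        "correlation": r,
        "t_value": t_value,
        "degrees_of_freedom": n - 2,
    }


# Hypothetical sample: years of employment and salary (in thousands).
employment_years = [1, 2, 3, 4, 5]
salary_thousands = [30, 35, 37, 45, 50]
stats = correlation_stats(employment_years, salary_thousands)
print(stats)
```

A large absolute t-value relative to the t distribution with `n - 2` degrees of freedom is evidence against the null hypothesis that the attributes are uncorrelated in the population.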
Your source table and target table must exist in the warehouse database. Optionally, this transformer can create the target table in the same warehouse database that contains the source table.
You can make changes to the step only when the step is in Development mode.
To define a Correlation transformer step:
Specify information for your step:
In the Name field, you can type a new name for the step. Otherwise, you can keep the name that the Data Warehouse Center automatically supplied for the step. This field is required.
In the Administrator field, type the name of the person who is responsible for the maintenance of this step. This field is optional.
In the Description field, type a business description for your step. This description can be a maximum of 255 characters. This field is optional.
In the Notes field, type detailed information that might be helpful to users who can access this step. This field is optional.
Click the Parameters tab.
Optional: Click columns that you want to use as grouping columns and click >. Grouping columns can contain character or numeric data.
On the Column Mappings page, map the columns that result from your correlation statistics to columns in your target table.
The column names for your correlation statistics are based on the data columns that you select on the Parameters page and the statistics that you select for them. One column is created for each selected statistic and its corresponding pair of data columns. For example, if the data columns Salary and Employment have the correlation statistics Covariance and T-value defined for them, the columns Covariance_Salary_Employment and T-value_Salary_Employment are displayed on the Column Mappings page. Output columns are listed on the left side of the page, under the heading "Source Columns". Target columns from the output table linked to the step are listed on the right side of the page. Use the Column Mappings page to perform the following tasks:
To create a mapping, click a source column and drag it to a target column. An arrow is drawn between the source column and the target column.
To delete a mapping, right-click an arrow and select Delete.
If the output table is not used by any steps that are in Test or Production, you can change the attributes of the target column. To rename a target column, double-click the column name and type the new name. You can also modify any other attributes of the target column by double-clicking the attribute.
To move a target column up or down the list, select the column. Then, click the Up arrow or Down arrow buttons. If the target column is mapped to a source column, the mapping remains intact.
If the Parameters page produces no output columns, or if this step is not linked to a target table and you have not specified automatic generation of a default table on the Parameters page, you cannot use this page to map your columns. Some steps do not allow you to change the column mapping.
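The output column naming shown on the Column Mappings page can be sketched in Python as follows. The pattern `Statistic_Column1_Column2` is inferred only from the Salary and Employment example; the function name `output_column_names` is hypothetical, and the ordering of the generated names is an assumption.

```python
def output_column_names(data_column_pairs, statistics):
    """Build one output column name per (statistic, data column pair).

    Naming pattern inferred from the documented example; the real
    transformer may order or format names differently.
    """
    names = []
    for first, second in data_column_pairs:
        for statistic in statistics:
            names.append(f"{statistic}_{first}_{second}")
    return names


names = output_column_names(
    [("Salary", "Employment")],
    ["Covariance", "T-value"],
)
print(names)
```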
On the Processing Options page, in the Agent Site list, select an agent site where you want the step to run. The selections in this list are agent sites that are common to the source tables, the target table, and the transformer or program that you are defining.
If you want to be able to run your step at any time, select the Run on demand check box. Your step must be in Test or Production mode before you can run it.
Optional: Select the Populate externally check box if the step is populated externally, meaning that it is invoked in some way other than by the Data Warehouse Center. In that case, the step does not need any other means of running in the Data Warehouse Center before you can change its mode to Production.
If Populate externally is not selected, the step must have a schedule, be linked to a transient table that is input to another step, or be started by another program before you can change its mode to Production.
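The rule for changing a step to Production mode can be summarized as a simple check. This Python sketch only restates the conditions described above; the function and parameter names are hypothetical and do not correspond to any Data Warehouse Center API.

```python
def can_promote_to_production(
    populate_externally,
    has_schedule=False,
    feeds_transient_table=False,
    started_by_other_program=False,
):
    """Return True if the step has an acceptable means of running."""
    if populate_externally:
        # Externally populated steps need no other means of running.
        return True
    # Otherwise at least one of the three means of running is required.
    return has_schedule or feeds_transient_table or started_by_other_program


print(can_promote_to_production(populate_externally=True))
print(can_promote_to_production(populate_externally=False))
print(can_promote_to_production(populate_externally=False, has_schedule=True))
```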
In the Retry area, specify how many times you want the step to be retried and how much time you want to pass before each retry.
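To make the two retry settings concrete, the following Python sketch shows one plausible way that a retry count and a retry interval interact. It is an assumption for illustration only: the document does not describe how the Data Warehouse Center decides that a step needs to be retried, and the names `run_with_retry`, `retry_count`, and `retry_interval_seconds` are hypothetical.

```python
import time


def run_with_retry(run_step, retry_count, retry_interval_seconds, sleep=time.sleep):
    """Run a step, retrying up to retry_count times after a failure.

    Returns (result, attempts). Here "failure" means run_step raised an
    exception, which is an assumption made for this sketch.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return run_step(), attempts
        except Exception:
            if attempts > retry_count:
                raise  # retries exhausted; surface the last error
            sleep(retry_interval_seconds)  # wait before the next run


# Hypothetical step that fails twice and then succeeds.
outcomes = iter([RuntimeError("fail"), RuntimeError("fail"), "ok"])


def flaky_step():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


waits = []  # record the waits instead of actually sleeping
result, attempts = run_with_retry(
    flaky_step, retry_count=3, retry_interval_seconds=60, sleep=waits.append
)
print(result, attempts, waits)
```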
In the Log table field, specify a log table.
In the Trace level field, specify a trace level.
Click OK to save your changes and close the step notebook.