Determining the linear relationship between variables in two columns

Use the Correlation transformer to determine the extent to which changes in the value of an attribute (such as length of employment) are associated with changes in another attribute (such as salary). The data for a correlation analysis consists of two input columns. Each column contains values for one of the attributes of interest. The Correlation transformer can calculate various measures of association between the two input columns. You can select more than one statistic to calculate for a given pair of input columns.
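As a rough illustration of what a measure of association between two input columns looks like, the following Python sketch computes a sample covariance and a Pearson correlation coefficient. It is not the transformer's implementation: the function names and the sample data are hypothetical, and dividing by n - 1 (sample covariance) is an assumption, because this topic does not state which formula the transformer uses.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equal-length columns (divides by n - 1; an assumption)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length columns with at least 2 rows")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))
    sd_y = math.sqrt(covariance(ys, ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant column has no defined correlation")
    return covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical input columns: length of employment (years) and salary.
employment = [1, 3, 5, 7, 9]
salary = [30000, 36000, 41000, 50000, 55000]

print("covariance:", covariance(employment, salary))
print("correlation:", pearson_r(employment, salary))
```

A coefficient near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or no linear relationship.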

The data in the input columns can also be treated as a sample obtained from a larger population, and the Correlation transformer can be used to test whether the attributes are correlated in the population. In this context, the null hypothesis asserts that the two attributes are not correlated, and the alternative hypothesis asserts that the attributes are correlated.
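The hypothesis test described above can be sketched with the standard t statistic for a Pearson correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. That the transformer computes its T-value this way is an assumption; the function names below are hypothetical, and the critical value must come from a t-distribution table.

```python
import math

def t_value(r, n):
    """t statistic for testing 'no correlation', given coefficient r from n rows."""
    if n < 3:
        raise ValueError("need at least 3 rows (n - 2 degrees of freedom)")
    if abs(r) >= 1.0:
        raise ValueError("t is unbounded when |r| = 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def reject_null(t, critical_value):
    """Two-sided decision: reject 'not correlated' when |t| exceeds the critical value."""
    return abs(t) > critical_value

# Hypothetical sample: r = 0.5 computed from 12 rows, so 10 degrees of freedom.
t = t_value(0.5, 12)
# 2.228 is the two-sided 5% critical value of the t distribution for 10 degrees of freedom.
print(t, reject_null(t, 2.228))
```

In this hypothetical sample, t is about 1.826, which does not exceed 2.228, so the null hypothesis of no correlation is not rejected at the 5% level.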

Your source table and target table must exist in the warehouse database. Optionally, the transformer can create the target table in the same warehouse database that contains the source table.

You can make changes to the step only when the step is in Development mode.

Authorities and privileges

To define a Correlation transformer step:

  1. Open the step notebook.

  2. Specify information for your step.

  3. Click the Parameters tab.

  4. Optional: Click columns that you want to use as grouping columns and click >. Grouping columns can contain character or numeric data.

  5. Define correlation statistics.

  6. On the Column Mappings page, map the columns that result from your correlation statistics to columns in your target table.

    The column names for your correlation statistics are based on the data columns that you select on the Parameters page and the statistics that you select for them. One output column is created for each selected statistic and its corresponding pair of data columns. For example, if the data columns Salary and Employment have the correlation statistics Covariance and T-value defined for them, the columns Covariance_Salary_Employment and T-value_Salary_Employment are displayed on the Column Mappings page.

    Output columns are listed on the left side of the page, under the heading "Source Columns". Target columns from the output table that is linked to the step are listed on the right side of the page. Use the Column Mappings page to perform the following tasks:

    If the Parameters page produces no output columns, or if the step is not linked to a target table and you have not specified automatic generation of a default table on the Parameters page, you cannot use this page to map your columns. Some steps do not allow you to change the column mapping.

  7. On the Processing Options page, in the Agent Site list, select an agent site where you want the step to run. The selections in this list are agent sites that are common to the source tables, the target table, and the transformer or program that you are defining.

  8. If you want the option to run your step at any time, select the Run on demand check box. Your step must be in test or production mode before you can run it.

  9. Optional: Select the Populate externally check box if the step is populated externally, meaning that it is invoked in some way other than by the Data Warehouse Center. An externally populated step does not need any other means of running in the Data Warehouse Center before you can change its mode to production.

    If Populate externally is not selected, then the step must either have a schedule, be linked to a transient table that is input to another step, or be started by another program in order to change the mode to production.

  10. In the Retry area, specify how many times the step should run again if it needs to be retried, and how much time should pass before each retry.

  11. In the Log table field, specify a log table.

  12. In the Trace level field, specify a trace level.

  13. Click OK to save your changes and close the step notebook.
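The output column naming described in step 6 and the grouping columns from step 4 can be sketched in Python as follows. This is only an illustration: the helper names and the sample rows are hypothetical, the naming pattern is inferred from the Covariance_Salary_Employment example, and treating grouping columns as a way to split the rows into groups before statistics are calculated is an assumption.

```python
from collections import defaultdict

def output_column_names(statistics, data_column_pairs):
    """Build '<statistic>_<column1>_<column2>' names, one per statistic and column pair."""
    return [
        f"{stat}_{first}_{second}"
        for first, second in data_column_pairs
        for stat in statistics
    ]

def group_rows(rows, grouping_column):
    """Split rows (dicts keyed by column name) by the value in the grouping column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[grouping_column]].append(row)
    return dict(groups)

# The example from step 6: two statistics for the pair (Salary, Employment).
print(output_column_names(["Covariance", "T-value"], [("Salary", "Employment")]))

# Hypothetical rows with a character grouping column named Dept.
rows = [
    {"Dept": "A", "Salary": 30000, "Employment": 1},
    {"Dept": "B", "Salary": 41000, "Employment": 5},
    {"Dept": "A", "Salary": 36000, "Employment": 3},
]
for dept, members in group_rows(rows, "Dept").items():
    print(dept, len(members))
```

Each name that the first helper returns corresponds to one entry under "Source Columns" on the Column Mappings page.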

Related information

Moving and transforming data

Population type descriptions

List of steps and step subtypes

Data Warehouse Center concepts