Use the Regression transformer to show the relationship between a dependent variable and one or more independent variables, and to show how closely they are correlated. For example, you can use this transformer to show the effect of a change in pricing on demand for a product, the effect of location on the response to advertising, or how closely two seemingly random sets of data are related.
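In standard multiple-regression notation, each column that you choose as a predictor column supplies an independent variable $x_i$, and the criterion column supplies the dependent variable $y$. The fit estimates the coefficients $b_0, \dots, b_p$. The coefficient of determination $R^2$ is one common measure of how closely the fitted values $\hat{y}$ track the observed values. This notation is an assumption for illustration; this page does not state the exact formula that the transformer uses.

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p

R^2 = 1 - \frac{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2}
```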
This transformer performs a backward, full-model regression. This method starts with all of the independent variables in the model and then removes the least important independent variables, one at a time, until only significant independent variables remain in the model.
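The following is a minimal pure-Python sketch of backward elimination. It assumes ordinary least squares with an intercept. It also uses a simple cutoff of |t| >= 2.0 as a stand-in for a significance test, because this page does not document the transformer's actual significance criterion. The functions `solve`, `ols`, and `backward_eliminate` are hypothetical names used only for illustration.

```python
from math import sqrt


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting.

    Assumes a is square and non-singular.
    """
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, t statistics).

    Index 0 of both lists is the intercept. Assumes more rows than
    coefficients and a nonzero residual variance.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance estimate
    # Diagonal of (X'X)^-1, obtained by solving against each unit vector.
    inv_diag = [solve(xtx, [1.0 if i == j else 0.0 for i in range(k)])[j] for j in range(k)]
    t = [b / sqrt(sigma2 * d) for b, d in zip(beta, inv_diag)]
    return beta, t


def backward_eliminate(X, y, names, t_cutoff=2.0):
    """Start with every predictor; drop the weakest until all remaining pass the cutoff.

    Assumption: |t| >= t_cutoff stands in for the real significance test.
    """
    keep = list(range(len(names)))
    while keep:
        subset = [[row[i] for i in keep] for row in X]
        _, t = ols(subset, y)
        # t[0] belongs to the intercept, which is never removed.
        weakest = min(range(len(keep)), key=lambda i: abs(t[i + 1]))
        if abs(t[weakest + 1]) >= t_cutoff:
            break
        del keep[weakest]
    return [names[i] for i in keep]
```

For example, `backward_eliminate(rows, demand, ["price", "store_code"])` would return only the predictor names that survive elimination. The column names here are hypothetical.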
The Regression transformer produces two additional output tables: the ANOVA summary table and the Equation variable table.
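As a rough guide to what an ANOVA summary for a regression usually reports, the following sketch computes the standard quantities from observed and fitted values: the sums of squares, degrees of freedom, mean squares, the F statistic, and R². The column layout of the transformer's actual ANOVA summary table is not described on this page. The names `AnovaSummary` and `anova_summary` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AnovaSummary:
    ss_regression: float
    ss_residual: float
    df_regression: int
    df_residual: int
    ms_regression: float
    ms_residual: float
    f_statistic: float
    r_squared: float


def anova_summary(y, fitted, num_predictors):
    """Regression ANOVA quantities.

    Assumes `fitted` comes from a least-squares fit that includes an
    intercept, so that SS_total = SS_regression + SS_residual.
    """
    n = len(y)
    y_bar = sum(y) / n
    ss_reg = sum((f - y_bar) ** 2 for f in fitted)  # variation explained by the model
    ss_res = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # unexplained variation
    df_reg = num_predictors
    df_res = n - num_predictors - 1
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return AnovaSummary(
        ss_reg, ss_res, df_reg, df_res, ms_reg, ms_res,
        f_statistic=ms_reg / ms_res,
        r_squared=ss_reg / (ss_reg + ss_res),
    )
```

A large F statistic relative to its degrees of freedom indicates that the predictors explain much more variation than would be expected by chance.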
Before you begin this task, you must link this step in the Process Modeler to a Warehouse source table and three Warehouse target tables. Alternatively, you can link the step to a source table and specify that the step create the target tables. All of the tables must be in the same database. The Regression transformer writes the results of the regression to a table on one Warehouse target, and creates the ANOVA summary table and the Equation variable table on the second and third targets.
You can make changes to the step only when the step is in Development mode.
To define a Regression transformer:
Specify information for your step:
In the Name field, you can type a new name for the step, or keep the name that the Data Warehouse Center supplied automatically. This field is required.
In the Administrator field, type the name of the person who is responsible for the maintenance of this step. This field is optional.
In the Description field, type a business description for your step. This description can be a maximum of 255 characters. This field is optional.
In the Notes field, type detailed information that might be helpful to users who can access this step. This field is optional.
On the Parameters page, select the columns from the Available columns list that contain the independent variable data to use for prediction in the regression calculations. Then, click > next to the Predictor columns list. The columns are added to the Predictor columns list. Only columns of numeric data type are available.
From the Available columns list, select the one column that contains the dependent variable data to use as the criterion in the regression calculations. Then, click > next to the Criterion column field. Only columns of numeric data type are available.
In the Summary table list, select a target table to be your ANOVA summary table.
In the Equation variable table list, select a target table to be your Equation variable table.
Optional: On the Column Mapping page, you can view the mappings between the output columns that result from the transformations that you defined on the Parameters page and the columns on your target table. You cannot change these mappings.
If the output table is not used by any steps that are in Test or Production, you can rename target columns. To rename a target column, double-click the column name and type the new name.
On the Processing Options page, in the Agent Site list, select an agent site where you want your step to run. The selections in this list are agent sites that are common to the source tables, the target tables, and the transformer or program that you are defining.
If you want to be able to run your step at any time, select the Run on demand check box. Your step must be in Test or Production mode before you can run it.
Optional: Select the Populate externally check box if the step is populated externally, meaning that it is started by some means other than the Data Warehouse Center. An externally populated step does not need any other means of running in the Data Warehouse Center before you can change its mode to Production.
If Populate externally is not selected, the step must have a schedule, be linked to a transient table that is input to another step, or be started by another program before you can change its mode to Production.
In the Retry area, specify how many times the step should run again if it needs to be retried, and how much time should pass before each retry.
In the Log table field, specify a log table.
Optional: In the Trace level field, specify a trace level.
Click OK to save your changes and close the step notebook.
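Three of the rules on the Processing Options page can be restated as small functions. These rules are which agent sites are offered, when a step can be changed to Production mode, and how retries are spaced. The sketch below is illustrative only and is not a Data Warehouse Center API. All names are hypothetical. It assumes that a step run reports success as `True`, and that the retry count means additional runs after the first.

```python
import time
from dataclasses import dataclass


def selectable_agent_sites(*site_sets):
    """Agent sites that could appear in the Agent Site list.

    Assumption: each argument is the set of agent sites available to one
    source table, target table, or the transformer; only sites common to
    all of them are offered.
    """
    common = set(site_sets[0])
    for sites in site_sets[1:]:
        common &= set(sites)
    return common


@dataclass
class StepSettings:
    """Hypothetical model of the settings that gate promotion to Production."""
    populate_externally: bool = False
    has_schedule: bool = False
    linked_to_transient_input: bool = False  # transient table that is input to another step
    started_by_another_program: bool = False


def can_promote_to_production(step):
    # An externally populated step needs no other means of running.
    if step.populate_externally:
        return True
    # Otherwise, at least one way of starting the step must exist.
    return (step.has_schedule or step.linked_to_transient_input
            or step.started_by_another_program)


def run_with_retries(run_step, retry_count, retry_interval_seconds, sleep=time.sleep):
    """Run once, then rerun up to retry_count times, waiting between runs.

    Assumption: run_step returns True on success and False when the step
    needs to be retried. Returns (succeeded, number_of_runs).
    """
    runs = 0
    while True:
        runs += 1
        if run_step():
            return True, runs
        if runs > retry_count:
            return False, runs
        sleep(retry_interval_seconds)
```

The `sleep` parameter is injectable so that the retry logic can be exercised without actually waiting between runs.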