Use the Chi-Square transformer to perform the chi-square test and the chi-square goodness-of-fit test on columns of numerical data. These tests are nonparametric tests.
You can use the statistical results of these tests to make the following determinations:
Whether the values of one variable are related to the values of another variable
Whether the values of one variable are independent of the values of another variable
Whether the distribution of variable values meets your expectations
Use these tests with small sample sizes or when the variables that you are considering might not be normally distributed. Both the chi-square test and the chi-square goodness-of-fit test make the best use of data that cannot be precisely measured.
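As an illustration of the goodness-of-fit test, the following minimal Python sketch computes the statistic as the sum of (observed - expected)² / expected over all categories. It is not the transformer's implementation. The function name and the sample frequencies are hypothetical and chosen only for this example.

```python
def goodness_of_fit_statistic(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for paired frequencies.

    Hypothetical helper for illustration; not part of the Data Warehouse Center.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    # Sum of squared deviations, each scaled by its expected frequency.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # One degree of freedom is lost because the totals are fixed.
    return statistic, len(observed) - 1


# Hypothetical data: 60 observations spread over six categories,
# with 10 observations expected in each category.
observed = [8, 12, 9, 11, 10, 10]
expected = [10.0] * 6
statistic, dof = goodness_of_fit_statistic(observed, expected)
print(f"chi-square = {statistic:.3f}, degrees of freedom = {dof}")
```

A small statistic relative to the degrees of freedom suggests that the observed distribution is close to the expected one.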
The Chi-Square transformer optionally produces an additional output table called the Expected values output table. You can select a table to be used as the Expected values output table, or you can specify that it will not be produced.
When you set up this process in the Process Modeler, link the Chi-Square step to a Warehouse target table. If you want the step to produce the Expected values output table, link the step to a second Warehouse target table in the same database.
You can make changes to the step definition only when the step is in Development mode.
To define a Chi-Square transformer:
Specify information for your step:
In the Name field, you can type a new name for the step. Otherwise, you can keep the name that the Data Warehouse Center automatically supplied for the step. This field is required.
In the Administrator field, type the name of the person who is responsible for the maintenance of this step. This field is optional.
In the Description field, type a business description for your step. This description can be a maximum of 255 characters. This field is optional.
In the Notes field, type detailed information that might be helpful to users who can access this step. This field is optional.
In the Available columns list on the Parameters page, click a column. Then, click > next to the Column of row definition field. This field is required for both goodness-of-fit calculations and chi-square calculations. If you want your step to run as a chi-square calculation, go to step 4. Otherwise, go to step 5; your step will then run as a goodness-of-fit calculation.
To define a chi-square calculation, click a column in the Available columns list and click > next to the Column of column names field.
In the Available columns list, click a column that contains the observed frequencies data. This column must be of numeric type. Then, click > next to the Observed frequencies column field.
In the Available columns list, click a column that contains expected frequency data. This column must be of numeric type. Then, click > next to the Expected frequencies column field. This field is required for goodness-of-fit calculations and optional for chi-square calculations.
In the Expected values output table list, select a target table for the expected values output table. Depending on certain conditions, this field is either optional or required:
This field is optional if you have only one target table linked to the chi-square step in the Process Modeler. If you want to create an expected values output table, select your target table. Then, click OK to save and close the step. Next, in the Process Modeler, link a second table to the chi-square step to contain your regular chi-square output. Finally, open the chi-square step and continue defining values for the transformer.
This field is optional for chi-square calculations.
This field is required if you have two tables linked to the chi-square step in the Process Modeler. Select one of the tables to be the expected values output table.
This field is required for goodness-of-fit calculations.
Optional: On the Column Mapping page, you can view the mappings between the output columns that result from the transformations that you defined on the Parameters page and the columns on your target table. You cannot change these mappings.
If the output table is not used by any steps that are in Test or Production, you can rename target columns. To rename a target column, double-click the column name and type the new name.
On the Processing Options page, in the Agent Site list, select an agent site where you want your step to run. The selections in this list are agent sites that are common to the source tables, the target tables, and the transformer or program that you are defining.
If you want to be able to run your step at any time, select the Run on demand check box. Your step must be in Test or Production mode before you can run it.
Optional: Select the Populate externally check box if the step is populated externally, meaning that it is invoked in some way other than by the Data Warehouse Center. In this case, the step does not need any other means of running in the Data Warehouse Center before you can change its mode to Production.
If Populate externally is not selected, the step must have a schedule, be linked to a transient table that is input to another step, or be started by another program before you can change its mode to Production.
In the Retry area, specify how many times you want the step to run again if it needs to be retried and the amount of time that you want to pass before the next run of the step.
In the Log table field, specify a log table.
Optional: In the Trace level field, specify a trace level.
Click OK to save your changes and close the step notebook.
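To show how the parameters above relate to the calculation, the following minimal Python sketch runs a chi-square calculation on rows of (row label, column label, observed frequency). It assumes these values correspond to the Column of row definition, Column of column names, and Observed frequencies column fields. It also assumes that no expected frequencies column is supplied, so expected values are derived from the row and column totals in the usual way for a test of independence. The function name and sample data are hypothetical, and the sketch does not reproduce the transformer's actual implementation or output table layout.

```python
from collections import defaultdict


def chi_square_from_rows(rows):
    """Return (statistic, degrees of freedom, expected values) for long-format rows.

    rows: iterable of (row_label, column_label, observed_frequency).
    Hypothetical helper for illustration only.
    """
    observed = defaultdict(float)
    row_totals = defaultdict(float)
    col_totals = defaultdict(float)
    for row_label, col_label, freq in rows:
        observed[(row_label, col_label)] += freq
        row_totals[row_label] += freq
        col_totals[col_label] += freq

    grand_total = sum(row_totals.values())
    if grand_total <= 0:
        raise ValueError("observed frequencies must sum to a positive value")

    expected = {}
    statistic = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            # Expected count if the row and column variables are independent.
            e = r_total * c_total / grand_total
            if e == 0:
                raise ValueError("every row and column needs a nonzero total")
            expected[(r, c)] = e
            o = observed.get((r, c), 0.0)
            statistic += (o - e) ** 2 / e

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Hypothetical source rows: (row definition, column name, observed frequency).
rows = [
    ("north", "A", 10),
    ("north", "B", 20),
    ("south", "A", 30),
    ("south", "B", 40),
]
statistic, dof, expected = chi_square_from_rows(rows)
print(f"chi-square = {statistic:.4f}, degrees of freedom = {dof}")
for (r, c), e in sorted(expected.items()):
    print(f"expected[{r}, {c}] = {e:.1f}")
```

The `expected` dictionary in this sketch plays a role comparable to the Expected values output table: one expected frequency for each combination of row and column label.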