The Analysis of Variance (ANOVA) transformer produces two tables of statistical calculations from a small number of input parameters. There are three types of ANOVA transformers: one-way ANOVA, two-way ANOVA, and three-way ANOVA.
The ANOVA transformer obtains two independent estimates of variance. The first estimate is based on variability between groups. The second estimate is based on variability within groups. After the ANOVA transformer computes these estimates, it calculates their ratio. The significance of this ratio is described by a family of distributions, the Fisher-F distributions.
The ANOVA transformer also calculates the p-value. The p-value is the probability, assuming the group means are actually equal, of obtaining an F-ratio at least as large as the one observed. A small p-value leads to the conclusion that the means are different. For example, a p-value of 0.02 means that, if the group means were equal, there would be only a 2% chance of observing an F-ratio this large. Likewise, a large p-value does not lead to the conclusion that the means of the groups are different.
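For a concrete illustration outside the transformer, a one-way ANOVA F-ratio and p-value can be computed with SciPy. This is a minimal sketch with made-up sample groups, not the transformer itself:

# Minimal one-way ANOVA sketch (hypothetical sample data).
from scipy import stats

group_a = [23.1, 24.7, 22.8, 25.0]
group_b = [26.4, 27.1, 25.9, 26.8]
group_c = [22.0, 23.3, 21.8, 22.5]

# f_oneway returns the F-ratio (between-group variance over
# within-group variance) and its p-value from the Fisher F distribution.
f_value, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_value:.3f}, p = {p_value:.4f}")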
The ANOVA transformer writes to two target tables, which you name. In the following example, table 1 is called the ANOVA Summary table and table 2 is called the ANOVA Statistics table.
The ANOVA Summary table contains the following columns:
Column name | Data type | Description
SOURCE | VARCHAR | Source of the ANOVA statistic
DF | Integer | Degrees of freedom
SS | Double | Sum of squares
MS | Double | Mean sum of squares
FVALUE | Double | F-value or F-ratio
PVALUE | Double | Calculated P-value for the F-value and the degrees of freedom
The ANOVA Statistics table contains the following columns:
Column name | Data type | Description
VARIABLE | VARCHAR | Name of the variable in the grouping column that produces the statistic
COUNT | Integer | Count of the data values
SUM | Integer | Sum of the data values
AVERAGE | Double | Average of the data values
VARIANCE | Double | Variance of the data values
STDDEV | Double | Standard deviation of the data values
The Calculate Statistics transformer calculates the following descriptive statistics on any number of data columns from a single table:
Statistic | Input data type | Output data type
Count (1) | Any data type | Large integer (COUNT)
Sum | A number (2) | Same as the data type of the input, with some exceptions
Average | A number (2) | Same as SUM, with some exceptions
Variance | A number (2) | Double-precision floating point
Standard deviation | A number (2) | Double-precision floating point
Standard error | A number (2) | Double-precision floating point
Minimum | A number (3) | Same as the data type of the input
Maximum | A number (3) | Same as the data type of the input
Range | A number (2) | Same as the data type of the input
Coefficient of variation | A number (2) | Double-precision floating point
The input and output data types are based on the input specification for DB2 UDB SQL functions.
Notes:
(1) DB2 UDB provides a function called COUNT_BIG, which returns a DECIMAL with precision 31 and scale 0. COUNT_BIG is not supported by the Calculate Statistics transformer.
(2) A number can be any of these types: SMALLINT, INTEGER, REAL, DOUBLE, FLOAT, DECIMAL, or NUMERIC.
(3) DB2 UDB provides MIN and MAX functions that can be used with any built-in type other than a long string. The Calculate Statistics transformer MIN and MAX functions operate only on numbers.
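For reference, the same descriptive statistics can be computed outside the transformer. A minimal Python sketch over one numeric column (the values are made up):

import statistics

values = [12.0, 15.5, 9.8, 14.2, 11.1]  # hypothetical column values

count = len(values)
total = sum(values)
average = total / count
variance = statistics.variance(values)   # sample variance
stddev = statistics.stdev(values)        # sample standard deviation
stderr = stddev / count ** 0.5           # standard error of the mean
minimum, maximum = min(values), max(values)
value_range = maximum - minimum
coefficient_of_variation = stddev / average

print(count, total, average, variance, stddev, stderr,
      minimum, maximum, value_range, coefficient_of_variation)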
The work area at the right side of the Process Model window. Add sources, targets, and steps to this area to define how data in the open process is to be moved and transformed.
You can use the Clean Data transformer to perform any of the following actions:
Replace missing, invalid, or inconsistent values in selected data columns with appropriate substitute values.
Remove unsuitable data rows.
Clip numeric values.
Perform numeric discretization.
You can choose to ignore case and white space during find comparisons, or to allow a tolerance value for numeric find comparisons.
Each clean transformation uses one of four clean types; a brief sketch of two of them follows the list:
Find and replace - Performs basic find and replace functions.
Discretize - Performs find and replace functions within a range of values.
Clip - Performs find and replace functions within a range of values or outside of a range of values.
Carry over - Specifies columns in the input table to copy to the output table.
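A minimal Python sketch of the Clip and Discretize clean types on one numeric column; the values, range bounds, and substitute value are assumptions for illustration:

values = [3, 47, 15, 102, -8, 60]  # hypothetical column values

# Clip: replace values that fall outside a range with the range bounds.
low, high = 0, 100
clipped = [min(max(v, low), high) for v in values]

# Discretize: replace values that fall within a range with one substitute value.
# Here every value in the range [40, 70) is mapped to the code 50.
discretized = [50 if 40 <= v < 70 else v for v in values]

print(clipped)      # [3, 47, 15, 100, 0, 60]
print(discretized)  # [3, 50, 15, 102, -8, 50]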
Metadata that describes changes to objects in the warehouse. An example of control metadata is the date and time that a table is updated by the processing of a step.
Use the Copy file using FTP (VWPRCPY) Data Warehouse Center program to copy files on the agent site to and from a remote host.
Before you copy files to MVS, you must allocate their data sets.
You cannot transfer VSAM data sets.
This program uses the following step and source parameters, which are mirrored in the sketch after the list:
Other site
GET/PUT
Remote user ID
Remote password
Remote file
Local file
Binary or character transfer (0=Binary, 1=Character)
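The same kind of transfer can be expressed with Python's ftplib. This sketch assumes hypothetical host, credentials, and file names that mirror the parameters above; it is not the VWPRCPY program itself:

from ftplib import FTP

remote_host = "mvs.example.com"          # hypothetical remote host
remote_user, remote_password = "user1", "secret"
remote_file, local_file = "WAREHOUSE.DATA", "extract.dat"
binary_transfer = True                   # 0=Binary, 1=Character in VWPRCPY terms

ftp = FTP(remote_host)
ftp.login(remote_user, remote_password)
if binary_transfer:
    # GET in binary mode.
    with open(local_file, "wb") as f:
        ftp.retrbinary(f"RETR {remote_file}", f.write)
else:
    # GET in character (ASCII) mode, one line at a time.
    with open(local_file, "w") as f:
        ftp.retrlines(f"RETR {remote_file}", lambda line: f.write(line + "\n"))
ftp.quit()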
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | X | X
Log information for program processing is contained in the trcppp.log file.
Log information for the FTP command result is contained in the ftpppp.log file.
The location of the log files is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by this program. The program can detect only major errors of the FTP command. Confirm the results of the transfer before you use the transferred data.
Value | Description
8 | Parameter error.
12 | FTP error.
16 | Internal error.
32 | Open Trace error.
128 | Missing DLL.
Use the Correlation transformer to determine the extent to which changes in the value of an attribute (such as length of employment) are associated with changes in another attribute (such as salary). The data for a correlation analysis consist of two input columns; each column contains values for one of the attributes of interest. The Correlation transformer can calculate various measures of association between the two input columns. You can select more than one statistic to calculate for a given pair of input columns.
The data in the input columns also can be treated as a sample obtained from a larger population and the Correlation transformer can be used to test whether the attributes are correlated in the population. In this context, the null hypothesis asserts that the two attributes are not correlated, and the alternative hypothesis asserts that the attributes are correlated.
The Correlation transformer calculates any of the following correlation-related statistics on any number of input column pairs:
All Correlation-related statistics require numeric input data types and produce double precision floating points as the output data type. A numeric data type is any one of the following data types: SMALLINT, INTEGER, REAL, DOUBLE, FLOAT, DECIMAL, or NUMERIC.
The correlation coefficient r (also known as the Pearson product-moment correlation coefficient) is a measure of the linear relationship between two attributes or columns of data. The value of r can range from -1 to +1 and is independent of the units of measurement. A value of r near 0 indicates little correlation between attributes; a value near +1 or -1 indicates a high level of correlation.
When two attributes have a positive correlation coefficient, an increase in the value of one attribute indicates a likely increase in the value of the second attribute. A correlation coefficient of less than 0 indicates a negative correlation; that is, when one attribute shows an increase in value, the other attribute tends to show a decrease.
Consider two variables x and y:
If r = 1, then x and y are perfectly positively correlated. The possible values of x and y all lie on a straight line with a positive slope in the (x, y) plane.
If r = 0, then x and y are not correlated. They are not linearly associated with each other. This does not mean, however, that x and y are statistically independent.
If r = -1, then x and y are perfectly negatively correlated. The possible values of x and y all lie on a straight line with a negative slope in the (x, y) plane.
The correlation coefficient is independent of the units of measurement.
Covariance is a measure of the linear relationship between two attributes or columns of data. The value of the covariance can range from -infinity to +infinity. However, if the value of the covariance is too small or too large to be represented by a number, the value will be represented by NULL.
Unlike the correlation coefficient, the covariance depends on the units of measurement. For example, measuring the values of two attributes in inches rather than feet changes their covariance.
Use the Data export with ODBC to file (VWPEXPT2) program to select data in a table that is contained in a database registered in ODBC, and write the data to a delimited file. To run this program on AIX or UNIX, use the ODBC version of the Data Warehouse Center agent.
The Data export with ODBC to file program uses the following step and source parameters:
The ODBC connect string that was defined for the source database, in the format DSN=database;UID=user ID;PWD=password.
Only the following ODBC data types are supported: CHAR, VARCHAR, LONG VARCHAR, DATE, TIME, TIMESTAMP, INTEGER, SMALLINT, TINYINT, DECIMAL, NUMERIC, REAL, FLOAT, DOUBLE.
BINARY and GRAPHIC data are not supported.
The output file that is selected for the program in the process. This file will contain the exported data.
The SELECT statement that is defined for the step in the Steps notebook. The SELECT statement is enclosed in quotation marks.
Optional: The column separation character.
The output file is generated on the agent site.
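A rough functional equivalent of this export in Python with the pyodbc module; the connect string, SELECT statement, output file, and separator are assumptions for illustration:

import csv
import pyodbc

connect_string = "DSN=sourcedb;UID=user1;PWD=secret;"  # hypothetical
select_statement = "SELECT * FROM SALES"               # hypothetical
output_file = "export.txt"
separator = ","                                        # column separation character

conn = pyodbc.connect(connect_string)
cursor = conn.cursor()
cursor.execute(select_statement)

# Write each fetched row as one delimited line.
with open(output_file, "w", newline="") as f:
    writer = csv.writer(f, delimiter=separator)
    for row in cursor:
        writer.writerow(row)

cursor.close()
conn.close()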
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | X | -
Log information for the export is stored in the trcppp.log file. The location of the log file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the Data export with ODBC to file program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Export error.
20 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
128 | Missing DLL.
Security for the Data Warehouse Center is separate from both database and operating system security. The Data Warehouse Center security system provides you with the ability to control user access to Data Warehouse Center objects and the actions that users can perform. For example, you can use the security system to restrict access to some types of warehouse source or target data, such as personnel information.
The elements of Data Warehouse Center security are privileges, warehouse groups, and warehouse users. Users belong to groups, which have access to objects and some combination of privileges.
A privilege is an authorization to perform a particular function. In the Data Warehouse Center, two privileges can be assigned to groups: administration privilege and operations privilege. Privileges are assigned (by a user with administration privilege) to groups. In order for warehouse users to have a privilege, they must belong to a warehouse group that has the privilege.
Privilege | Action
Administration | Define and change warehouse users and warehouse groups, change Data Warehouse Center properties, import metadata, and define which groups have access to objects when they are created.
Operations | Open and use all the functions in the Work in Progress window.
The main object in Data Warehouse Center security is the warehouse group. When you define a warehouse group, you specify the sources, targets, and processes that the group can access, the warehouse users who will belong to the group, and the privileges that the users in the group will have. Group membership controls warehouse user access to objects, so that by including a user in a particular group you can grant access to only the objects that are needed for the user to perform his tasks.
During installation, a default warehouse group is created, and the user ID for the installation is assigned to this group. The first person who logs on to the Data Warehouse Center becomes a member of the default warehouse group and has all privileges. All users, sources, targets, and processes automatically belong to the default warehouse group when they are defined. If no other warehouse groups are added, all new objects will belong to the default warehouse group. If you do not want to use the security features of the Data Warehouse Center, you do not need to change anything; just keep the default warehouse group assignment.
When warehouse users define a new object, such as a warehouse source, the groups to which they belong will have access to the new object. The list of groups to which they can assign access is limited to the groups to which they belong.
The list of tables or views that users can define to the source will be limited by their group membership as well, so that they will be able to choose from among only the tables and views to which they have access. Further, the set of actions available to the user through the Data Warehouse Center will be limited by the level of security that the user has. For example, a user will not be able to access the properties of an object if the user does not belong to a group that has access to the object.
When a user defines a new object to the Data Warehouse Center and does not have administration privilege, all of the groups to which the user belongs will have access to the new object by default. The Security page of the object notebook will not be available to the user.
If a user has access to a warehouse source, but does not have administration privilege, the Security Groups page of the Warehouse Source notebook is not available.
The following table shows the relationship between various Data Warehouse Center objects and the group membership that is required to define or edit the object. Users who belong to a group with access to a process can update the steps in the process as well as the tables in the process.
Data Warehouse Center object | Who can define | Who can edit
Subject area | Any user | Any user
Process | Any user | Only users who belong to a group that has access to the process
Step in a process | Only users who belong to a group that has access to the process | Only users who belong to a group that has access to the process
Table, view, or file in a process | Only users who belong to a group that has access to the process | Only users who belong to a group that has access to the process
Warehouse source | Any user | Only users who belong to a group that has access to the warehouse source
Warehouse target | Any user | Only users who belong to a group that has access to the warehouse target
Tables, views, and files within a warehouse source | Only users who belong to a group that has access to the warehouse source | Only users who belong to a group that has access to the warehouse source
Tables and files within a warehouse target | Only users who belong to a group that has access to the warehouse target | Only users who belong to a group that has access to the warehouse target
Schemas | Any user | Any user
Tables in a schema | - | Only users who belong to a group that has access to the warehouse source or target that contains the tables
Warehouse agents | Any user | Any user
Programs group | Any user | Any user
Program definition within a program group | Any user | Any user
Warehouse groups | Only users who belong to a group that has administration privilege | Only users who belong to a group that has administration privilege
Warehouse users | Only users who belong to a group that has administration privilege | Only users who belong to a group that has administration privilege
Data Warehouse Center properties notebook | - | Only users who belong to a group that has administration privilege
When you define a warehouse user to the Data Warehouse Center, you can assign a Data Warehouse Center logon user ID and password to the user as well as the groups to which the user will belong. All warehouse users automatically belong to the default warehouse group. To define a warehouse user, you must belong to a group that has administration privilege.
A user can be assigned to any number of warehouse groups. All warehouse users can view all objects in the Data Warehouse Center tree and contents pane. All warehouse users can always use the Locate, Show Related, and Refresh functions for any object. All warehouse users can update their own user ID and password by opening their User Properties notebook. You can specify that a user will receive e-mail notification when a step completes.
The Data Warehouse Center uses the following default values for each data type that it supports.
The following table shows the column types and defaults (ODBC types for DB2):
Data type | Size | Scale | Is null | Is text
CHAR | user defined (10) | none | true | true
VARCHAR | user defined (10) | none | true | true
Decimal | user defined (5,2) | yes | true | false
Numeric | user defined (5,2) | yes | true | false
Small int | fixed (2) | none | true | false
Integer | fixed (4) | none | true | false
Real | fixed | none | true | false
Date | fixed (6) | none | true | false
Time | fixed (6) | none | true | false
Timestamp | fixed (16) | none | true | false
Double | fixed (8) | none | true | false
Float | fixed (8) | none | true | false
Long VARCHAR (DB2 UDB for AS/400) | fixed (32740) | none | true | false
Long VARCHAR (DB2 Common Server) | fixed (32700) | none | true | true
CLOB | user defined (0) | none | true | true
The following table lists DBCS types. The DBCS fields are half the size of comparable SBCS fields.
Data type | Size | Scale | Is null | Is text
DBCLOB | user defined (0) | none | true | true
GRAPHIC | user defined (0) | none | true | true
VARGRAPHIC | user defined (0) | none | true | true
Long VARGRAPHIC (DB2 Common Server) | fixed (16350) | none | true | true
Long VARGRAPHIC (DB2 UDB for AS/400) | fixed (16370) | none | true | true
If you want the Data Warehouse Center to automatically add the DISTINCT keyword to the SQL statement or generate a GROUP BY clause, the column length must be less than or equal to 254.
The Data Warehouse Center adds the VWEDITION column if the step has one or more editions. VWEDITION is a fixed integer column with a length of 4. The Data Warehouse Center removes the VWEDITION column if the number of editions is changed to zero. You can change the number of editions in development mode only.
Some target databases might not support all data types. See the documentation for your database type.
Use the DB2 UDB for AS/400 Load Insert (VWPLOADI) program to load data from a flat file into a DB2 UDB for AS/400 table, appending to existing data.
Before the program loads new data into the table, it exports the table to a backup file, which you can use for recovery.
The DB2 UDB for AS/400 Load Insert program uses the following step and warehouse parameters:
Source file name
Target table
Backup file
FileMod string
Restrictions
The Data Warehouse Center definition for the agent site that is running the program must include a user ID and password. The database server does not need to be on the agent site. However, the source file must be on the database server. Specify the fully qualified name of the source files as defined on the DB2 server.
If the program detects a failure during processing, the table will be emptied. If the load generates warnings, the program returns as successfully completed.
You can find log information in /QIBM/USERDATA/VWxxxxxx.VWPLOADI.
The location of the log files is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the DB2 UDB for AS/400 Load Insert program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Load error.
20 | Export error.
24 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
128 | Missing DLL. A Windows NT error that can occur when the program was compiled on another operating system.
Use the DB2 UDB for AS/400 Load Replace (VWPLOADR) program to load data from a flat file into a DB2 UDB for AS/400 table, replacing existing data.
The DB2 UDB for AS/400 Load Replace program uses the following step and warehouse parameters:
Source file name
Target table
FileMod string
Restrictions
The Data Warehouse Center definition for the agent site that is running the program must include a user ID and password. The database server does not need to be on the agent site. However, the source file must be on the database server. Specify the fully qualified name of the source files as defined on the DB2 server.
If the program detects a failure during processing, the table is emptied. If the load generates warnings, the program returns as successfully completed.
You can find log information in /QIBM/USERDATA/VWxxxxxx.VWPLOADR.
The location of the log files is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the DB2 UDB for AS/400 Load Replace program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Load error.
20 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
128 | Missing DLL.
Use the Visual Warehouse 5.2 DB2 UDB Load Insert (VWPLOADI) program to run your Visual Warehouse 5.2 VWPLOADI program in the Data Warehouse Center.
Before the program loads new data into the table, it exports the table to a backup file, which you can use for recovery.
The Visual Warehouse 5.2 DB2 UDB LOADI version of this program uses the following step and warehouse parameters:
The flat file selected as a source for the step. The step must have only one selected source file. The source file must contain the same number and order of fields as the target tables. Only delimited ASCII (ASCII DEL) source files are supported. For information about the format of delimited files, see the DB2 Command Reference.
The warehouse target database selected when the step was created. You must have either SYSADM or DBADM authorization to the DB2 database. The DB2 UDB Load Insert program does not support multinode databases. For multinode databases, use Load flat file into DB2 UDB EEE (VWPLDPR) for DB2 UDB Extended Enterprise Edition.
The user ID and password that are defined for the warehouse database in the Warehouse notebook.
The target table that is selected for the step.
The backup directory and file name.
The FileMod string (optional). See the DB2 UDB documentation for the LOAD command for the file mode options that can be used with the ASCII DEL format.
Recommendation
Create the target table in its own private DB2 tablespace. Any private tablespace that you create will be used by default for all new tables that do not specify a tablespace. If processing fails, DB2 might put the whole tablespace in hold status, making the tablespace inaccessible. To avoid this hold problem, create a second private tablespace for steps that do not use the load programs.
To create a tablespace:
CREATE TABLESPACE tablespace-name MANAGED BY SYSTEM USING ('d:/directory')
where directory is the directory that is to contain the databases. DB2 creates the directory for you.
After you create the tablespaces, open the Steps notebook for each step, and specify which tablespace the step uses.
Restrictions
The Data Warehouse Center definition for the agent site that is running the program must include a user ID and password. The DB2 load utility cannot be run by a user named SYSTEM. Be sure to select the same agent site in the warehouse source and the warehouse target for the step using the Data Warehouse Center program. The database server does not need to be on the agent site. However, the source file must be on the database server. Specify the fully qualified name of the source files as defined on the DB2 server.
If the program detects a failure during processing, the table will be emptied. If the load generates warnings, the program returns as successfully completed.
The program does not collect database statistics. Run the DB2 UDB Runstats program after a sizable load is complete.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | X | X
You can find log information in the following files:
trcppp.log contains log information for the DB2 Load utility.
sqlppp.log contains log information for the DB2 Load utility.
sqlppp.lbk contains log information for the backup of the original database.
sqlppp.lg2 contains recovery information if an error occurs during load processing.
The location of the log files is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the DB2 UDB Load Insert program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Load error.
20 | Export error.
24 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
128 | Missing DLL. A Windows NT error that can occur when the program was compiled on another operating system.
Use the Visual Warehouse 5.2 DB2 UDB Load Replace (VWPLOADR) program to run your Visual Warehouse 5.2 VWPLOADR program in the Data Warehouse Center.
This Visual Warehouse 5.2 version of the VWPLOADR program uses the following step and warehouse parameters:
The file that is selected as a source for the step. The step must have only one source file selected. The source file must contain the same number and order of fields as the target tables. Only delimited ASCII ( ASCII DEL) source files are supported. For information about the format of delimited files, see the DB2 Command Reference.
The warehouse target database that was selected when the step was created.
You must have either SYSADM or DBADM authorization to the DB2 database. This program does not support multinode databases. For multinode databases, use Load flat file into DB2 UDB EEE (VWPLDPR) for DB2 UDB Extended Enterprise Edition.
The user ID and password that were defined for the warehouse database in the Warehouse notebook.
The target table that is selected for the step. The step using the program must specify that the maximum number of editions for the target table is 0.
The FileMod string (optional). See the documentation for the LOAD command for the file mode options that can be used with the ASCII DEL format.
Recommendation
Create the target table in its own private DB2 tablespace. Any private tablespace that you create will be used for all new tables that do not specify a tablespace. If processing fails, DB2 might put the whole tablespace in hold status, making the tablespace inaccessible. To avoid this hold problem, create a second private tablespace for steps that do not use the load programs.
To create a tablespace:
CREATE TABLESPACE tablespace-name MANAGED BY SYSTEM USING ('d:/directory')
where directory is the directory that is to contain the databases. DB2 creates the directory.
After you create the tablespace, open the Steps notebook for each step, and specify which tablespace the step uses.
Restrictions
The Data Warehouse Center definition for the agent site that is running the program must include a user ID and password. The DB2 load utility cannot be run by a user named SYSTEM. Be sure to select the same agent site in the warehouse source and the warehouse target for the step that uses the Data Warehouse Center program. The database server does not need to be on the agent site. However, the source file must be on the database server. Specify the fully qualified name of the source files as defined on the DB2 server.
If the program detects a failure during processing, the table is emptied. If the load generates warnings, the program returns as successfully completed.
The DB2 UDB Load Replace program collects database statistics during the load, so you do not need to run the DB2 UDB Runstats (VWPSTATS) program after this program.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | X | X
Log information is located in the following files:
trcppp.log contains log information for the DB2 Load facility.
sqlppp.log contains log information for the DB2 Load facility.
sqlppp.lg2 contains recovery information if an error occurs during load processing.
The location of the log files is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the DB2 UDB Load Replace program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Load error.
20 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
128 | Missing DLL.
Metadata that describes the format of objects in the warehouse, the sources of data, and the transformations that are applied to the data. Examples of definition metadata are column names, table names, and database names.
Some steps that you export might contain BLOB (binary large object) data. The following steps can contain BLOB data:
DB2 UDB Export
DB2 UDB Load
DB2 UDB Reorg
DB2 UDB Runstats
DB2 UDB for OS/390 Load
DB2 UDB for OS/390 Runstats
DB2 for AS/400 Data Load Insert (VWPLOADI)
DB2 for AS/400 Data Load Replace (VWPLOADR)
Data export with ODBC to file (VWPEXPT2)
SQL steps
When you export metadata definitions for BLOB data, multiple tag language files are created. The file name that is generated for each supplementary file is the same as that of the tag language file, with a numeric extension. For example, if the tag language file you specified is e:\tag\steps.tag, the supplementary tag language files are named e:\tag\steps.1, e:\tag\steps.2, and so on. Only the file extension is used to identify the supplementary files within the base tag language file, so you can move the files to another directory. However, do not rename the files, and always keep them in the same directory; otherwise, you will not be able to import them successfully.
Use the Generate Key Table transformer to add a unique key to a warehouse table.
The key column must have a data type of Integer. No other data type, including SMALLINT, is valid for the key column.
You can change values in the key column in two ways:
Update the values in an existing key column, adding values that are based on the key values in another column. To do this, you must define the key column to which the transformer writes to allow null values. This key column does not need to be a primary key column. This choice adds the missing values in the key column, starting at the highest value in the key start value column plus the increment value that you specify.
Replace or create key values in a specific column. If values already exist in the key column, they are overwritten.
If you are using a partitioned database, such as the type supported by DB2 UDB Extended Enterprise Edition, the partitioning key column value cannot be updated. If you attempt to do this, you will receive an SQL0270N message with an RC2 in the transformer log table.
You can use the DB2 CREATE TABLE statement to define a partitioning key on a table. If a partitioning key is not defined for a table in a table space that is divided across more than one database partition in a nodegroup, a partitioning key is created by default from the first column of the primary key. If no primary key is specified, the default partitioning key is the first non-long field column that is defined on the table.
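As an illustration of defining a partitioning key explicitly at table creation, here is a hedged sketch that issues the DDL through pyodbc; the table, columns, and connect string are made up, and the exact CREATE TABLE syntax should be confirmed in the DB2 SQL Reference:

import pyodbc

# Hypothetical connection to a partitioned (EEE) warehouse database.
conn = pyodbc.connect("DSN=warehouse;UID=user1;PWD=secret;")
cursor = conn.cursor()

# Declare the partitioning key explicitly so that it is not defaulted
# from the primary key. KEY_COL is the generated-key column; the
# partitioning key is placed on a column that is never updated, and it
# must be part of the primary key when a primary key is defined.
cursor.execute("""
    CREATE TABLE FACT_SALES (
        KEY_COL  INTEGER NOT NULL,
        REGION   VARCHAR(20) NOT NULL,
        AMOUNT   DECIMAL(9,2),
        PRIMARY KEY (KEY_COL, REGION)
    )
    PARTITIONING KEY (REGION)
""")
conn.commit()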
Use the Generate Period Table transformer to create a new period table. A period table contains a column of generated date, time, or timestamp values. In addition to this column, the period table can contain one or more optional output columns based on the date/time value for the row or on parameters that you specify.
You can use a period table in a number of ways (a generation sketch in code follows the list). For example:
You can use a period table to analyze trends by creating one or more output columns based on the generated date/time value for each row.
You can use a period table in an SQL join with other data tables, so that the resulting product supports simpler SQL SELECT statements that are based on user requirements.
You can use a period table as a starting point for creating a more complex data table after other data columns are added to the generated period table.
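A minimal sketch of generating period rows in Python; the start date, number of rows, and derived output columns are assumptions for illustration:

from datetime import date, timedelta

start = date(2000, 1, 1)   # hypothetical start of the period table
num_rows = 90              # hypothetical number of generated rows

# Each row holds the generated date plus optional derived output columns.
rows = []
for i in range(num_rows):
    d = start + timedelta(days=i)
    rows.append({
        "PERIOD_DATE": d,
        "YEAR": d.year,
        "MONTH": d.month,
        "DAY_OF_WEEK": d.isoweekday(),  # 1 = Monday ... 7 = Sunday
    })

print(rows[0])
print(rows[-1])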
When you import data from an Informix 7 and 9 data source into a new warehouse source, the Data Warehouse Center converts the Informix data types to ODBC data types, as described in the following table:
Informix 7 and 9 data type | ODBC data type
CHAR | CHAR
VARCHAR | VARCHAR
FLOAT | FLOAT
FLOAT | DOUBLE
SMALLFLOAT | REAL
DATE | DATE
INTEGER | INTEGER
SMALLINT | SMALLINT
SERIAL | INTEGER
DECIMAL | DECIMAL
MONEY | DECIMAL
TEXT | LONG VARCHAR
TIMESTAMP | TIMESTAMP
DATETIME YEAR TO FRACTION | N/A
After you select a source for your step and open the Column Mapping page of the step notebook for the first time, you will see the ODBC data types used in the Column Grid. See the DB2 data types table for the characteristics of the ODBC data types that will be used for your business view table.
A metadata interchange format that is supported by the Data Warehouse Center. A tag language file is an example of an interchange file.
Use the Invert Data transformer to invert the rows and columns of a source table, making rows become columns and columns become rows in the target table. The source table and the target table must be in the same warehouse target database.
The order of data among the columns, from top to bottom, is maintained and placed in rows left to right. This transformer rotates the table on an axis cutting a diagonal from the upper left to the lower right. To visualize this process, hold a simple table of data printed on a square piece of paper in your hands at the upper left and lower right. Then flip it over and hold it up to a light. From this exercise, you can see how the rows and columns maintain orientation with respect to one another.
Optionally, you can specify a pivot column in the source table. The data in this column will become the column names in the new step and will not appear as data in the output table.
For example, consider the following source table and the data in its rows and columns. The data type of each column is shown in parentheses:
Campbell (DOUBLE) | Cupertino (DOUBLE) | Los Gatos (DOUBLE) | Mill Valley (DOUBLE)
296,539 | 345,908 | 357,498 | 475,676
263,463 | 318,495 | 335,843 | 471,476
309,657 | 339,572 | 350,467 | 428,451
If you use the Invert Data transformer on the example source table and do not specify a pivot column, the transformer will copy the rows and columns from the table, invert them, and write them to a target table. The rows and columns appear in the example target table as follows:
Pivot column (CHAR) | (DOUBLE) | (DOUBLE) | (DOUBLE)
Campbell | 296,539 | 263,463 | 309,657
Cupertino | 345,908 | 318,495 | 339,572
Los Gatos | 357,498 | 335,843 | 350,467
Mill Valley | 475,676 | 471,476 | 428,451
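A minimal Python sketch that reproduces this inversion on the example data, with the source column names becoming the pivot column of the target (a simplified stand-in for the transformer, not its implementation):

# Source table from the example: column names and rows of values.
columns = ["Campbell", "Cupertino", "Los Gatos", "Mill Valley"]
rows = [
    [296539, 345908, 357498, 475676],
    [263463, 318495, 335843, 471476],
    [309657, 339572, 350467, 428451],
]

# Invert: each source column becomes a target row, led by the column name.
inverted = [[name] + [row[i] for row in rows]
            for i, name in enumerate(columns)]

for target_row in inverted:
    print(target_row)
# ['Campbell', 296539, 263463, 309657]
# ['Cupertino', 345908, 318495, 339572] ... and so on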
Except for the pivot column (which contains the column names that you plan to use in the target table, if one exists), the source table columns should either have the same data type or data types that are related to each other through automatic promotion.
The data in the pivot column must be one of the following types:
CHAR
DATE
TIME
TIMESTAMP
SMALLINT
INTEGER
NUMERIC
DECIMAL
All data in the pivot column must be less than 18 characters. Alphabetic characters will be converted to uppercase on DB2 systems that require object names in uppercase.
The number of rows in the source table should be equal to or less than the maximum number of table columns supported in the version of DB2 that is running this transformer. An error will occur if this number is exceeded.
The Invert Data transformer drops the existing database table and recreates it during each run. Each time you run a step using this transformer, the existing data is replaced, but the table space and table index names are preserved.
Primary and foreign key relationships can document the relationships used to build the fact and dimension tables in the warehouse. Documenting these relationships aids in creating the warehouse. Generally, the dimension tables should be updated first, so that the dimension information is in the warehouse when the fact tables are updated. The key in the dimension table should be created in the warehouse to ensure that the dimension rows are unique: for example, the geographies or regions that a company uses, the time periods that it reports on, or its stores, products, and customers. When records are added to the fact table, the dimension tables are used to ensure that each fact row is clean and consistent with the right information. You might also want to document additional relationships that exist between tables but are not part of a primary/foreign key relationship. These relationships are specific to a particular warehouse schema and do not affect the tables in the schema in any way.
Once the fact and dimension tables are built in the warehouse, they are used to build OLAP cubes for multidimensional analysis. The foreign key relationships are then used to load the values into the cube.
The table definitions and all schema relationships are used to build the OLAP model and can be exported to the OLAP server for the purpose of building the cube.
The warehouse fact and dimension tables can also be used to create reports, summaries and other tables that end users may need for business analysis.
Use the Visual Warehouse 5.2 Load flat file into DB2 UDB EEE (AIX only) (VWPLDPR) program to load data from a delimited flat file into a DB2 UDB Extended Enterprise Edition database, replacing existing data.
Before you use this program, you must be familiar with parallel system concepts and parallel load.
The Load flat file into DB2 UDB EEE program performs the following steps in loading data to a parallel database:
Connects to the target database.
Acquires the target partitioning map for the database.
Splits the input file so that each file can be loaded on a node.
Runs a remote load on all nodes.
If the load step fails on any node, the Load flat file into DB2 UDB EEE program performs the following steps:
Builds an empty load data file for each node.
Loads the empty data files.
The Load flat file into DB2 UDB EEE program uses the following step and warehouse source parameters:
The flat file selected as a source for the step. The step must have only one source file selected. Only delimited (DEL) files are supported. The input file and the split files must be on a file system that is shared by all nodes involved in the database load. The shared file system must be mounted on the same directory on all nodes. The directory must be large enough to contain the input file before and after the file is split.
The warehouse database that was selected when the step was created.
The user ID and password that were defined for the warehouse database in the Warehouse notebook.
The target table that is selected for the step.
The column delimiter.
The string delimiter.
The decimal delimiter.
The non-NFS directory.
The output file name.
The partitioned key. If the partitioned key contains characters that are special to UNIX shells, such as '(' and ')', enclose the key in a pair of quotation marks.
The Load flat file into DB2 UDB EEE program does not run the DB2 RUNSTATS utility after the load. If you want to run the RUNSTATS utility automatically after the load, add a step to your process that runs RUNSTATS.
Recommendation
Create the target table in its own private DB2 tablespace. Any private tablespace that you create will be used for all new tables that do not specify a tablespace. If processing fails, DB2 might put the whole tablespace in hold status, making the tablespace inaccessible. To avoid this hold problem, create a second private tablespace for steps that do not use the load programs.
To create a tablespace:
CREATE TABLESPACE tablespace-name MANAGED BY SYSTEM USING ('d:/directory')
where directory is the directory that is to contain the databases. DB2 creates the directory.
After you create the tablespaces, open the Steps notebook for each step, and specify which tablespace the step uses.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
- | X | - | - | -
The trcppp.log file contains a list of all the temporary files created by the program. The location of this file is based on the value of the VWS_LOGGING environment variable. To determine the source of a problem, look at this file first.
The following table lists the error codes returned by the Load flat file into DB2 UDB EEE program.
Value | Description
4 | Warning.
8 | Parameter error.
9 | Connect error.
10 | Missing system programs.
11 | Missing SQL utilities.
12 | Missing db2nodes.cfg file.
20 | DB2DGPMAP utility error.
30 | DB2SPLIT utility error.
31 | Missing output from DB2SPLIT.
40 | Load utility error.
When you import data from a Microsoft SQL Server 6 and 7 data source into a new warehouse source, the Data Warehouse Center converts the Microsoft data types to ODBC data types, as described in the following table:
Microsoft SQL Server 6 and 7 data type | ODBC data type
CHAR | CHAR
VARCHAR | VARCHAR
DECIMAL | DECIMAL
FLOAT | FLOAT
DOUBLE | DOUBLE
NUMERIC | NUMERIC
DATETIME | TIMESTAMP
SMALLDATETIME | TIMESTAMP
INT | INTEGER
TINYINT | TINYINT
SMALLINT | SMALLINT
MONEY | DECIMAL
SMALLMONEY | DECIMAL
SYSNAME | VARCHAR
TEXT | LONG VARCHAR
BINARY | BINARY
BIT | BIT
IMAGE | LONG VARCHAR
REAL | REAL
TIMESTAMP | VARCHAR
VARBINARY | VARCHAR (with the Is Text field not checked)

Microsoft SQL Server 7 data type | ODBC data type
NCHAR | CHAR for bit data
NVARCHAR | VARCHAR for bit data
NTEXT | LONG VARCHAR for bit data
After you select a source for your step and open the Column Mapping page of the step notebook for the first time, you will see the ODBC data types used in the Column Grid. See the DB2 data types table for the characteristics of the ODBC data types that will be used for your business view table.
The Moving average transformer predicts the future course of time series values.
Moving averages, both simple and exponentially smoothed, can often predict the future course of a time-related series of values. They are widely used in time-series analysis in business and financial forecasting. Rolling sums have other widespread financial applications.
Moving averages redistribute events that occur briefly over a wider period of time. This redistribution serves to remove noise, random occurrences, and large peaks or valleys from time-series data. The moving average method can be applied to a time-series data set to remove the effects of seasonal variations, extract the data trend, enhance the long-term cycles, and smooth a data set before performing higher level analysis.
With this transformer, you can calculate the following values (a code sketch follows the list):
A simple moving average. A simple moving average (SMA) is defined as:
SMA [row i] = SUM (data values for last N rows) / N
An exponential moving average. An exponential moving average (EMA) is defined as:
EMA [row i] = ( Value [row i] * K ) + ( EMA [row i-1] * ( 1-K ) )
where: K = 2/(N+1)
For example, for a four-day EMA, N = 4, K = 2/5 and 1-K = 3/5
An exponential moving average is also known as an exponentially smoothed moving average.
A rolling sum for N periods of data, where N is specified by the user. A rolling sum (RS) is defined as:
RS [row i] = SUM ( data values for the last N rows)
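A minimal Python sketch of the three calculations as defined above; N and the sample series are made-up values, and seeding the EMA with the first data value is an assumption:

values = [10.0, 12.0, 11.5, 13.0, 14.5, 13.5, 15.0]  # hypothetical series
n = 4                                                # window size N

# Simple moving average: mean of the last N values, defined from row N on.
sma = [sum(values[i - n + 1:i + 1]) / n for i in range(n - 1, len(values))]

# Rolling sum of the last N values.
rs = [sum(values[i - n + 1:i + 1]) for i in range(n - 1, len(values))]

# Exponential moving average with K = 2 / (N + 1).
k = 2 / (n + 1)
ema = [values[0]]                 # assumed seed: the first value
for v in values[1:]:
    ema.append(v * k + ema[-1] * (1 - k))

print(sma)
print(rs)
print(ema)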
The Data Warehouse Center automated ODBC connect string follows the syntax shown in the following examples. The automated ODBC connect string works for Intersolv level 9 drivers.
If you do not have connectivity for the database types that the Data Warehouse Center supports, you can code your own ODBC connect string. The ODBC connect string syntax for Oracle, Informix, Sybase, Microsoft SQL Server and generic ODBC is shown in the following examples:
Oracle 7 and 8 connect string syntax
DSN=dsn;UID=uid;PWD=pwd;SRVR=srvr:computername:sid;
Where:
dsn is the data source name.
uid is the user ID.
pwd is the password.
srvr is the network, a letter that identifies the network protocol being used: P (named pipes), X (SPX), B (NETBIOS), T (TCP/IP), D (DECNet), A (Oracle Async), or AT (AppleTalk).
computername is the name of the Oracle listener on the network.
sid is the Oracle system identifier and refers to the instance of Oracle that is running on the host.
Sybase 10/11 connect string syntax
DSN=dsn;UID=uid;PWD=pwd;SRVR=srvr;DB=db;IFILE=interfacefile;
Where:
dsn is the data source name.
uid is the user ID.
pwd is the password.
srvr is the name of the server that contains the tables that you want to access. If you do not specify a server name, the default is the server name in the DSQUERY environment variable.
db is the name of the database that you want to connect to. The default is the database that is defined by the Sybase system administrator for each user.
interfacefile is the path name to the interfaces file.
Informix 7, 9 connect string syntax
DSN=dsn;UID=uid;PWD=pwd;DB=db;HOST=host;SERV=serv;PRO=pro;
Where:
dsn is the data source name.
uid is the user ID.
pwd is the password.
db is the name of the database that you want to connect to.
host is the name of the workstation where your Informix server resides.
serv is the name of the service as it appears on the host workstation.
pro is the protocol that is used to communicate with the server. Some of the possible values are FTP, IPX, LANMAN, and TCP/IP.
Microsoft SQL Server 6 and 7 connect string syntax
DSN=dsn;UID=uid;PWD=pwd;SRVR=srvr;DB=db;WKID=wkid;
Where:
dsn is the data source name.
uid is the user ID.
pwd is the password.
srvr is the name of the server that contains the SQL Server tables that you want to access.
db is the name of the database that you want to connect to.
wkid is the workstation ID that is used by your agent site.
Generic ODBC data source connect string syntax
If you have a generic ODBC source that is not listed in the Resource Type window and that you want to use as an information resource (for example, Microsoft Access), you can use the connect string shown here; a connection sketch in code follows.
DSN=dsn;UID=uid;PWD=pwd;
Where:
dsn is the data source name.
uid is the user ID.
pwd is the password.
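For illustration, any of these connect strings can be passed directly to an ODBC connection call. A sketch with made-up DSN and credentials, using the pyodbc module:

import pyodbc

connect_string = "DSN=dsn;UID=uid;PWD=pwd;"  # hypothetical values

conn = pyodbc.connect(connect_string)
cursor = conn.cursor()
cursor.execute("SELECT 1")  # replace with a query that is valid for your source
print(cursor.fetchone())
conn.close()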
Use the OLAP Server: Calc with calc rules (ESSCALC2) program to apply the specified Calc script to an OLAP Server database. When you define a step that uses this program, do not specify a source or a target table for the step. The OLAP Server can be a DB2 OLAP Server or Essbase server.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for the OLAP Server: Calc with calc rules program is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Calc with calc rules program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Calc error.
24 | Internal error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Default calc (ESSCALC1) program to call the default OLAP Server calc script associated with the target database. The OLAP Server can be a DB2 OLAP Server or Essbase server.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
The file trcppp.log contains log information for the OLAP Server: Default calc program. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Default calc program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Calc error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Free text data load (ESSDATA1) program to load data from a comma-delimited flat file into a multidimensional OLAP Server database using free-form data loading. The OLAP Server can be a DB2 OLAP Server or Essbase server.
The OLAP Server: Free text data load program uses the file that is selected as a source for the step in the process. The step must have only one source file selected. The data in the source file must match the OLAP Server outline values exactly. The selected source file must be on the agent site.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for this program is stored in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Free text data load program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Load error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Load data from file with load rules (ESSDATA2) program to load data from a source flat file into a multidimensional OLAP Server database using load rules. The OLAP Server can be a DB2 OLAP Server or Essbase server.
The program uses the selected file as a source for the step. The step must have only one source file selected. The source file name must follow the conventions for specifying file names on the client or server of your OLAP Server system.
If the file location flag in your program indicates that the file is on an agent site, you must specify a fully qualified file name in the Load rule file name field on the Parameters page of the Step notebook.
If the file location flag indicates that the source file is on the OLAP server, you must specify the file name without an extension in the Load rule file name field on the Parameters page of the Step notebook.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for the OLAP Server: Load data from file with load rules program is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Load data from file with load rules program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Load error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Load data from an SQL table with load rules (ESSDATA3) program to load data from an SQL source into a multidimensional OLAP Server database using load rules. The OLAP Server can be a DB2 OLAP Server or Essbase server.
The OLAP Server: Load data from an SQL table with load rules program uses the user ID and password that are defined for the Warehouse source.
Do not select a target table for the step.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for the OLAP Server: Load data from an SQL table with load rules program is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Load data from an SQL table with load rules program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Load error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Load data from a file without using load rules (ESSDATA4) program to load data from a flat file into a multidimensional OLAP Server database without using load rules. The OLAP Server can be a DB2 OLAP Server or Essbase server.
When you define a step that uses this program, select a source file, but do not specify a target table. The source file name must follow the conventions for specifying file names on the client or server of your OLAP Server system.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for the OLAP Server: Load data from a file without using load rules program is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Load data from a file without using load rules program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Load error.
32 | Internal error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Update outline (ESSOTL1) program to update an OLAP Server outline from a source flat file using load rules. The OLAP Server can be a DB2 OLAP Server or Essbase server.
The OLAP Server: Update outline program uses the selected file as a source for the step. The step must have only one source file selected. The source file name must follow the conventions for your OLAP Server for specifying file names on the client or server.
If the file location flag in your program indicates that the file is on an agent site, you must specify a fully qualified file name in the Load rules file name field on the Parameters page of the Step notebook.
If the file location flag indicates that the source file is on the OLAP Server, you must specify the file name without an extension in the Load rules file name field on the Parameters page of the Step notebook.
For more information, see the documentation for your OLAP Server.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for the OLAP Server: Update outline program is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Update outline program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Load error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Update outline from SQL table (ESSOTL2) program to update an OLAP Server outline from an SQL source using load rules. The OLAP Server can be a DB2 OLAP Server or Essbase server.
The OLAP Server: Update outline from SQL table program uses the user ID and password defined for the Warehouse source database that is connected to the step that uses the program.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for the OLAP Server: Update outline from SQL table program is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Update outline from SQL table program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Essbase error.
20 | Load error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
A one-way ANOVA uses one grouping column. The data assumptions for the one-way ANOVA are:
There is one data column and one grouping column.
The data column is composed of data values (x1, x2, x3, ..., xn).
The groups can have equal or unequal sample sizes; unequal sample sizes are allowed for the one-way ANOVA.
The following table is a sample of a one-way ANOVA summary table:
SOURCE | DF | SS | MS | FVALUE | PVALUE
Between | DFB | SSB | MSB | Fvalue | Pvalue
Within | DFE | SSE | MSE | Null | Null
Total | DFT | SST | Null | Null | Null
Assume the values for the following variables:
n = The total number of data values in the data column.
k = The number of unique groups in the single grouping column.
x(i) = The ith data value in the data column.
count(i) = Count of the data values in the ith group.
avgGrp(i) = The average of all the values in the ith group.
avgTotal = The average of all the data values in the data column.
The equations for the calculated values are:
DFB = k - 1
DFE = n - k
DFT = n - 1
SSB = SUM(i=1 to k) [ count(i) * (avgGrp(i) - avgTotal)**2 ]
SSE = SST - SSB
SST = SUM(i=1 to n) [ (x(i) - avgTotal)**2 ]
MSB = SSB / DFB
MSE = SSE / DFE
FValue = MSB / MSE
PValue = PValueFunction( DFB, DFE, FValue)
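The following Python sketch shows these one-way calculations end to end. It is an illustration only, not the transformer's implementation; it assumes that PValueFunction is the upper-tail probability of the Fisher-F distribution, computed here with scipy.stats.f.sf.

from scipy.stats import f

def one_way_anova(groups):
    # groups: one list of data values per unique group in the grouping column
    n = sum(len(g) for g in groups)              # total number of data values
    k = len(groups)                              # number of unique groups
    avg_total = sum(x for g in groups for x in g) / n
    dfb, dfe, dft = k - 1, n - k, n - 1
    ssb = sum(len(g) * (sum(g) / len(g) - avg_total) ** 2 for g in groups)
    sst = sum((x - avg_total) ** 2 for g in groups for x in g)
    sse = sst - ssb
    msb, mse = ssb / dfb, sse / dfe
    f_value = msb / mse
    p_value = f.sf(f_value, dfb, dfe)            # assumed PValueFunction
    return {"DFB": dfb, "DFE": dfe, "DFT": dft, "SSB": ssb, "SSE": sse,
            "SST": sst, "MSB": msb, "MSE": mse, "FVALUE": f_value, "PVALUE": p_value}

print(one_way_anova([[63, 64, 66], [68, 70, 72], [74, 75, 79]]))

A small PVALUE for a sample like this one indicates that at least one group mean differs from the others.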
When you import data from an Oracle Version 7 and 8 data source into a new warehouse source, the Data Warehouse Center converts the Oracle data types to ODBC data types, as described in the following table:
Oracle 7 and 8 data type | ODBC data type
CHAR | CHAR
VARCHAR2 | VARCHAR
NUMBER | DECIMAL
DATE | TIMESTAMP
LONG | LONG VARCHAR
LONG RAW | LONG VARCHAR
RAW | VARCHAR
NUMBER | DOUBLE
RAW and LONG data types cannot be over 2,147,478,528 bytes.
After you have selected a source for your step and opened the Column Mapping page of the step notebook for the first time, you will see the ODBC data types used in the Column Grid. See the DB2 data types table to see characteristics of the ODBC data types that will be used for your business view table.
P-value is the probability, when the null hypothesis is true, that the absolute value of the T-statistic would equal or exceed the observed value (T-value). A small P-value is evidence that the null hypothesis is false and that the attributes are in fact correlated.
The toolbar at the left side of the Process Model window. Click the icons in the toolbar to add sources, targets, and steps to the process, and to link objects in the process.
The Data Warehouse Center uses several different types of passwords, which are defined in different ways. For example, before you initialize or reinitialize the Data Warehouse Center server, you define a password that the Data Warehouse Center needs at startup. You also define passwords that the Data Warehouse Center uses after startup, within the Data Warehouse Center environment.
These passwords can be grouped into the following categories:
Passwords that are used by Data Warehouse Center users to log on to the Data Warehouse Center
Passwords that are used by an agent to run programs
Passwords that are used to connect to a database to access data
The password that the Data Warehouse Center server uses to connect to the warehouse control database
Different actions are required to change each of these passwords.
Data Warehouse Center users use a password to log on to the Data Warehouse Center. A warehouse user with Administration privilege assigns passwords to users as part of the task of defining users to the Data Warehouse Center.
Current Data Warehouse Center users who do not have administration privilege can change their passwords by opening their own Warehouse User properties notebook and making the appropriate changes.
A current warehouse user who has administration privilege can change the password for another Data Warehouse Center user. To change the password for another Data Warehouse Center user, open the Users notebook and make the appropriate changes.
Users are not required to change their passwords after the passwords are assigned. However, a user might want to change the Data Warehouse Center password whenever the corresponding Windows NT password changes, to keep the two passwords coordinated.
A Data Warehouse Center agent site can run as a system process or a user process. An agent site that runs steps as user processes must be defined with a user ID and a password that is defined on the computer on which it is installed. To define a user ID and password for an agent site to the Data Warehouse Center, open the Define Agent Site notebook and enter the information into the fields in the Operating system logon area.
If you want to change a password for an existing agent site, ensure that you change the password on the computer on which the agent site resides first. Then right-click the agent site in the Data Warehouse Center contents pane and click Change User ID and Password (or open the properties notebook for the agent site).
On Windows NT workstations, the agent can be run as either a system or a user process, depending upon how the Data Warehouse Center agent daemon service is set up. The default agent runs as part of the Data Warehouse Center server, so by default the agent's characteristics depend upon how the Data Warehouse Center Server service is set up.
The step inherits this characteristic from the agent. If the step will need something that a user would have access to (such as a LAN drive), the step must run as a user process.
You can change how the Data Warehouse Center server runs by opening the Services Control Panel in Windows NT, selecting the Data Warehouse Center server, and clicking Startup. The Service Startup window opens. Under Log On As, select System Account to specify that the service will log on to the system account. To specify that the service will log on to a user account, select This Account and type the user account. When you enter a user ID and password in the properties notebook for the default agent, the agent site will run a user process.
For AIX and Sun, a user ID and password are always required for the agent, and the agent runs on behalf of (and inherits the characteristics of) that user.
One common problem is caused by running Data Warehouse Center agents as a system process rather than a user process. When the agent runs as a system process, it is not authorized to connect to network drives or products because the process does not have a user ID. Symptoms of this problem include the agent being unable to find the Data Warehouse Center program (Error RC2 = 128 or Error RC2 = 1 in the log viewer Details window) or being unable to initialize the program.
When you define a warehouse source or warehouse target that represents a relational database, you must specify a user ID and password for the database. For a warehouse target, the user ID must have database administrator authority. Access to warehouse sources is read-only, so the user ID and password that you use for a source do not need to have database administrator authority. You define user IDs and passwords for warehouse sources and targets on the Database page of the Warehouse Source and Warehouse Target notebooks.
To change a password for a warehouse source or target, you must belong to a group that has access to the source or target. Ensure that you change the password on the relational database first, then right-click the warehouse source or warehouse target in the Data Warehouse Center contents pane and click Change User ID and Password. If you want to change user IDs and passwords for several warehouse objects at once, you can select multiple objects in the contents pane and then click Change User ID and Password. Make the appropriate changes in the Change User ID and Password window. If you change the user ID of a database administrator, ensure that the new user ID has the same level of authority as the previous user ID.
Because database administrator user IDs and passwords can change often, you might need to update these more often than other Data Warehouse Center user IDs or passwords.
Before you define this password to a user ID in the Data Warehouse Center, you must define it to the same user ID in Windows NT. This user ID must have Windows NT administrator authority.
You define this password in the Data Warehouse Center when you install the warehouse server. The installation program saves the Data Warehouse Center control database administrator user ID and password in the Windows NT registry and to the warehouse control database as the default user ID and password.
Although the Data Warehouse Center saves the user ID and password to the Windows NT registry each time it is installed, it saves them as the default user ID and password only the first time it is installed.
When the user ID and password are saved, they are managed separately. You can change the default user ID and password at any time, as you would a user ID and password for any other Data Warehouse Center user.
If you want to change the user ID or password for the warehouse control database administrator, you must first change the corresponding user ID or password on Windows NT. Then, open the Warehouse Control Database Management window.
If the Data Warehouse Center administrative client is installed, click Start --> Programs --> DB2 for Windows NT --> Data Warehouse Center --> Control Database Management.
If the Data Warehouse Center administrative client is not installed, click Start --> Programs --> DB2 for Windows NT --> Warehouse Control Database Management.
The Data Warehouse Center does not require you to change this password after it is created. You might need to change it, for example, if your security processes require a change.
Use the Pivot Data transformer to group related data from selected columns in a source table into a single column, called a pivot column, in a target table. You can also carry over columns from your source table to your target table without pivoting them.
The pivot column spans the same number of rows as there are selected columns. You can create more than one pivot column.
After you select your pivot columns, you specify the number of pivot groups that you want to use. The number of groups that you type must be a divisor of the number of pivot columns that you selected. For example, if you selected 6 pivot columns, you can type the number 1, 2, 3, or 6. If you selected 5 pivot columns, you can type 1 or 5 in this field.
You must also create a data group for the data in each selected pivot column in the source table. The data groups are used in the output table to describe or label information that is collected into a single column so that the original meaning of the data as expressed by the column labels in the input table is preserved in the output table.
Suppose a table has columns A, B, C, D, E, F, G and H. You decide that you want to carry over columns A and B and pivot columns C, D, E, F, G, and H. You also decide that you want to create 2 pivot groups. The total number of pivot columns divided by the number of pivot groups determines how many pivot columns will be in each pivot group. In this case, there are six pivot columns total and two pivot groups. This means that each pivot group will have three pivot columns.
You use the Advanced window for the Pivot Data transformer to create data groups and define your pivot column groups. For example:
Data Group | Pivot Group 1 | Pivot Group 2
Orange | C | D
Purple | E | F
Red | G | H
The target table of a Pivot Data transformer is organized in cycles. A cycle contains the rows of pivot data that are transformed from a single row in your source. For example, suppose that your source table has a row 1. The intersection of this row with your source columns is represented as A1, B1, C1, D1, and so on. The first cycle in your target table will look like this:
A (carry-over column) | B (carry-over column) | Data Group | Pivot Group 1 | Pivot Group 2
A1 | B1 | Orange | C1 | D1
A1 | B1 | Purple | E1 | F1
A1 | B1 | Red | G1 | H1
The number of rows in a cycle is equal to the number of rows in the table in the Advanced window.
The number of cycles in a table is equal to the number of rows in your source table. For example, if your source table has two rows, your target table will look like this:
A | B | Data Group | Pivot Group 1 | Pivot Group 2
A1 | B1 | Orange | C1 | D1
A1 | B1 | Purple | E1 | F1
A1 | B1 | Red | G1 | H1
A2 | B2 | Orange | C2 | D2
A2 | B2 | Purple | E2 | F2
A2 | B2 | Red | G2 | H2
If you do not select any pivot columns when you define your Pivot Data transformer, the transformer decides which columns to pivot and which data groups to create.
Columnar data in each pivot group must have either the same data type or data types that are related to each other through automatic promotion.
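The following Python sketch mimics the pivot described above under the stated assumptions: source rows with columns A through H, carry-over columns A and B, two pivot groups of three columns each, and the data groups Orange, Purple, and Red. It illustrates the cycle structure of the target table; it is not the transformer's implementation.

def pivot(rows):
    # (data group label, source index for Pivot Group 1, source index for Pivot Group 2)
    data_groups = [("Orange", 2, 3), ("Purple", 4, 5), ("Red", 6, 7)]
    target = []
    for row in rows:                             # one cycle per source row
        for label, g1, g2 in data_groups:        # one target row per data group
            target.append((row[0], row[1], label, row[g1], row[g2]))
    return target

source = [("A1", "B1", "C1", "D1", "E1", "F1", "G1", "H1"),
          ("A2", "B2", "C2", "D2", "E2", "F2", "G2", "H2")]
for target_row in pivot(source):
    print(target_row)

The output reproduces the six-row target table shown above: one cycle of three rows (one row per data group) for each of the two source rows.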
When you import a tag language file, the metadata that it contains is stored in the warehouse control database. The Data Warehouse Center will accept a tag language file that is not generated by another Data Warehouse Center system. For example, you can import a tag language file that is generated by an ETI product. Any object with an unknown OBJTYPE in a non-Data Warehouse Center tag language file will be ignored, and processing will continue. If an object is of a known OBJTYPE, but does not contain all the properties required by the Data Warehouse Center, an error will occur.
After you complete the export process, and a tag language file is created, the file is ready to import to a target system.
Before you import the file, consider the following issues:
Naming considerations
If you are using the import utility to move a warehouse source from a test environment to a production environment, make sure that the production environment does not already have a warehouse source with the same name, unless you want to overwrite the definition of that warehouse source.
If you import a step into a system that contains a step with the same name, then you must either delete the step that you want to overwrite, or change the step to development mode. Otherwise, the step cannot be updated and an error will occur.
Ordering the import of objects
Import objects in a logical order. An object that is referred to but not defined in a tag language file must already be defined in the destination warehouse control database. To satisfy this requirement, you can first import the object definition from a separate tag language file, or you can define the object in the Data Warehouse Center tree view.
For example, you can import warehouse source definitions first. When the warehouse source definitions are ready for use, you can import subject areas, and warehouse target definitions, and any corresponding steps individually.
Requirement: Do not use the import function to migrate from a previous version of the Data Warehouse Center. You must always import and export tag language files using the same version of the Data Warehouse Center. For information about migrating from a previous version of the Data Warehouse Center, see DB2 UDB Quick Beginnings.
After a successful import, you receive a message with a return code of 0, and the Import Metadata window closes.
An imported subject area, warehouse source, or warehouse target will not be displayed in the tree view unless you belong to a warehouse group that has access to the object. Warehouse groups are exported only if they are related to a process that uses the exported warehouse sources or warehouse targets, so some warehouse groups might not be exported. If the warehouse groups that you want your warehouse sources and warehouse targets to be related to are not imported, you must add those warehouse groups to the Data Warehouse Center system. Metadata for individual users is not exported; you must add users to the Data Warehouse Center.
The Data Warehouse Center assigns development mode to all imported steps so that you can verify that the steps are complete and accurate before you promote them to test or production mode.
Before you use imported objects:
Refresh the tree view. Update database information for target databases (database name, user ID, and password), if necessary. If any new agent site definitions were imported, their passwords also need to be updated. If you imported any programs that have parameters whose values are used as passwords, you must reset those values.
Add your user ID to a security group that has access to the imported subject areas and warehouse targets.
Before you run imported steps:
Ensure that all the warehouse source and warehouse target databases that are referred to by the imported steps are defined and configured in your Windows system. Verify all the definitions that you imported, and change any parameters to match the new environment.
After you import a tag language file that includes steps containing joins, you might notice that the tables are not placed where you expected in the Process Modeler or Warehouse Schema Modeler. Some positioning information cannot be imported from one system to another. You will need to reposition your steps and star schemas by using the Process Modeler or the Warehouse Schema Modeler.
Promote steps as necessary.
Importing tag language files again
If you import an existing step again, it must be in development mode, or an error will occur. You must demote steps to development mode before importing them again.
Most tag language files can be imported more than once without errors. If a tag language file includes primary and foreign keys, the keys must be deleted before the file can be imported again.
If the names or order of columns in a table or file have changed (for example, a warehouse source or warehouse target table is redefined), it is a good idea to delete the warehouse source table, file, or step definition (including the target table in the warehouse target) from the target system before importing the same table or step again. At a minimum, you should verify that the column or field information is imported as you expected.
Replication is a process of maintaining a defined set of data in more than one location. It involves copying designated changes from one location (a source) to another (a target), and synchronizing the data in both locations. The source and target can be in logical servers (such as a DB2 database or a DB2 for OS/390 subsystem or data-sharing group) that are on the same machine or on different machines in a distributed network.
You can use the Data Warehouse Center to define a replication step, which will replicate changes between any DB2 relational databases. You can also use other IBM products (such as DB2 DataJoiner and DataPropagator NonRelational) or non-IBM products (such as Microsoft SQL Server and Sybase SQL Server) to replicate data between many database products--both relational and nonrelational.
The replication environment that you need depends on when you want data updated and how you want transactions handled. You might use a log-based or a trigger-based change-capture mechanism, or both. You also have the flexibility to choose the locations of the replication components to maximize the efficiency of your replication environment.
This step is intended for Data Warehouse Center users who previously used Visual Warehouse Version 5.2.
Use this step to run the Visual Warehouse 5.2 DB2 UDB REORG (VWPREORG) program in the Data Warehouse Center. This step runs the DB2 UDB REORG utility on a target table.
You schedule this step to run on the target table of a process, after the process completes.
The Visual Warehouse 5.2 DB2 UDB REORG program extracts the following step and warehouse source parameter values from the Process Modeler and your step definition:
The warehouse target database name
The user ID and password for the warehouse target
The target table that is defined for the step
These parameters are predefined. You do not specify values for these parameters.
The location of the log files for this program is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the REORG program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | REORG error.
20 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
Before you can use the Clean Data transformer, you must create a table that contains rules for your clean type. A rules table designates the values that the Clean Data transformer will use during the find and replace process.
The rules table must be in the same database as the source and target tables.
A rules table must contain at least two columns. One column contains find values, and the other column contains replace values. The rows in each column correspond to each other.
For example, Column 1 and Column 2 in a rules table have the values shown here:
Column 1 | Column 2
Desk | Chair
Table | Lamp
Suppose that Column 1 contains the find values, and Column 2 contains the replace values. When you run the step, the Clean Data transformer searches your source column for the value Desk. Wherever it finds the value Desk, it writes the value Chair in the corresponding field of the target column.
The Clean Data transformer copies values that are not listed in the find column directly to the target table. In the example, the value Stool is not listed in the column that contains the find values. If the selected source column contains the value Stool, the Clean Data transformer will write Stool to the corresponding field in the target column.
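A minimal Python sketch of this find-and-replace behavior, assuming the rules table has already been read into find-and-replace pairs; values with no matching find value are copied through unchanged:

rules = {"Desk": "Chair", "Table": "Lamp"}       # find value -> replace value

source_column = ["Desk", "Stool", "Table"]
target_column = [rules.get(value, value) for value in source_column]
print(target_column)                             # ['Chair', 'Stool', 'Lamp']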
For more information on rules tables, see the Data Warehouse Center Administration Guide.
The following table describes the columns that must be included in the rules table for each clean type:
Clean type | Minimum number of columns | Purpose of columns
Find and replace | 2 - 3 | The first column contains find values.
Discretize | 3 - 4 | The first column contains the low end of the find value range.
Clip | 3 - 5 | The first column contains the low end of the find value range.
Carry over | None allowed | Rule table not used.
You can reorder the output columns using the Step notebook. You can change column names on the Column Mapping page of the Step notebook.
Use the Run FTP command file (VWPFTP) program to run an FTP command file.
The Run FTP command file program can detect only major errors of the FTP command. Confirm the results of the transfer before you use the transferred data.
Supported operating systems: Windows NT, AIX, Sun Solaris, AS/400, and OS/2.
Log information for program processing is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by this program:
Value | Description
4 | Warning.
8 | Parameter error.
12 | FTP error.
16 | Internal error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
This step is intended for Data Warehouse Center users who previously used Visual Warehouse Version 5.2.
Use this step to run the Visual Warehouse 5.2 DB2 UDB RUNSTATS (VWPSTATS) program in the Data Warehouse Center. This step runs the DB2 UDB RUNSTATS utility on a target table.
You schedule this step to run on the target table of a process, after the process completes.
The Visual Warehouse 5.2 DB2 UDB RUNSTATS program extracts the following step and warehouse source parameter values from the Process Modeler and your step definition:
The warehouse target database name
The user ID and password for the warehouse target
The target table that is defined for the step
These parameters are predefined. You do not specify values for these parameters.
The location of the log files for this program is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the RUNSTATS program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Runstats error.
20 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
Use the Show Related notebook to show the relationships between steps, sources, targets, and groups in a warehouse.
Only first level dependencies are shown in the table. For example, if you select a step as the target object and the step has a dependency on a source that has a dependency on another source, only the dependency on the step is shown. To see the dependency on the source, click mouse button 2 on that source object and select Show Related from the pop-up menu.
If you want to compare the relationships between several target objects and their related objects, you can open several Show Related notebooks from the Data Warehouse Center object tree.
There are several reasons to show related objects:
You want to know the structure of a process.
You want to drop or recreate an object and need to know all the dependency relationships for the object that is to be dropped or recreated.
Dropping an object that has a dependency might present problems. For example, if you drop a source table with dependent steps, those steps will be marked inoperative.
You use a subject area to logically group the processes (and the steps, warehouse sources, and warehouse targets within the processes) that are related to a particular topic or function.
For example, if you have a series of processes that move and transform sales data, you create a Sales subject area, and create the processes within the subject area. Similarly, you group Marketing processes under a Marketing subject area. You use the subject areas to select the processes that are of interest to you.
Use the Submit JCL Jobstream on MVS (VWPMVS) program to submit a JCL jobstream that resides on MVS to an MVS system for execution. The name of the MVS job must be the name of the MVS user ID followed by one alphanumeric character. This is a restriction of FTP/MVS.
Before you use the Submit JCL Jobstream on MVS program, test your JCL file by running it from TSO under the same user ID that you plan to use with the program.
This program runs successfully if the MVS host name, user ID, and password are correct. If you want to test the validity of the results generated by the JCL, you must write your own testing logic. If the FTP session times out, this program returns an FTP error, even if the JCL ultimately runs successfully on the MVS system.
The Submit JCL Jobstream on MVS program also receives the JES log file on the agent site. It erases the copy of the JES log from any previous jobs on the agent site before submitting a new job for processing. It also verifies that the JES log file is downloaded to the agent site after the job completes.
The Submit JCL Jobstream on MVS program requires TCP/IP 3.2 or later installed on MVS. Verify that the FTP service is enabled before using the program.
Supported operating systems: Windows NT, AIX, Sun Solaris, and AS/400.
Log information for program processing is contained in the trcppp.log file.
Log information for the FTP command result is contained in the ftpppp.log file.
The location of the log files is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the Submit JCL Jobstream on MVS program.
Value | Description
8 | Parameter error. Too many or too few parameters were supplied to the program.
12 | FTP error. The Data Warehouse Center program detected an FTP error, such as the inability to create the JES log file.
16 | Internal error. The program detected an internal error, such as the inability to erase a previously existing JES log file.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
128 | Missing DLL. A Windows NT error that can occur when the program was compiled on another operating system.
When you import data from a Sybase Version 10/11 data source into a new warehouse source, the Data Warehouse Center converts the Sybase data types to ODBC data types, as described in the following table:
Sybase 10/11 data type | ODBC data type
CHAR | CHAR
INT | INTEGER
TINYINT | TINYINT
SMALLINT | SMALLINT
FLOAT | FLOAT
DOUBLE | DOUBLE
DECIMAL | DECIMAL
DATETIME | TIMESTAMP
SMALLDATETIME | TIMESTAMP
MONEY | DECIMAL
SMALLMONEY | DECIMAL
VARCHAR | VARCHAR
SYSNAME | VARCHAR
NUMERIC | NUMERIC
TEXT | LONG VARCHAR
BINARY | CHAR (with the Is Text field not checked)
BIT | BIT
IMAGE | LONG VARCHAR
REAL | REAL
TIMESTAMP | VARCHAR
VARBINARY | VARCHAR (with the Is Text field not checked)
After you have selected a source for your step and opened the Column Mapping page of the step notebook for the first time, you will see the ODBC data types used in the Column Grid. See the DB2 data types table to see characteristics of the ODBC data types that will be used for your business view table.
T-value is the observed value of the T-statistic used to test the hypothesis that two attributes are correlated. The T-value can range between -infinity and +infinity. A T-value near 0 is evidence for the null hypothesis that there is no correlation between the attributes. A T-value far from 0 (either positive or negative) is evidence for the alternative hypothesis that there is correlation between the attributes.
The T-statistic is defined as:
T = r * SQRT((n-2) / (1 - r*r))
where r is the correlation coefficient, n is the number of input value pairs, and SQRT is the square root function.
If the correlation coefficient, r, is either -1 or +1, the T-value will be represented by NULL. If the T-value is too small or too large to be represented by a number, the value will be represented by NULL.
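A short Python sketch of these rules, with NULL modeled as Python's None; it illustrates the definition above and is not the transformer's code:

import math

def t_statistic(r, n):
    # T-value is NULL when the correlation coefficient is exactly -1 or +1
    if r in (-1.0, 1.0):
        return None
    try:
        return r * math.sqrt((n - 2) / (1 - r * r))
    except (OverflowError, ZeroDivisionError):
        return None                              # too small or too large to represent

print(t_statistic(0.5, 27))                      # about 2.89: evidence of correlation
print(t_statistic(1.0, 27))                      # None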
A format for defining object types and objects, and actions to be taken on those object types and objects, in the Data Warehouse Center or the information catalog.
A file that contains tag language that describes objects and object types to be added, updated, or deleted in the Data Warehouse Center or the information catalog when the file is imported. A tag language file is produced by exporting objects from the Data Warehouse Center or from the Information Catalog Manager.
In the Information Catalog Manager, you can also produce a tag language file by:
Transferring a delete history log.
Extracting descriptive data from another database system using an extract program.
A three-way ANOVA uses three grouping columns. The data assumptions for the three-way ANOVA are as follows:
There is one data column and three grouping columns, Column A, Column B, and Column C.
The data column is composed of data values (x1, x2, ..., xn).
The data column must contain equal sample sizes in each of the subgroups formed by the unique value triplets in the three grouping columns.
This sample size must be greater than 1. Unequal sample sizes are not allowed for the three-way ANOVA.
The following table is a sample of a three-way ANOVA summary table:
SOURCE | DF | SS | MS | FVALUE | PVALUE
Column A | DFA | SSA | MSA | FValueA | PValueA
Column B | DFB | SSB | MSB | FValueB | PValueB
Column C | DFC | SSC | MSC | FValueC | PValueC
(Column A)x(Column B) | DFAB | SSAB | MSAB | FValueAB | PValueAB
(Column A)x(Column C) | DFAC | SSAC | MSAC | FValueAC | PValueAC
(Column B)x(Column C) | DFBC | SSBC | MSBC | FValueBC | PValueBC
(Column A)x(Column B)x(Column C) | DFABC | SSABC | MSABC | FValueABC | PValueABC
Within Groups | DFE | SSE | MSE | Null | Null
Total | DFT | SST | Null | Null | Null
Assume the values for the following variables:
n = The total number of data values in the data column.
a = The number of unique groups in Column A.
b = The number of unique groups in Column B.
c = The number of unique groups in Column C.
k = The number of unique subgroups (cells) formed by the unique value triplets in the three grouping columns. Note: k = (a*b*c)
m = The number of data values in each subgroup (cell) for all three grouping columns. m must be greater than 1.
x(i) = The ith data item in the data column.
avgGrpA(i) = The average of all the data values in the ith group of Column A.
avgGrpB(i) = The average of all the data values in the ith group of Column B.
avgGrpC(i) = The average of all the data values in the ith group of Column C.
avgGrpAB(ij) = The average of all the data values in the ijth subgroup for the unique value pairs from Column A and Column B.
avgGrpAC(ij) = The average of all the data values in the ijth subgroup for the unique value pairs from Column A and Column C.
avgGrpBC(ij) = The average of all the data values in the ijth subgroup for the unique value pairs from Column B and Column C.
avgGrpABC(ijk) = The average of all the data values in the ijkth subgroup for the unique value triplets from Column A, Column B and Column C.
avgTotal = The average of all the data values in the data column.
The equations for the calculated values are:
DFA = a-1
DFB = b-1
DFC = c-1
DFAB = (a-1)*(b-1)
DFAC = (a-1)*(c-1)
DFBC = (b-1)*(c-1)
DFABC = (a-1)*(b-1)*(c-1)
DFE = n - k
DFT = n - 1
Note: DFT = DFA + DFB + DFC + DFAB + DFAC + DFBC + DFABC + DFE
SSA = (m*b*c) * SUM(i=1 to a) [ (avgGrpA(i) - avgTotal)**2 ]
SSB = (m*a*c) * SUM(i=1 to b) [ (avgGrpB(i) - avgTotal)**2 ]
SSC = (m*a*b) * SUM(i=1 to c) [ (avgGrpC(i) - avgTotal)**2 ]
SSAB = (m*c) * SUM(i=1 to a) SUM(j=1 to b) [ (avgGrpAB(ij) - avgGrpA(i) - avgGrpB(j) + avgTotal)**2 ]
SSAC = (m*b) * SUM(i=1 to a) SUM(j=1 to c) [ (avgGrpAC(ij) - avgGrpA(i) - avgGrpC(j) + avgTotal)**2 ]
SSBC = (m*a) * SUM(i=1 to b) SUM(j=1 to c) [ (avgGrpBC(ij) - avgGrpB(i) - avgGrpC(j) + avgTotal)**2 ]
SSABC = m * SUM(i=1 to a) SUM(j=1 to b) SUM(k=1 to c) [ (avgGrpABC(ijk) - avgGrpAB(ij) - avgGrpAC(ik) - avgGrpBC(jk) + avgGrpA(i) + avgGrpB(j) + avgGrpC(k) - avgTotal)**2 ]
SSE = SST - (SSA + SSB + SSC + SSAB + SSAC + SSBC + SSABC)
SST = SUM(i=1 to n) [ (x(i) - avgTotal)**2 ]
MSA = SSA / DFA
MSB = SSB / DFB
MSC = SSC / DFC
MSAB = SSAB / DFAB
MSAC = SSAC / DFAC
MSBC = SSBC / DFBC
MSABC = SSABC / DFABC
MSE = SSE / DFE
FValueA = MSA / MSE
FValueB = MSB / MSE
FValueC = MSC / MSE
FValueAB = MSAB / MSE
FValueAC = MSAC / MSE
FValueBC = MSBC / MSE
FValueABC = MSABC / MSE
PValueA = PValueFunction( DFA, DFE, FValueA )
PValueB = PValueFunction( DFB, DFE, FValueB )
PValueC = PValueFunction( DFC, DFE, FValueC )
PValueAB = PValueFunction( DFAB, DFE, FValueAB )
PValueAC = PValueFunction( DFAC, DFE, FValueAC )
PValueBC = PValueFunction( DFBC, DFE, FValueBC )
PValueABC = PValueFunction( DFABC, DFE, FValueABC )
Transient steps are temporary steps that process intermediate data within the warehouse. They use data that is output from earlier steps and produce data that is used by later steps. When all dependent steps have run successfully, the data for transient steps is automatically purged.
Transient steps appear in the Work in Progress window, but not in the Run New Step window. You use task flows to connect a transient step between two other steps.
A two-way ANOVA uses two grouping columns. The data assumptions for the two-way ANOVA are:
There is one data column and two grouping columns, Column A and Column B.
The data column is composed of data values (x1, x2, ..., xn).
The data column must contain equal sample sizes in each of the subgroups that are formed by the unique value pairs in the two grouping columns.
The sample size must be greater than 1. Unequal sample sizes are not allowed for the two-way ANOVA.
The following table is a sample of a two-way ANOVA summary table:
SOURCE | DF | SS | MS | FVALUE | PVALUE
Column A | DFA | SSA | MSA | FValueA | PValueA
Column B | DFB | SSB | MSB | FValueB | PValueB
(Column A)x(Column B) | DFAB | SSAB | MSAB | FValueAB | PValueAB
Within Groups | DFE | SSE | MSE | Null | Null
Total | DFT | SST | Null | Null | Null
Assume the values for the following variables:
n = The total number of data values in the data column.
a = The number of unique groups in Column A.
b = The number of unique groups in Column B.
k = The number of unique subgroups (cells) formed by the unique value pairs in the two grouping columns. Note: k = (a*b)
m = The number of data values in each subgroup (cell). m must be greater than 1.
x(i) = The ith data value in the data column.
avgGrpA(i) = The average of all the values in the ith group of Column A.
avgGrpB(i) = The average of all the values in the ith group of Column B.
avgGrpAB(ij) = The average of all the values in the ij-th subgroup (cell) for the unique value pairs from Column A and Column B.
avgTotal = The average of all the data values in the data column.
The equations for the values calculated are:
DFA = a-1
DFB = b-1
DFAB = (a-1)*(b-1)
DFE = n-k
DFT = n-1 (Also note: DFT = DFA + DFB + DFAB + DFE)
SSA = (m*b) * SUM(i=1 to a) [ (avgGrpA(i) - avgTotal)**2 ]
SSB = (m*a) * SUM(i=1 to b) [ (avgGrpB(i) - avgTotal)**2 ]
SSAB = (m) * SUM(i=1 to a)SUM(j= 1 to b) [ (avgGrpAB(ij) - avgGrpA(i) - avgGrpB(j) + avgTotal)**2 ]
SSE = SST - (SSA + SSB + SSAB)
SST = SUM(i=1 to n) [ (x(i) - avgTotal)**2 ]
MSA = SSA / DFA
MSB = SSB / DFB
MSAB = SSAB / DFAB
MSE = SSE / DFE
FValueA = MSA / MSE
FValueB = MSB / MSE
FValueAB = MSAB / MSE
PValueA = PValueFunction( DFA, DFE, FValueA)
PValueB = PValueFunction( DFB, DFE, FValueB)
PValueAB = PValueFunction( DFAB, DFE, FValueAB)
Warehouse primary keys and warehouse foreign keys are definitions that you create to describe the underlying database table keys.
A key for a table is one or more columns that uniquely identify the table. The warehouse primary key for a table is one of the possible keys for the table that you define as the key to use. A warehouse foreign key is a column in a table whose allowable values must exist in the warehouse primary key in another table. The table that contains the warehouse foreign key is called a dependent table; the table that contains the warehouse primary key is called a parent table.
For example, you have a table that defines the departments in your organization and another table that defines the employees in your organization. Each department is identified by a unique department number, and each employee is a member of a department.
You define the Department Number column as the warehouse primary key of the Department table. You define the Department Number column of the Employee table as a warehouse foreign key, which references the Department Number column (warehouse primary key) of the Department table. This structure ensures that any department numbers that are entered in the employee table are defined in the Department table.
For more information about this example, see IBM Universal Database SQL Getting Started.
You can use warehouse primary keys and warehouse foreign keys to support business rules that are helpful in the creation of a warehouse. For example, you can use warehouse primary keys and warehouse foreign keys to predefine a join between two tables. Using the example of the Department table and the Employee table, you could join the Department and Employee tables, using the Department Number column, to obtain a table that contains the description of the department for each employee as well as the employee number.
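The following Python sketch illustrates the Department and Employee join described above; SQLite stands in for the warehouse database, and the table and column names are illustrative only:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, DeptName TEXT)")
con.execute("CREATE TABLE Employee (EmpNo INTEGER PRIMARY KEY, Name TEXT, "
            "DeptNo INTEGER REFERENCES Department(DeptNo))")  # foreign key to the parent table
con.execute("INSERT INTO Department VALUES (10, 'Sales')")
con.execute("INSERT INTO Employee VALUES (1, 'Ann', 10)")

# The predefined join pairs the foreign key with the primary key that it references.
for row in con.execute("SELECT e.EmpNo, e.Name, d.DeptName FROM Employee e "
                       "JOIN Department d ON e.DeptNo = d.DeptNo"):
    print(row)                                   # (1, 'Ann', 'Sales')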
In the Data Warehouse Center, you use warehouse primary keys and warehouse foreign keys to predefine a star join between source tables in a step.
You can import warehouse primary keys and warehouse foreign keys or manually define new warehouse primary keys and warehouse foreign keys to use in the Data Warehouse Center only. The Data Warehouse Center does not alter the underlying database table if you define new keys. The Data Warehouse Center verifies that any keys that you define are consistent within the Data Warehouse Center, but it does not verify consistency with the underlying database tables.
You can also define warehouse primary keys and warehouse foreign keys in a target table, either for the benefit of users or to join the target table with source tables when you use it as a source. When you define warehouse primary keys and warehouse foreign keys for a target table, the Data Warehouse Center alters the target table to include the warehouse primary keys, but not the warehouse foreign keys.
A warehouse schema is a specialized design that consists of multiple dimension tables, which describe aspects of a business, and one fact table, which contains the facts about the business. For example, if you have a mail-order business selling books, some dimension tables are customers, books, catalogs, and fiscal years. The fact table contains information about the books that are ordered from each catalog by each customer during the fiscal year.
A join relationship supports the business rules that you use in creating your warehouse. In the Data Warehouse Center, a join relationship is a query on a warehouse schema, which includes the fact table and some dimension tables. The join relationship is based on the primary and foreign keys that are defined for the source tables, as well as on non-keyed columns. The join relationship joins the columns of the dimension tables with the columns of the fact table.
The primary and foreign keys that can be used in join relationships do not require actual database keys. You can use primary and foreign keys that you imported from DB2 Universal Database or you can define new primary and foreign keys in the Data Warehouse Center. These keys are called warehouse primary and warehouse foreign keys because they are defined in the warehouse metadata, not the physical database. You can use these keys to pre-define joins in the Data Warehouse Center. In the previous example, the customers table has a primary key of customer number, and each book has a primary key of its book number (ISBN). Each order in the fact table includes foreign keys of customer number and book number. The join relationship combines information about the customers and books with the orders.
To define a join relationship in the Data Warehouse Center, first add the tables to the warehouse schema by using the table icon on the palette. You can add tables from more than one database unless the Warehouse Schema Properties notebook indicates that the tables must come from a single database. The autojoin function joins the tables based on their primary and foreign key relationships. The join icon on the palette allows you to define other column joins between the tables.
Warehouse sources are sources that provide data to Data Warehouse Center steps. The sources can be nearly any relational or nonrelational source (table, view, or file) that has connectivity to your warehouse.
To define relational or nonrelational data to your warehouse, you must define a warehouse source using the Warehouse Source notebook. To define a warehouse source, you must:
Acquire the proper password and authority for the source.
Establish connectivity to the agent site with ODBC drivers (unless your source database resides on an AS/400). See the Data Warehouse Center Administration Guide.
The Data Warehouse Center supports warehouse source tables that use ordinary SQL identifiers. An ordinary identifier:
Must start with a letter.
Can include uppercase letters, numbers, and underscore characters.
Cannot be a reserved word.
If a table has a lowercase letter as part of its ordinary identifier, the Data Warehouse Center stores the lowercase letter as an uppercase letter.
The Data Warehouse Center does not support warehouse source tables that use delimited identifiers. A delimited identifier:
Is enclosed within double quotation marks.
Can include uppercase and lowercase letters, numbers, underscore characters, and spaces.
Can contain a double quotation mark, which is represented by two consecutive quotation marks.
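A small Python sketch of the ordinary-identifier rules above; the reserved-word list here is an illustrative subset, not the actual DB2 list, and a delimited identifier (enclosed in double quotation marks) fails the check because it is not supported:

import re

RESERVED = {"SELECT", "TABLE", "ORDER"}          # illustrative subset only

def is_ordinary_identifier(name):
    # starts with a letter; letters, numbers, and underscores only; not reserved
    return (re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name) is not None
            and name.upper() not in RESERVED)

for name in ("SALES_1999", "salesData", "1999_SALES", '"My Table"'):
    # lowercase letters in an ordinary identifier are stored as uppercase letters
    print(name, is_ordinary_identifier(name), name.upper())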
The Data Warehouse Center can use the following kinds of sources as warehouse sources (see the appropriate installation book for the versions of non-IBM databases that are supported):
Relational data
DB2 UDB for AS/400 (CISC) - IBM relational database for CISC-based AS/400 systems
DB2 UDB for AS/400 (RISC) - IBM relational database for RISC-based AS/400 systems
DB2 UDB for OS/390 - IBM relational database for the OS/390 platform
DB2 UDB for AIX - IBM relational database for AIX systems
DB2 UDB for HP-UX - IBM relational database for HP-UX systems
DB2 UDB for Linux - IBM relational database for the Linux platform
DB2 UDB for OS/2 - IBM relational database for the OS/2 platform
DB2 UDB for SCO UnixWare - IBM relational database for SCO UnixWare systems
DB2 UDB for PTX - IBM relational database for PTX systems
DB2 UDB for Sun Solaris - IBM relational database for the Sun Solaris platform
DB2 UDB for Windows NT- IBM relational database for Windows NT
DB2 UDB for VM - IBM relational database for VM systems
DB2 UDB for VSE - IBM relational database for VSE systems
DB2 UDB EEE - the Enterprise-Extended Edition of DB2 UDB
DataJoiner - IBM database middleware that accesses, joins, and updates data from different sources
DB2 Family - includes all non-UDB versions of DB2
Informix - Informix relational database systems
Generic ODBC - applies to ODBC resources not included with the Data Warehouse Center, such as Microsoft Excel and other Microsoft-compliant ODBC drivers.
Oracle - Oracle relational database systems
Sybase - Sybase relational database systems
Microsoft SQL Server - Microsoft SQL Server relational database systems
Nonrelational data
IBM IMS and VSAM databases - IBM IMS and VSAM database systems
Local files - plain text files that contain character-delimited, variable-length data on a local workstation
Remote files - plain text files that contain character-delimited, variable-length data on a remote workstation
Warehouse targets are tables or files that contain the data that results from Data Warehouse Center steps. Like warehouse sources, warehouse targets can also provide data to Data Warehouse Center steps.
A warehouse target can be defined on any of the following databases or files:
DB2 UDB for AS/400 (CISC) - IBM relational database for CISC-based AS/400 systems
DB2 UDB for AS/400 (RISC) - IBM relational database for RISC-based AS/400 systems
DB2 UDB for OS/390 - IBM relational database for the OS/390 platform
DB2 UDB for AIX - IBM relational database for AIX systems
DB2 UDB for HP-UX - IBM relational database for HP-UX systems
DB2 UDB for Linux - IBM relational database for the Linux platform
DB2 UDB for OS/2 - IBM relational database for the OS/2 platform
DB2 UDB for SCO UnixWare - IBM relational database for SCO UnixWare systems
DB2 UDB for PTX - IBM relational database for PTX systems
DB2 UDB for Sun Solaris - IBM relational database for the Sun Solaris platform
DB2 UDB for Windows NT - IBM relational database for Windows NT
DB2 UDB EEE - the Enterprise-Extended Edition of DB2 UDB
DataJoiner - IBM database middleware that accesses, joins, and updates data from different sources
DB2 Family - includes all non-UDB versions of DB2
Local files - plain text files that contain character-delimited, variable-length data residing on a local workstation
Warehouse target databases and files are the portion of your enterprise data warehouse that contains cleansed, transformed data that is ready for business analysis. Data warehouse users must have access to the warehouse target database.
In the Data Warehouse Center, warehouse targets are typically organized by subject area. For example, if your subject area deals with your company's sales results, the warehouse targets within it could contain sales data broken down by region.
A named data object that is associated with a warehouse target. A warehouse target table contains the results of steps after they run. You can create warehouse target tables with either the Control Center or the Data Warehouse Center. You use the Define Warehouse Target Table notebook to define the table to the Data Warehouse Center and to associate it with a particular warehouse target.