The Analysis of Variance (ANOVA) transformer produces two tables of statistical calculations from a small number of input parameters. There are three types of ANOVA transformers: one-way ANOVA, two-way ANOVA, and three-way ANOVA.
The ANOVA transformer obtains two independent estimates of variance. The first estimate is based on variability between groups. The second estimate is based on variability within groups. After the ANOVA transformer computes these estimates, it calculates their ratio. The significance of this ratio is described by a family of distributions, the Fisher-F distributions.
The ANOVA transformer also calculates the p-value. The p-value is the probability, assuming the group means are actually equal, of obtaining an F-ratio at least as large as the one observed. A small p-value leads to the conclusion that the means are different. For example, a p-value of 0.02 means that, if the group means were equal, there would be only a 2% chance of observing an F-ratio this large. Likewise, a large p-value does not lead to the conclusion that the means of the groups are different.
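For a concrete illustration outside the transformer, a one-way ANOVA F-ratio and p-value can be computed with SciPy. This is a minimal sketch with made-up sample groups, not the transformer itself:

# Minimal one-way ANOVA sketch (hypothetical sample data).
from scipy import stats

group_a = [23.1, 24.7, 22.8, 25.0]
group_b = [26.4, 27.1, 25.9, 26.8]
group_c = [22.0, 23.3, 21.8, 22.5]

# f_oneway returns the F-ratio (between-group variance over
# within-group variance) and its p-value from the Fisher F distribution.
f_value, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_value:.3f}, p = {p_value:.4f}")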
The ANOVA transformer writes to two target tables, which you name. In the following example, table 1 is called the ANOVA Summary table and table 2 is called the ANOVA Statistics table.
The ANOVA Summary table contains the following columns:
Column name | Data type | Description
SOURCE | VARCHAR | Source of the ANOVA statistic
DF | Integer | Degrees of freedom
SS | Double | Sum of squares
MS | Double | Mean sum of squares
FVALUE | Double | F-value or F-ratio
PVALUE | Double | Calculated P-value for the F-value and the degrees of freedom
The ANOVA Statistics table contains the following columns:
Column name | Data type | Description
VARIABLE | VARCHAR | Name of the variable in the grouping column that produces the statistic
COUNT | Integer | Count of the data values
SUM | Integer | Sum of the data values
AVERAGE | Double | Average of the data values
VARIANCE | Double | Variance of the data values
STDDEV | Double | Standard deviation of the data values
The Calculate Statistics transformer calculates the following descriptive statistics on any number of data columns from a single table:
Statistic | Input data type | Output data type
Count (1) | Any data type | Large integer (COUNT)
Sum | A number (2) | Same as the data type of the input, with some exceptions
Average | A number (2) | Same as SUM, with some exceptions
Variance | A number (2) | Double-precision floating point
Standard deviation | A number (2) | Double-precision floating point
Standard error | A number (2) | Double-precision floating point
Minimum | A number (3) | Same as the data type of the input
Maximum | A number (3) | Same as the data type of the input
Range | A number (2) | Same as the data type of the input
Coefficient of variation | A number (2) | Double-precision floating point
The input and output data types are based on the input specification for DB2 UDB SQL functions.
Notes:
(1) DB2 UDB provides a function called COUNT_BIG, which returns a DECIMAL with precision 31 and scale 0. COUNT_BIG is not supported by the Calculate Statistics transformer.
(2) A number can be any of these types: SMALLINT, INTEGER, REAL, DOUBLE, FLOAT, DECIMAL, or NUMERIC.
(3) DB2 UDB provides MIN and MAX functions that can be used with any built-in type other than a long string. The Calculate Statistics transformer MIN and MAX functions operate only on numbers.
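For reference, the same descriptive statistics can be computed outside the transformer. A minimal Python sketch over one numeric column (the values are made up):

import statistics

values = [12.0, 15.5, 9.8, 14.2, 11.1]  # hypothetical column values

count = len(values)
total = sum(values)
average = total / count
variance = statistics.variance(values)   # sample variance
stddev = statistics.stdev(values)        # sample standard deviation
stderr = stddev / count ** 0.5           # standard error of the mean
minimum, maximum = min(values), max(values)
value_range = maximum - minimum
coefficient_of_variation = stddev / average

print(count, total, average, variance, stddev, stderr,
      minimum, maximum, value_range, coefficient_of_variation)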
The work area at the right side of the Process Model window. Add sources, targets, and steps to this area to define how data in the open process is to be moved and transformed.
You can use the Clean Data transformer to perform any of the following actions:
Replace missing, invalid, or inconsistent values in selected data columns with appropriate substitute values.
Remove unsuitable data rows.
Clip numeric values.
Perform numeric discretization.
You can choose to ignore case and white space during find comparisons, or to allow a tolerance value for numeric find comparisons.
Each clean transformation uses one of four clean types; a brief sketch of two of them follows the list:
Find and replace - Performs basic find and replace functions.
Discretize - Performs find and replace functions within a range of values.
Clip - Performs find and replace functions within a range of values or outside of a range of values.
Carry over - Specifies columns in the input table to copy to the output table.
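A minimal Python sketch of the Clip and Discretize clean types on one numeric column; the values, range bounds, and substitute value are assumptions for illustration:

values = [3, 47, 15, 102, -8, 60]  # hypothetical column values

# Clip: replace values that fall outside a range with the range bounds.
low, high = 0, 100
clipped = [min(max(v, low), high) for v in values]

# Discretize: replace values that fall within a range with one substitute value.
# Here every value in the range [40, 70) is mapped to the code 50.
discretized = [50 if 40 <= v < 70 else v for v in values]

print(clipped)      # [3, 47, 15, 100, 0, 60]
print(discretized)  # [3, 50, 15, 102, -8, 50]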
Metadata that describes changes to objects in the warehouse. An example of control metadata is the date and time that a table is updated by the processing of a step.
Use the Copy file using FTP (VWPRCPY) Data Warehouse Center program to copy files on the agent site to and from a remote host.
Before you copy files to MVS, you must allocate their data sets.
You cannot transfer VSAM data sets.
This program uses the following step and source parameters, which are mirrored in the sketch after the list:
Other site
GET/PUT
Remote user ID
Remote password
Remote file
Local file
Binary or character transfer (0=Binary, 1=Character)
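The same kind of transfer can be expressed with Python's ftplib. This sketch assumes hypothetical host, credentials, and file names that mirror the parameters above; it is not the VWPRCPY program itself:

from ftplib import FTP

remote_host = "mvs.example.com"          # hypothetical remote host
remote_user, remote_password = "user1", "secret"
remote_file, local_file = "WAREHOUSE.DATA", "extract.dat"
binary_transfer = True                   # 0=Binary, 1=Character in VWPRCPY terms

ftp = FTP(remote_host)
ftp.login(remote_user, remote_password)
if binary_transfer:
    # GET in binary mode.
    with open(local_file, "wb") as f:
        ftp.retrbinary(f"RETR {remote_file}", f.write)
else:
    # GET in character (ASCII) mode, one line at a time.
    with open(local_file, "w") as f:
        ftp.retrlines(f"RETR {remote_file}", lambda line: f.write(line + "\n"))
ftp.quit()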
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | X | X
Log information for program processing is contained in the trcppp.log file.
Log information for the FTP command result is contained in the ftpppp.log file.
The location of the log files is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by this program. The program can detect only major errors of the FTP command. Confirm the results of the transfer before you use the transferred data.
Value | Description
8 | Parameter error.
12 | FTP error.
16 | Internal error.
32 | Open Trace error.
128 | Missing DLL.
Use the Correlation transformer to determine the extent to which changes in the value of an attribute (such as length of employment) are associated with changes in another attribute (such as salary). The data for a correlation analysis consist of two input columns; each column contains values for one of the attributes of interest. The Correlation transformer can calculate various measures of association between the two input columns. You can select more than one statistic to calculate for a given pair of input columns.
The data in the input columns also can be treated as a sample obtained from a larger population and the Correlation transformer can be used to test whether the attributes are correlated in the population. In this context, the null hypothesis asserts that the two attributes are not correlated, and the alternative hypothesis asserts that the attributes are correlated.
The Correlation transformer calculates any of the following correlation-related statistics on any number of input column pairs:
All Correlation-related statistics require numeric input data types and produce double precision floating points as the output data type. A numeric data type is any one of the following data types: SMALLINT, INTEGER, REAL, DOUBLE, FLOAT, DECIMAL, or NUMERIC.
The correlation coefficient r (also known as the Pearson product-moment correlation coefficient) is a measure of the linear relationship between two attributes or columns of data. The value of r can range from -1 to +1 and is independent of the units of measurement. A value of r near 0 indicates little correlation between attributes; a value near +1 or -1 indicates a high level of correlation.
When two attributes have a positive correlation coefficient, an increase in the value of one attribute indicates a likely increase in the value of the second attribute. A correlation coefficient of less than 0 indicates a negative correlation; that is, when one attribute shows an increase in value, the other attribute tends to show a decrease.
Consider two variables x and y:
If r = 1, then x and y are perfectly positively correlated. The possible values of x and y all lie on a straight line with a positive slope in the (x, y) plane.
If r = 0, then x and y are not correlated. They are not linearly associated with each other. This does not mean, however, that x and y are statistically independent.
If r = -1, then x and y are perfectly negatively correlated. The possible values of x and y all lie on a straight line with a negative slope in the (x, y) plane.
The correlation coefficient is independent of the units of measurement.
Covariance is a measure of the linear relationship between two attributes or columns of data. The value of the covariance can range from -infinity to +infinity. However, if the value of the covariance is too small or too large to be represented by a number, the value will be represented by NULL.
Unlike the correlation coefficient, the covariance depends on the units of measurement. For example, measuring the values of two attributes in inches rather than feet changes their covariance.
Use the Data export with ODBC to file (VWPEXPT2) program to select data in a table that is contained in a database registered in ODBC, and write the data to a delimited file. To run this program on AIX or UNIX, use the ODBC version of the Data Warehouse Center agent.
The Data export with ODBC to file program uses the following step and source parameters:
The ODBC connect string that was defined for the source database, in the format DSN=database;UID=user ID;PWD=password.
Only the following ODBC data types are supported: CHAR, VARCHAR, LONG VARCHAR, DATE, TIME, TIMESTAMP, INTEGER, SMALLINT, TINYINT, DECIMAL, NUMERIC, REAL, FLOAT, DOUBLE.
BINARY and GRAPHIC data are not supported.
The output file that is selected for the program in the process. This file will contain the exported data.
The SELECT statement that is defined for the step in the Steps notebook. The SELECT statement is enclosed in quotation marks.
Optional: The column separation character.
The output file is generated on the agent site.
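A rough functional equivalent of this export in Python with the pyodbc module; the connect string, SELECT statement, output file, and separator are assumptions for illustration:

import csv
import pyodbc

connect_string = "DSN=sourcedb;UID=user1;PWD=secret;"  # hypothetical
select_statement = "SELECT * FROM SALES"               # hypothetical
output_file = "export.txt"
separator = ","                                        # column separation character

conn = pyodbc.connect(connect_string)
cursor = conn.cursor()
cursor.execute(select_statement)

# Write each fetched row as one delimited line.
with open(output_file, "w", newline="") as f:
    writer = csv.writer(f, delimiter=separator)
    for row in cursor:
        writer.writerow(row)

cursor.close()
conn.close()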
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | X | -
Log information for the export is stored in the trcppp.log file. The location of the log file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the Data export with ODBC to file program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Export error.
20 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
128 | Missing DLL.
Security for the Data Warehouse Center is separate from both database and operating system security. The Data Warehouse Center security system provides you with the ability to control user access to Data Warehouse Center objects and the actions that users can perform. For example, you can use the security system to restrict access to some types of warehouse source or target data, such as personnel information.
The elements of Data Warehouse Center security are privileges, warehouse groups, and warehouse users. Users belong to groups, which have access to objects and some combination of privileges.
A privilege is an authorization to perform a particular function. In the Data Warehouse Center, two privileges can be assigned to groups: administration privilege and operations privilege. Privileges are assigned (by a user with administration privilege) to groups. In order for warehouse users to have a privilege, they must belong to a warehouse group that has the privilege.
Privilege | Action
Administration | Define and change warehouse users and warehouse groups, change Data Warehouse Center properties, import metadata, and define which groups have access to objects when they are created.
Operations | Open and use all the functions in the Work in Progress window.
The main object in Data Warehouse Center security is the warehouse group. When you define a warehouse group, you specify the sources, targets, and processes that the group can access, the warehouse users who will belong to the group, and the privileges that the users in the group will have. Group membership controls warehouse user access to objects, so that by including a user in a particular group you can grant access to only the objects that are needed for the user to perform his tasks.
During installation, a default warehouse group is created, and the user ID for the installation is assigned to this group. The first person who logs on to the Data Warehouse Center becomes a member of the default warehouse group and has all privileges. All users, sources, targets, and processes automatically belong to the default warehouse group when they are defined. If no other warehouse groups are added, all new objects will belong to the default warehouse group. If you do not want to use the security features of the Data Warehouse Center, you do not need to change anything; just keep the default warehouse group assignment.
When warehouse users define a new object, such as a warehouse source, the groups to which they belong will have access to the new object. The list of groups to which they can assign access is limited to the groups to which they belong.
The list of tables or views that users can define to the source will be limited by their group membership as well, so that they will be able to choose from among only the tables and views to which they have access. Further, the set of actions available to the user through the Data Warehouse Center will be limited by the level of security that the user has. For example, a user will not be able to access the properties of an object if the user does not belong to a group that has access to the object.
When a user defines a new object to the Data Warehouse Center and does not have administration privilege, all of the groups to which the user belongs will have access to the new object by default. The Security page of the object notebook will not be available to the user.
If a user has access to a warehouse source, but does not have administration privilege, the Security Groups page of the Warehouse Source notebook is not available.
The following table shows the relationship between various Data Warehouse Center objects and the group membership that is required to define or edit the object. Users who belong to a group with access to a process can update the steps in the process as well as the tables in the process.
Data Warehouse Center object | Who can define | Who can edit
Subject area | Any user | Any user
Process | Any user | Only users who belong to a group that has access to the process
Step in a process | Only users who belong to a group that has access to the process | Only users who belong to a group that has access to the process
Table, view, or file in a process | Only users who belong to a group that has access to the process | Only users who belong to a group that has access to the process
Warehouse source | Any user | Only users who belong to a group that has access to the warehouse source
Warehouse target | Any user | Only users who belong to a group that has access to the warehouse target
Tables, views, and files within a warehouse source | Only users who belong to a group that has access to the warehouse source | Only users who belong to a group that has access to the warehouse source
Tables and files within a warehouse target | Only users who belong to a group that has access to the warehouse target | Only users who belong to a group that has access to the warehouse target
Schemas | Any user | Any user
Tables in a schema | - | Only users who belong to a group that has access to the warehouse source or target that contains the tables
Warehouse agents | Any user | Any user
Programs group | Any user | Any user
Program definition within a program group | Any user | Any user
Warehouse groups | Only users who belong to a group that has administration privilege | Only users who belong to a group that has administration privilege
Warehouse users | Only users who belong to a group that has administration privilege | Only users who belong to a group that has administration privilege
Data Warehouse Center properties notebook | - | Only users who belong to a group that has administration privilege
When you define a warehouse user to the Data Warehouse Center, you can assign a Data Warehouse Center logon user ID and password to the user as well as the groups to which the user will belong. All warehouse users automatically belong to the default warehouse group. To define a warehouse user, you must belong to a group that has administration privilege.
A user can be assigned to any number of warehouse groups. All warehouse users can view all objects in the Data Warehouse Center tree and contents pane. All warehouse users can always use the Locate, Show Related, and Refresh functions for any object. All warehouse users can update their own user ID and password by opening their User Properties notebook. You can specify that a user will receive e-mail notification when a step completes.
The Data Warehouse Center uses the following default values for each data type that it supports.
The following table shows the column types and defaults (ODBC types for DB2):
Data type | Size | Scale | Is null | Is text
CHAR | user defined (10) | none | true | true
VARCHAR | user defined (10) | none | true | true
Decimal | user defined (5,2) | yes | true | false
Numeric | user defined (5,2) | yes | true | false
Small int | fixed (2) | none | true | false
Integer | fixed (4) | none | true | false
Real | fixed | none | true | false
Date | fixed (6) | none | true | false
Time | fixed (6) | none | true | false
Timestamp | fixed (16) | none | true | false
Double | fixed (8) | none | true | false
Float | fixed (8) | none | true | false
Long VARCHAR (DB2 UDB for AS/400) | fixed (32740) | none | true | false
Long VARCHAR (DB2 Common Server) | fixed (32700) | none | true | true
CLOB | user defined (0) | none | true | true
The following table lists DBCS types. The DBCS fields are half the size of comparable SBCS fields.
Data type | Size | Scale | Is null | Is text
DBCLOB | user defined (0) | none | true | true
GRAPHIC | user defined (0) | none | true | true
VARGRAPHIC | user defined (0) | none | true | true
Long VARGRAPHIC (DB2 Common Server) | fixed (16350) | none | true | true
Long VARGRAPHIC (DB2 UDB for AS/400) | fixed (16370) | none | true | true
If you want the Data Warehouse Center to automatically add the DISTINCT keyword to the SQL statement or generate a GROUP BY clause, the column length must be less than or equal to 254.
The Data Warehouse Center adds the VWEDITION column if the step has one or more editions. VWEDITION is a fixed integer column with a length of 4. The Data Warehouse Center removes the VWEDITION column if the number of editions is changed to zero. You can change the number of editions in development mode only.
Some target databases might not support all data types. See the documentation for your database type.
Use the DB2 UDB for AS/400 Load Insert (VWPLOADI) program to load data from a flat file into a DB2 UDB for AS/400 table, appending to existing data.
Before the program loads new data into the table, it exports the table to a backup file, which you can use for recovery.
The DB2 UDB for AS/400 Load Insert program uses the following step and warehouse parameters:
Source file name
Target table
Backup file
FileMod string
Restrictions
The Data Warehouse Center definition for the agent site that is running the program must include a user ID and password. The database server does not need to be on the agent site. However, the source file must be on the database server. Specify the fully qualified name of the source files as defined on the DB2 server.
If the program detects a failure during processing, the table will be emptied. If the load generates warnings, the program returns as successfully completed.
You can find log information in /QIBM/USERDATA/VWxxxxxx.VWPLOADI.
The location of the log files is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the DB2 UDB for AS/400 Load Insert program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Load error.
20 | Export error.
24 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
128 | Missing DLL. A Windows NT error that can occur when the program was compiled on another operating system.
Use the DB2 UDB for AS/400 Load Replace (VWPLOADR) program to load data from a flat file into a DB2 UDB for AS/400 table, replacing existing data.
The DB2 UDB for AS/400 Load Replace program uses the following step and warehouse parameters:
Source file name
Target table
FileMod string
Restrictions
The Data Warehouse Center definition for the agent site that is running the program must include a user ID and password. The database server does not need to be on the agent site. However, the source file must be on the database server. Specify the fully qualified name of the source files as defined on the DB2 server.
If the program detects a failure during processing, the table is emptied. If the load generates warnings, the program returns as successfully completed.
You can find log information in /QIBM/USERDATA/VWxxxxxx.VWPLOADR.
The location of the log files is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the DB2 UDB for AS/400 Load Replace program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Load error.
20 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
128 | Missing DLL.
Use the Visual Warehouse 5.2 DB2 UDB Load Insert (VWPLOADI) program to run your Visual Warehouse 5.2 VWPLOADI program in the Data Warehouse Center.
Before the program loads new data into the table, it exports the table to a backup file, which you can use for recovery.
The Visual Warehouse 5.2 DB2 UDB LOADI version of this program uses the following step and warehouse parameters:
The flat file selected as a source for the step. The step must have only one selected source file. The source file must contain the same number and order of fields as the target tables. Only delimited ASCII (ASCII DEL) source files are supported. For information about the format of delimited files, see the DB2 Command Reference.
The warehouse target database selected when the step was created. You must have either SYSADM or DBADM authorization to the DB2 database. The DB2 UDB Load Insert program does not support multinode databases. For multinode databases, use Load flat file into DB2 UDB EEE (VWPLDPR) for DB2 UDB Extended Enterprise Edition.
The user ID and password that are defined for the warehouse database in the Warehouse notebook.
The target table that is selected for the step.
The backup directory and file name.
The FileMod string (optional). See the DB2 UDB documentation for the LOAD command for the file mode options that can be used with the ASCII DEL format.
Recommendation
Create the target table in its own private DB2 tablespace. Any private tablespace that you create will be used by default for all new tables that do not specify a tablespace. If processing fails, DB2 might put the whole tablespace in hold status, making the tablespace inaccessible. To avoid this hold problem, create a second private tablespace for steps that do not use the load programs.
To create a tablespace:
CREATE TABLESPACE tablespace-name MANAGED BY SYSTEM USING ('d:/directory')
where directory is the directory that is to contain the databases. DB2 creates the directory for you.
After you create the tablespaces, open the Steps notebook for each step, and specify which tablespace the step uses.
Restrictions
The Data Warehouse Center definition for the agent site that is running the program must include a user ID and password. The DB2 load utility cannot be run by a user named SYSTEM. Be sure to select the same agent site in the warehouse source and the warehouse target for the step using the Data Warehouse Center program. The database server does not need to be on the agent site. However, the source file must be on the database server. Specify the fully qualified name of the source files as defined on the DB2 server.
If the program detects a failure during processing, the table will be emptied. If the load generates warnings, the program returns as successfully completed.
The program does not collect database statistics. Run the DB2 UDB Runstats program after a sizable load is complete.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | X | X
You can find log information in the following files:
trcppp.log contains log information for the DB2 Load utility.
sqlppp.log contains log information for the DB2 Load utility.
sqlppp.lbk contains log information for the backup of the original database.
sqlppp.lg2 contains recovery information if an error occurs during load processing.
The location of the log files is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the DB2 UDB Load Insert program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Load error.
20 | Export error.
24 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
128 | Missing DLL. A Windows NT error that can occur when the program was compiled on another operating system.
Use the Visual Warehouse 5.2 DB2 UDB Load Replace (VWPLOADR) program to run your Visual Warehouse 5.2 VWPLOADR program in the Data Warehouse Center.
This Visual Warehouse 5.2 version of the VWPLOADR program uses the following step and warehouse parameters:
The file that is selected as a source for the step. The step must have only one source file selected. The source file must contain the same number and order of fields as the target tables. Only delimited ASCII ( ASCII DEL) source files are supported. For information about the format of delimited files, see the DB2 Command Reference.
The warehouse target database that was selected when the step was created.
You must have either SYSADM or DBADM authorization to the DB2 database. This program does not support multinode databases. For multinode databases, use Load flat file into DB2 UDB EEE (VWPLDPR) for DB2 UDB Extended Enterprise Edition.
The user ID and password that were defined for the warehouse database in the Warehouse notebook.
The target table that is selected for the step. The step using the program must specify that the maximum number of editions for the target table is 0.
The FileMod string (optional). See the documentation for the LOAD command for the file mode options that can be used with the ASCII DEL format.
Recommendation
Create the target table in its own private DB2 tablespace. Any private tablespace that you create will be used for all new tables that do not specify a tablespace. If processing fails, DB2 might put the whole tablespace in hold status, making the tablespace inaccessible. To avoid this hold problem, create a second private tablespace for steps that do not use the load programs.
To create a tablespace:
CREATE TABLESPACE tablespace-name MANAGED BY SYSTEM USING ('d:/directory')
where directory is the directory that is to contain the databases. DB2 creates the directory.
After you create the tablespace, open the Steps notebook for each step, and specify which tablespace the step uses.
Restrictions
The Data Warehouse Center definition for the agent site that is running the program must include a user ID and password. The DB2 load utility cannot be run by a user named SYSTEM. Be sure to select the same agent site in the warehouse source and the warehouse target for the step that uses the Data Warehouse Center program. The database server does not need to be on the agent site. However, the source file must be on the database server. Specify the fully qualified name of the source files as defined on the DB2 server.
If the program detects a failure during processing, the table is emptied. If the load generates warnings, the program returns as successfully completed.
The DB2 UDB Load Replace program collects database statistics during the load, so you do not need to run the DB2 UDB Runstats (VWPSTATS) program after this program.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | X | X
Log information is located in the following files:
trcppp.log contains log information for the DB2 Load facility.
sqlppp.log contains log information for the DB2 Load facility.
sqlppp.lg2 contains recovery information if an error occurs during load processing.
The location of the log files is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the DB2 UDB Load Replace program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Load error.
20 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
128 | Missing DLL.
Metadata that describes the format of objects in the warehouse, the sources of data, and the transformations that are applied to the data. Examples of definition metadata are column names, table names, and database names.
Some steps that you export might contain BLOB (binary large object) data. The following steps can contain BLOB data:
DB2 UDB Export
DB2 UDB Load
DB2 UDB Reorg
DB2 UDB Runstats
DB2 UDB for OS/390 Load
DB2 UDB for OS/390 Runstats
DB2 for AS/400 Data Load Insert (VWPLOADI)
DB2 for AS/400 Data Load Replace (VWPLOADR)
Data export with ODBC to file (VWPEXPT2)
SQL steps
When you export metadata definitions for BLOB data, multiple tag language files are created. The file name that is generated for each supplementary file is the same as that of the tag language file, with a numeric extension. For example, if the tag language file you specified is e:\tag\steps.tag, the supplementary tag language files are named e:\tag\steps.1, e:\tag\steps.2, and so on. Only the file extension is used to identify the supplementary files within the base tag language file, so you can move the files to another directory. However, do not rename the files, and always keep them in the same directory; otherwise, you will not be able to import them successfully.
Use the Generate Key Table transformer to add a unique key to a warehouse table.
The key column must have a data type of Integer. No other data type, including SMALLINT, is valid for the key column.
You can change values in the key column in two ways:
Update the values in an existing key column, adding values that are based on the key values in another column. To do this, you must define the key column to which the transformer writes to allow null values. This key column does not need to be a primary key column. This choice adds the missing values in the key column, starting at the highest value in the key start value column plus the increment value that you specify.
Replace or create key values in a specific column. If values already exist in the key column, they are overwritten.
If you are using a partitioned database, such as the type supported by DB2 UDB Extended Enterprise Edition, the partitioning key column value cannot be updated. If you attempt to do this, you will receive an SQL0270N message with an RC2 in the transformer log table.
You can use the DB2 CREATE TABLE statement to define a partitioning key on a table. If a partitioning key is not defined for a table in a table space that is divided across more than one database partition in a nodegroup, a partitioning key is created by default from the first column of the primary key. If no primary key is specified, the default partitioning key is the first non-long field column that is defined on the table.
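As an illustration of defining a partitioning key explicitly at table creation, here is a hedged sketch that issues the DDL through pyodbc; the table, columns, and connect string are made up, and the exact CREATE TABLE syntax should be confirmed in the DB2 SQL Reference:

import pyodbc

# Hypothetical connection to a partitioned (EEE) warehouse database.
conn = pyodbc.connect("DSN=warehouse;UID=user1;PWD=secret;")
cursor = conn.cursor()

# Declare the partitioning key explicitly so that it is not defaulted
# from the primary key. KEY_COL is the generated-key column; the
# partitioning key is placed on a column that is never updated, and it
# must be part of the primary key when a primary key is defined.
cursor.execute("""
    CREATE TABLE FACT_SALES (
        KEY_COL  INTEGER NOT NULL,
        REGION   VARCHAR(20) NOT NULL,
        AMOUNT   DECIMAL(9,2),
        PRIMARY KEY (KEY_COL, REGION)
    )
    PARTITIONING KEY (REGION)
""")
conn.commit()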
Use the Generate Period Table transformer to create a new period table. A period table contains a column of generated date, time, or timestamp values. In addition to this column, the period table can contain one or more optional output columns based on the date/time value for the row or on parameters that you specify.
You can use a period table in a number of ways (a generation sketch in code follows the list). For example:
You can use a period table to analyze trends by creating one or more output columns based on the generated date/time value for each row.
You can use a period table in an SQL join with other data tables, so that the resulting product supports simpler SQL SELECT statements that are based on user requirements.
You can use a period table as a starting point for creating a more complex data table after other data columns are added to the generated period table.
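A minimal sketch of generating period rows in Python; the start date, number of rows, and derived output columns are assumptions for illustration:

from datetime import date, timedelta

start = date(2000, 1, 1)   # hypothetical start of the period table
num_rows = 90              # hypothetical number of generated rows

# Each row holds the generated date plus optional derived output columns.
rows = []
for i in range(num_rows):
    d = start + timedelta(days=i)
    rows.append({
        "PERIOD_DATE": d,
        "YEAR": d.year,
        "MONTH": d.month,
        "DAY_OF_WEEK": d.isoweekday(),  # 1 = Monday ... 7 = Sunday
    })

print(rows[0])
print(rows[-1])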
When you import data from an Informix 7 and 9 data source into a new warehouse source, the Data Warehouse Center converts the Informix data types to ODBC data types, as described in the following table:
Informix 7 and 9 data type | ODBC data type
CHAR | CHAR
VARCHAR | VARCHAR
FLOAT | FLOAT
FLOAT | DOUBLE
SMALLFLOAT | REAL
DATE | DATE
INTEGER | INTEGER
SMALLINT | SMALLINT
SERIAL | INTEGER
DECIMAL | DECIMAL
MONEY | DECIMAL
TEXT | LONG VARCHAR
TIMESTAMP | TIMESTAMP
DATETIME YEAR TO FRACTION | N/A
After you select a source for your step and open the Column Mapping page of the step notebook for the first time, you will see the ODBC data types used in the Column Grid. See the DB2 data types table for the characteristics of the ODBC data types that will be used for your business view table.
A metadata interchange format that is supported by the Data Warehouse Center. A tag language file is an example of an interchange file.
Use the Invert Data transformer to invert the rows and columns of a source table, making rows become columns and columns become rows in the target table. The source table and the target table must be in the same warehouse target database.
The order of data among the columns, from top to bottom, is maintained and placed in rows left to right. This transformer rotates the table on an axis cutting a diagonal from the upper left to the lower right. To visualize this process, hold a simple table of data printed on a square piece of paper in your hands at the upper left and lower right. Then flip it over and hold it up to a light. From this exercise, you can see how the rows and columns maintain orientation with respect to one another.
Optionally, you can specify a pivot column in the source table. The data in this column will become the column names in the new step and will not appear as data in the output table.
For example, consider the following source table and the data in its rows and columns. The data type of each column is shown in parentheses:
Campbell (DOUBLE) | Cupertino (DOUBLE) | Los Gatos (DOUBLE) | Mill Valley (DOUBLE)
296,539 | 345,908 | 357,498 | 475,676
263,463 | 318,495 | 335,843 | 471,476
309,657 | 339,572 | 350,467 | 428,451
If you use the Invert Data transformer on the example source table and do not specify a pivot column, the transformer will copy the rows and columns from the table, invert them, and write them to a target table. The rows and columns appear in the example target table as follows:
Pivot column (CHAR) | (DOUBLE) | (DOUBLE) | (DOUBLE)
Campbell | 296,539 | 263,463 | 309,657
Cupertino | 345,908 | 318,495 | 339,572
Los Gatos | 357,498 | 335,843 | 350,467
Mill Valley | 475,676 | 471,476 | 428,451
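A minimal Python sketch that reproduces this inversion on the example data, with the source column names becoming the pivot column of the target (a simplified stand-in for the transformer, not its implementation):

# Source table from the example: column names and rows of values.
columns = ["Campbell", "Cupertino", "Los Gatos", "Mill Valley"]
rows = [
    [296539, 345908, 357498, 475676],
    [263463, 318495, 335843, 471476],
    [309657, 339572, 350467, 428451],
]

# Invert: each source column becomes a target row, led by the column name.
inverted = [[name] + [row[i] for row in rows]
            for i, name in enumerate(columns)]

for target_row in inverted:
    print(target_row)
# ['Campbell', 296539, 263463, 309657]
# ['Cupertino', 345908, 318495, 339572] ... and so on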
Except for the pivot column (which contains the column names that you plan to use in the target table, if one exists), the source table columns should either have the same data type or data types that are related to each other through automatic promotion.
The data in the pivot column must be one of the following types:
CHAR
DATE
TIME
TIMESTAMP
SMALLINT
INTEGER
NUMERIC
DECIMAL
All data in the pivot column must be less than 18 characters. Alphabetic characters will be converted to uppercase on DB2 systems that require object names in uppercase.
The number of rows in the source table should be equal to or less than the maximum number of table columns supported in the version of DB2 that is running this transformer. An error will occur if this number is exceeded.
The Invert Data transformer drops the existing database table and recreates it during each run. Each time you run a step using this transformer, the existing data is replaced, but the table space and table index names are preserved.
Primary and foreign key relationships can document the relationships used to build the fact and dimension tables in the warehouse. Documenting these relationships aids in creating the warehouse. Generally, the dimension tables should be updated first, so that the dimension information is in the warehouse when the fact tables are updated. The key in the dimension table should be created in the warehouse to ensure that the dimension rows are unique: for example, the geographies or regions that a company uses, the time periods that it reports on, or its stores, products, and customers. When records are added to the fact table, the dimension tables are used to ensure that each fact row is clean and consistent with the right information. You might also want to document additional relationships that exist between tables but are not part of a primary/foreign key relationship. These relationships are specific to a particular warehouse schema and do not affect the tables in the schema in any way.
Once the fact and dimension tables are built in the warehouse, they are used to build OLAP cubes for multidimensional analysis. The foreign key relationships are then used to load the values into the cube.
The table definitions and all schema relationships are used to build the OLAP model and can be exported to the OLAP server for the purpose of building the cube.
The warehouse fact and dimension tables can also be used to create reports, summaries and other tables that end users may need for business analysis.
Use the Visual Warehouse 5.2 Load flat file into DB2 UDB EEE (AIX only) (VWPLDPR) program to load data from a delimited flat file into a DB2 UDB Extended Enterprise Edition database, replacing existing data.
Before you use this program, you must be familiar with parallel system concepts and parallel load.
The Load flat file into DB2 UDB EEE program performs the following steps in loading data to a parallel database:
Connects to the target database.
Acquires the target partitioning map for the database.
Splits the input file so that each file can be loaded on a node.
Runs a remote load on all nodes.
If the load step fails on any node, the Load flat file into DB2 UDB EEE program performs the following steps:
Builds an empty load data file for each node.
Loads the empty data files.
The Load flat file into DB2 UDB EEE program uses the following step and warehouse source parameters:
The flat file selected as a source for the step. The step must have only one source file selected. Only delimited (DEL) files are supported. The input file and the split files must be on a file system that is shared by all nodes involved in the database load. The shared file system must be mounted on the same directory on all nodes. The directory must be large enough to contain the input file before and after the file is split.
The warehouse database that was selected when the step was created.
The user ID and password that were defined for the warehouse database in the Warehouse notebook.
The target table that is selected for the step.
The column delimiter.
The string delimiter.
The decimal delimiter.
The non-NFS directory.
The output file name.
The partitioned key. If the partitioned key contains characters that are special to UNIX shells, such as '(' and ')', enclose the key in a pair of quotation marks.
The Load flat file into DB2 UDB EEE program does not run the DB2 RUNSTATS utility after the load. If you want to run the RUNSTATS utility automatically after the load, add a step to your process that runs RUNSTATS.
Recommendation
Create the target table in its own private DB2 tablespace. Any private tablespace that you create will be used for all new tables that do not specify a tablespace. If processing fails, DB2 might put the whole tablespace in hold status, making the tablespace inaccessible. To avoid this hold problem, create a second private tablespace for steps that do not use the load programs.
To create a tablespace:
CREATE TABLESPACE tablespace-name MANAGED BY SYSTEM USING ('d:/directory')
where directory is the directory that is to contain the databases. DB2 creates the directory.
After you create the tablespaces, open the Steps notebook for each step, and specify which tablespace the step uses.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
- | X | - | - | -
The trcppp.log file contains a list of all the temporary files created by the program. The location of this file is based on the value of the VWS_LOGGING environment variable. To determine the source of a problem, look at this file first.
The following table lists the error codes returned by the Load flat file into DB2 UDB EEE program.
Value | Description
4 | Warning.
8 | Parameter error.
9 | Connect error.
10 | Missing system programs.
11 | Missing SQL utilities.
12 | Missing db2nodes.cfg file.
20 | DB2DGPMAP utility error.
30 | DB2SPLIT utility error.
31 | Missing output from DB2SPLIT.
40 | Load utility error.
When you import data from a Microsoft SQL Server 6 and 7 data source into a new warehouse source, the Data Warehouse Center converts the Microsoft data types to ODBC data types, as described in the following table:
Microsoft SQL Server 6 and 7 data type | ODBC data type
CHAR | CHAR
VARCHAR | VARCHAR
DECIMAL | DECIMAL
FLOAT | FLOAT
DOUBLE | DOUBLE
NUMERIC | NUMERIC
DATETIME | TIMESTAMP
SMALLDATETIME | TIMESTAMP
INT | INTEGER
TINYINT | TINYINT
SMALLINT | SMALLINT
MONEY | DECIMAL
SMALLMONEY | DECIMAL
SYSNAME | VARCHAR
TEXT | LONG VARCHAR
BINARY | BINARY
BIT | BIT
IMAGE | LONG VARCHAR
REAL | REAL
TIMESTAMP | VARCHAR
VARBINARY | VARCHAR (with the Is Text field not checked)

Microsoft SQL Server 7 data type | ODBC data type
NCHAR | CHAR for bit data
NVARCHAR | VARCHAR for bit data
NTEXT | LONG VARCHAR for bit data
After you select a source for your step and open the Column Mapping page of the step notebook for the first time, you will see the ODBC data types used in the Column Grid. See the DB2 data types table for the characteristics of the ODBC data types that will be used for your business view table.
The Moving average transformer predicts the future course of time series values.
Moving averages, both simple and exponentially smoothed, can often predict the future course of a time-related series of values. They are widely used in time-series analysis in business and financial forecasting. Rolling sums have other widespread financial applications.
Moving averages redistribute events that occur briefly over a wider period of time. This redistribution serves to remove noise, random occurrences, and large peaks or valleys from time-series data. The moving average method can be applied to a time-series data set to remove the effects of seasonal variations, extract the data trend, enhance the long-term cycles, and smooth a data set before performing higher level analysis.
With this transformer, you can calculate the following values (a code sketch follows the list):
A simple moving average. A simple moving average (SMA) is defined as:
SMA [row i] = SUM (data values for last N rows) / N
An exponential moving average. An exponential moving average (EMA) is defined as:
EMA [row i] = ( Value [row i] * K ) + ( EMA [row i-1] * ( 1-K ) )
where: K = 2/(N+1)
For example, for a four-day EMA, N = 4, K = 2/5 and 1-K = 3/5
An exponential moving average is also known as an exponentially smoothed moving average.
A rolling sum for N periods of data, where N is specified by the user. A rolling sum (RS) is defined as:
RS [row i] = SUM ( data values for the last N rows)
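A minimal Python sketch of the three calculations as defined above; N and the sample series are made-up values, and seeding the EMA with the first data value is an assumption:

values = [10.0, 12.0, 11.5, 13.0, 14.5, 13.5, 15.0]  # hypothetical series
n = 4                                                # window size N

# Simple moving average: mean of the last N values, defined from row N on.
sma = [sum(values[i - n + 1:i + 1]) / n for i in range(n - 1, len(values))]

# Rolling sum of the last N values.
rs = [sum(values[i - n + 1:i + 1]) for i in range(n - 1, len(values))]

# Exponential moving average with K = 2 / (N + 1).
k = 2 / (n + 1)
ema = [values[0]]                 # assumed seed: the first value
for v in values[1:]:
    ema.append(v * k + ema[-1] * (1 - k))

print(sma)
print(rs)
print(ema)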
The Data Warehouse Center automated ODBC connect string follows the syntax shown in the following examples. The automated ODBC connect string works for Intersolv level 9 drivers.
If you do not have connectivity for the database types that the Data Warehouse Center supports, you can code your own ODBC connect string. The ODBC connect string syntax for Oracle, Informix, Sybase, Microsoft SQL Server and generic ODBC is shown in the following examples:
Oracle 7 and 8 connect string syntax
DSN=dsn;UID=uid;PWD=pwd;SRVR=srvr:computername:sid;
Where:
dsn is the data source name.
uid is the user ID.
pwd is the password.
srvr is the network, a letter that identifies the network protocol being used: P (named pipes), X (SPX), B (NETBIOS), T (TCP/IP), D (DECNet), A (Oracle Async), or AT (AppleTalk).
computername is the name of the Oracle listener on the network.
sid is the Oracle system identifier and refers to the instance of Oracle that is running on the host.
Sybase 10/11 connect string syntax
DSN=dsn;UID=uid;PWD=pwd;SRVR=srvr;DB=db;IFILE=interfacefile;
Where:
dsn is the data source name.
uid is the user ID.
pwd is the password.
srvr is the name of the server that contains the tables that you want to access. If you do not specify a server name, the default is the server name in the DSQUERY environment variable.
db is the name of the database that you want to connect to. The default is the database that is defined by the Sybase system administrator for each user.
interfacefile is the path name to the interfaces file.
Informix 7, 9 connect string syntax
DSN=dsn;UID=uid;PWD=pwd;DB=db;HOST=host;SERV=serv;PRO=pro;
Where:
dsn is the data source name.
uid is the user ID.
pwd is the password.
db is the name of the database that you want to connect to.
host is the name of the workstation where your Informix server resides.
serv is the name of the service as it appears on the host workstation.
pro is the protocol that is used to communicate with the server. Some of the possible values are FTP, IPX, LANMAN, and TCP/IP.
Microsoft SQL Server 6 and 7 connect string syntax
DSN=dsn;UID=uid;PWD=pwd;SRVR=srvr;DB=db;WKID=wkid;
Where:
dsn is the data source name.
uid is the user ID.
pwd is the password.
srvr is the name of the server that contains the SQL Server tables that you want to access.
db is the name of the database that you want to connect to.
wkid is the workstation ID that is used by your agent site.
Generic ODBC data source connect string syntax
If you have a generic ODBC source that is not listed in the Resource Type window and that you want to use as an information resource (for example, Microsoft Access), you can use the connect string shown here; a connection sketch in code follows.
DSN=dsn;UID=uid;PWD=pwd;
Where:
dsn is the data source name.
uid is the user ID.
pwd is the password.
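For illustration, any of these connect strings can be passed directly to an ODBC connection call. A sketch with made-up DSN and credentials, using the pyodbc module:

import pyodbc

connect_string = "DSN=dsn;UID=uid;PWD=pwd;"  # hypothetical values

conn = pyodbc.connect(connect_string)
cursor = conn.cursor()
cursor.execute("SELECT 1")  # replace with a query that is valid for your source
print(cursor.fetchone())
conn.close()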
Use the OLAP Server: Calc with calc rules (ESSCALC2) program to apply the specified Calc script to an OLAP Server database. When you define a step that uses this program, do not specify a source or a target table for the step. The OLAP Server can be a DB2 OLAP Server or Essbase server.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for the OLAP Server: Calc with calc rules program is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Calc with calc rules program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Calc error.
24 | Internal error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Default calc (ESSCALC1) program to call the default OLAP Server calc script associated with the target database. The OLAP Server can be a DB2 OLAP Server or Essbase server.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
The file trcppp.log contains log information for the OLAP Server: Default calc program. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Default calc program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Calc error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Free text data load (ESSDATA1) program to load data from a comma-delimited flat file into a multidimensional OLAP Server database using free-form data loading. The OLAP Server can be a DB2 OLAP Server or Essbase server.
The OLAP Server: Free text data load program uses the file that is selected as a source for the step in the process. The step must have only one source file selected. The data in the source file must match the OLAP Server outline values exactly. The selected source file must be on the agent site.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for this program is stored in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Free text data load program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Load error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Load data from file with load rules (ESSDATA2) program to load data from a source flat file into a multidimensional OLAP Server database using load rules. The OLAP Server can be a DB2 OLAP Server or Essbase server.
The program uses the selected file as a source for the step. The step must have only one source file selected. The source file name must follow the conventions for specifying file names on the client or server of your OLAP Server system.
If the file location flag in your program indicates that the file is on an agent site, you must specify a fully qualified file name in the Load rule file name field on the Parameters page of the Step notebook.
If the file location flag indicates that the source file is on the OLAP server, you must specify the file name without an extension in the Load rule file name field on the Parameters page of the Step notebook.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for the OLAP Server: Load data from file with load rules program is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Load data from file with load rules program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Load error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Load data from an SQL table with load rules (ESSDATA3) program to load data from an SQL source into a multidimensional OLAP Server database using load rules. The OLAP Server can be a DB2 OLAP Server or Essbase server.
The OLAP Server: Load data from an SQL table with load rules program uses the user ID and password that are defined for the Warehouse source.
Do not select a target table for the step.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for the OLAP Server: Load data from an SQL table with load rules program is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Load data from an SQL table with load rules program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Load error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Load data from a file without using load rules (ESSDATA4) program to load data from a flat file into a multidimensional OLAP Server database without using load rules. The OLAP Server can be a DB2 OLAP Server or Essbase server.
When you define a step that uses this program, select a source file, but do not specify a target table. The source file name must follow the conventions for specifying file names on the client or server of your OLAP Server system.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for the OLAP Server: Load data from a file without using load rules program is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Load data from a file without using load rules program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Load error.
32 | Internal error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Update outline (ESSOTL1) program to update an OLAP Server outline from a source flat file using load rules. The OLAP Server can be a DB2 OLAP Server or Essbase server.
The OLAP Server: Update outline program uses the selected file as a source for the step. The step must have only one source file selected. The source file name must follow the conventions for your OLAP Server for specifying file names on the client or server.
If the file location flag in your program indicates that the file is on an agent site, you must specify a fully qualified file name in the Load rules file name field on the Parameters page of the Step notebook.
If the file location flag indicates that the source file is on the OLAP Server, you must specify the file name without an extension in the Load rules file name field on the Parameters page of the Step notebook.
For more information, see the documentation for your OLAP Server.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for the OLAP Server: Update outline program is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Update outline program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | OLAP Server error.
20 | Load error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
Use the OLAP Server: Update outline from SQL table (ESSOTL2) program to update an OLAP Server outline from an SQL source using load rules. The OLAP Server can be a DB2 OLAP Server or Essbase server.
The OLAP Server: Update outline from SQL table program uses the user ID and password defined for the Warehouse source database that is connected to the step that uses the program.
Supported operating systems:
Windows NT | AIX | Sun Solaris | AS/400 | OS/2
X | X | X | - | -
Log information for the OLAP Server: Update outline from SQL table program is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the OLAP Server: Update outline from SQL table program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Essbase error.
20 | Load error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
A one-way ANOVA uses one grouping column. The data assumptions for the one-way ANOVA are:
There is one data column and one grouping column.
The data column is composed of data values (x1, x2, x3, ..., xn).
The groups can have equal or unequal sample sizes; unequal sample sizes are allowed for the one-way ANOVA.
The following table is a sample of a one-way ANOVA summary table:
SOURCE | DF | SS | MS | FVALUE | PVALUE
Between | DFB | SSB | MSB | Fvalue | Pvalue
Within | DFE | SSE | MSE | Null | Null
Total | DFT | SST | Null | Null | Null
Assume the values for the following variables:
n = The total number of data values in the data column.
k = The number of unique groups in the single grouping column.
x(i) = The ith data value in the data column.
count(i) = Count of the data values in the ith group.
avgGrp(i) = The average of all the values in the ith group.
avgTotal = The average of all the data values in the data column.
The equations for the calculated values are:
DFB = k - 1
DFE = n - k
DFT = n - 1
SSB = SUM(i=1 to k) [ count(i) * (avgGrp(i) - avgTotal)**2 ]
SSE = SST - SSB
SST = SUM(i=1 to n) [ (x(i) - avgTotal)**2 ]
MSB = SSB / DFB
MSE = SSE / DFE
FValue = MSB / MSE
PValue = PValueFunction( DFB, DFE, FValue)
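The following Python sketch shows these one-way calculations end to end. It is an illustration only, not the transformer's implementation; it assumes that PValueFunction is the upper-tail probability of the Fisher-F distribution, computed here with scipy.stats.f.sf.

from scipy.stats import f

def one_way_anova(groups):
    # groups: one list of data values per unique group in the grouping column
    n = sum(len(g) for g in groups)              # total number of data values
    k = len(groups)                              # number of unique groups
    avg_total = sum(x for g in groups for x in g) / n
    dfb, dfe, dft = k - 1, n - k, n - 1
    ssb = sum(len(g) * (sum(g) / len(g) - avg_total) ** 2 for g in groups)
    sst = sum((x - avg_total) ** 2 for g in groups for x in g)
    sse = sst - ssb
    msb, mse = ssb / dfb, sse / dfe
    f_value = msb / mse
    p_value = f.sf(f_value, dfb, dfe)            # assumed PValueFunction
    return {"DFB": dfb, "DFE": dfe, "DFT": dft, "SSB": ssb, "SSE": sse,
            "SST": sst, "MSB": msb, "MSE": mse, "FVALUE": f_value, "PVALUE": p_value}

print(one_way_anova([[63, 64, 66], [68, 70, 72], [74, 75, 79]]))

A small PVALUE for a sample like this one indicates that at least one group mean differs from the others.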
When you import data from an Oracle Version 7 and 8 data source into a new warehouse source, the Data Warehouse Center converts the Oracle data types to ODBC data types, as described in the following table:
Oracle 7 and 8 data type | ODBC data type
CHAR | CHAR
VARCHAR2 | VARCHAR
NUMBER | DECIMAL
DATE | TIMESTAMP
LONG | LONG VARCHAR
LONG RAW | LONG VARCHAR
RAW | VARCHAR
NUMBER | DOUBLE
RAW and LONG data types cannot be over 2,147,478,528 bytes.
After you have selected a source for your step and opened the Column Mapping page of the step notebook for the first time, you will see the ODBC data types used in the Column Grid. See the DB2 data types table to see characteristics of the ODBC data types that will be used for your business view table.
P-value is the probability, when the null hypothesis is true, that the absolute value of the T-statistic would equal or exceed the observed value (T-value). A small P-value is evidence that the null hypothesis is false and that the attributes are in fact correlated.
The toolbar at the left side of the Process Model window. Click the icons in the toolbar to add sources, targets, and steps to the process, and to link objects in the process.
The Data Warehouse Center uses several different types of passwords, which are defined in different ways. For example, before you initialize or reinitialize the Data Warehouse Center server, you define a password that the Data Warehouse Center needs at startup. You also define passwords that the Data Warehouse Center uses after startup, within the Data Warehouse Center environment.
These passwords can be grouped into the following categories:
Passwords that are used by Data Warehouse Center users to log on to the Data Warehouse Center
Passwords that are used by an agent to run programs
Passwords that are used to connect to a database to access data
The password that the Data Warehouse Center server uses to connect to the warehouse control database
Different actions are required to change each of these passwords.
Data Warehouse Center users use a password to log on to the Data Warehouse Center. A warehouse user with Administration privilege assigns passwords to users as part of the task of defining users to the Data Warehouse Center.
Current Data Warehouse Center users who do not have administration privilege can change their passwords by opening their own Warehouse User properties notebook and making the appropriate changes.
A current warehouse user who has administration privilege can change the password for another Data Warehouse Center user. To change the password for another Data Warehouse Center user, open the Users notebook and make the appropriate changes.
Users are not required to change their passwords after the passwords are assigned. However, a user might want to change the Data Warehouse Center password whenever the corresponding Windows NT password changes, to keep the two passwords coordinated.
A Data Warehouse Center agent site can run as a system process or a user process. An agent site that runs steps as user processes must be defined with a user ID and a password that is defined on the computer on which it is installed. To define a user ID and password for an agent site to the Data Warehouse Center, open the Define Agent Site notebook and enter the information into the fields in the Operating system logon area.
If you want to change a password for an existing agent site, ensure that you change the password on the computer on which the agent site resides first. Then right-click the agent site in the Data Warehouse Center contents pane and click Change User ID and Password (or open the properties notebook for the agent site).
On Windows NT workstations, the agent can be run as either a system or a user process, depending upon how the Data Warehouse Center agent daemon service is set up. The default agent runs as part of the Data Warehouse Center server, so by default the agent's characteristics depend upon how the Data Warehouse Center Server service is set up.
The step inherits this characteristic from the agent. If the step will need something that a user would have access to (such as a LAN drive), the step must run as a user process.
You can change how the Data Warehouse Center server runs by opening the Services Control Panel in Windows NT, selecting the Data Warehouse Center server, and clicking Startup. The Service Startup window opens. Under Log On As, select System Account to specify that the service will log on to the system account. To specify that the service will log on to a user account, select This Account and type the user account. When you enter a user ID and password in the properties notebook for the default agent, the agent site will run a user process.
For AIX and Sun, a user ID and password are always required for the agent, and the agent runs on behalf of (and inherits the characteristics of) that user.
One common problem is caused by running Data Warehouse Center agents as a system process rather than a user process. When the agent runs as a system process, it is not authorized to connect to network drives or products because the process does not have a user ID. Symptoms of this problem include the agent being unable to find the Data Warehouse Center program (Error RC2 = 128 or Error RC2 = 1 in the log viewer Details window) or being unable to initialize the program.
When you define a warehouse source or warehouse target that represents a relational database, you must specify a user ID and password for the database. For a warehouse target, the user ID must have database administrator authority. Access to warehouse sources is read-only, so the user ID and password that you use for a source do not need to have database administrator authority. You define user IDs and passwords for warehouse sources and targets on the Database page of the Warehouse Source and Warehouse Target notebooks.
To change a password for a warehouse source or target, you must belong to a group that has access to the source or target. Ensure that you change the password on the relational database first, then right-click the warehouse source or warehouse target in the Data Warehouse Center contents pane and click Change User ID and Password. If you want to change user IDs and passwords for several warehouse objects at once, you can select multiple objects in the contents pane and then click Change User ID and Password. Make the appropriate changes in the Change User ID and Password window. If you change the user ID of a database administrator, ensure that the new user ID has the same level of authority as the previous user ID.
Because database administrator user IDs and passwords can change often, you might need to update these more often than other Data Warehouse Center user IDs or passwords.
Before you define this password to a user ID in the Data Warehouse Center, you must define it to the same user ID in Windows NT. This user ID must have Windows NT administrator authority.
You define this password in the Data Warehouse Center when you install the warehouse server. The installation program saves the Data Warehouse Center control database administrator user ID and password in the Windows NT registry and to the warehouse control database as the default user ID and password.
Although the Data Warehouse Center saves the user ID and password to the Windows NT registry each time it is installed, it saves them as the default user ID and password only the first time it is installed.
When the user ID and password are saved, they are managed separately. You can change the default user ID and password at any time, as you would a user ID and password for any other Data Warehouse Center user.
If you want to change the user ID or password for the warehouse control database administrator, you must first change the corresponding user ID or password on Windows NT. Then, open the Warehouse Control Database Management window.
If the Data Warehouse Center administrative client is installed, click Start --> Programs --> DB2 for Windows NT --> Data Warehouse Center --> Control Database Management.
If the Data Warehouse Center administrative client is not installed, click Start --> Programs --> DB2 for Windows NT --> Warehouse Control Database Management.
The Data Warehouse Center does not require you to change this password after it is created. You might need to change it, for example, if your security processes require a change.
Use the Pivot Data transformer to group related data from selected columns in a source table into a single column, called a pivot column, in a target table. You can also carry over columns from your source table to your target table without pivoting them.
The pivot column spans the same number of rows as there are selected columns. You can create more than one pivot column.
After you select your pivot columns, you specify the number of pivot groups that you want to use. The number of groups that you type must be a divisor of the number of pivot columns that you selected. For example, if you selected 6 pivot columns, you can type the number 1, 2, 3, or 6. If you selected 5 pivot columns, you can type 1 or 5 in this field.
You must also create a data group for the data in each selected pivot column in the source table. The data groups are used in the output table to describe or label information that is collected into a single column so that the original meaning of the data as expressed by the column labels in the input table is preserved in the output table.
Suppose a table has columns A, B, C, D, E, F, G and H. You decide that you want to carry over columns A and B and pivot columns C, D, E, F, G, and H. You also decide that you want to create 2 pivot groups. The total number of pivot columns divided by the number of pivot groups determines how many pivot columns will be in each pivot group. In this case, there are six pivot columns total and two pivot groups. This means that each pivot group will have three pivot columns.
You use the Advanced window for the Pivot Data transformer to create data groups and define your pivot column groups. For example:
Data Group | Pivot Group 1 | Pivot Group 2
Orange | C | D
Purple | E | F
Red | G | H
The target table of a Pivot Data transformer is organized in cycles. A cycle contains the rows of pivot data that are transformed from a single row in your source. For example, suppose that your source table has a row 1. The intersection of this row with your source columns is represented as A1, B1, C1, D1, and so on. The first cycle in your target table will look like this:
A (carry-over column) | B (carry-over column) | Data Group | Pivot Group 1 | Pivot Group 2
A1 | B1 | Orange | C1 | D1
A1 | B1 | Purple | E1 | F1
A1 | B1 | Red | G1 | H1
The number of rows in a cycle is equal to the number of rows in the table in the Advanced window.
The number of cycles in a table is equal to the number of rows in your source table. For example, if your source table has two rows, your target table will look like this:
A | B | Data Group | Pivot Group 1 | Pivot Group 2
A1 | B1 | Orange | C1 | D1
A1 | B1 | Purple | E1 | F1
A1 | B1 | Red | G1 | H1
A2 | B2 | Orange | C2 | D2
A2 | B2 | Purple | E2 | F2
A2 | B2 | Red | G2 | H2
If you do not select any pivot columns when you define your Pivot Data transformer, the transformer decides which columns to pivot and which data groups to create.
Columnar data in each pivot group must have either the same data type or data types that are related to each other through automatic promotion.
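The following Python sketch mimics the pivot described above under the stated assumptions: source rows with columns A through H, carry-over columns A and B, two pivot groups of three columns each, and the data groups Orange, Purple, and Red. It illustrates the cycle structure of the target table; it is not the transformer's implementation.

def pivot(rows):
    # (data group label, source index for Pivot Group 1, source index for Pivot Group 2)
    data_groups = [("Orange", 2, 3), ("Purple", 4, 5), ("Red", 6, 7)]
    target = []
    for row in rows:                             # one cycle per source row
        for label, g1, g2 in data_groups:        # one target row per data group
            target.append((row[0], row[1], label, row[g1], row[g2]))
    return target

source = [("A1", "B1", "C1", "D1", "E1", "F1", "G1", "H1"),
          ("A2", "B2", "C2", "D2", "E2", "F2", "G2", "H2")]
for target_row in pivot(source):
    print(target_row)

The output reproduces the six-row target table shown above: one cycle of three rows (one row per data group) for each of the two source rows.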
When you import a tag language file, the metadata that it contains is stored in the warehouse control database. The Data Warehouse Center will accept a tag language file that is not generated by another Data Warehouse Center system. For example, you can import a tag language file that is generated by an ETI product. Any object with an unknown OBJTYPE in a non-Data Warehouse Center tag language file will be ignored, and processing will continue. If an object is of a known OBJTYPE, but does not contain all the properties required by the Data Warehouse Center, an error will occur.
After you complete the export process, and a tag language file is created, the file is ready to import to a target system.
Before you import the file, consider the following issues:
Naming considerations
If you are using the import utility to move a warehouse source from a test environment to a production environment, make sure that the production environment does not already have a warehouse source with the same name, unless you want to overwrite the definition of that warehouse source.
If you import a step into a system that contains a step with the same name, then you must either delete the step that you want to overwrite, or change the step to development mode. Otherwise, the step cannot be updated and an error will occur.
Ordering the import of objects
Import objects in a logical order. An object that is referred to but not defined in a tag language file must already be defined in the destination warehouse control database. To satisfy this requirement, you can first import the object definition from a separate tag language file, or you can define the object in the Data Warehouse Center tree view.
For example, you can import warehouse source definitions first. When the warehouse source definitions are ready for use, you can import subject areas, and warehouse target definitions, and any corresponding steps individually.
Requirement: Do not use the import function to migrate from a previous version of the Data Warehouse Center. You must always import and export tag language files using the same version of the Data Warehouse Center. For information about migrating from a previous version of the Data Warehouse Center, see DB2 UDB Quick Beginnings.
After a successful import, you receive a message with a return code of 0, and the Import Metadata window closes.
An imported subject area, warehouse source, or warehouse target will not be displayed in the tree view unless you belong to a warehouse group that has access to the object. Warehouse groups are exported only if they are related to a process that uses the exported warehouse sources or warehouse targets, so some warehouse groups might not be exported. If the warehouse groups that you want your warehouse sources and warehouse targets to be related to are not imported, you must add those warehouse groups to the Data Warehouse Center system. Metadata for individual users is not exported; you must add users to the Data Warehouse Center.
The Data Warehouse Center assigns development mode to all imported steps so that you can verify that the steps are complete and accurate before you promote them to test or production mode.
Before you use imported objects:
Refresh the tree view. Update database information for target databases (database name, user ID, and password), if necessary. If any new agent site definitions were imported, their passwords also need to be updated. If you imported any programs that have parameters whose values are used as passwords, you must reset those values.
Add your user ID to a security group that has access to the imported subject areas and warehouse targets.
Before you run imported steps:
Ensure that all the warehouse source and warehouse target databases that are referred to by the imported steps are defined and configured in your Windows system. Verify all the definitions that you imported, and change any parameters to match the new environment.
After you import a tag language file that includes steps containing joins, you might notice that the tables are not placed where you expected in the Process Modeler or Warehouse Schema Modeler. Some positioning information cannot be imported from one system to another. You will need to reposition your steps and star schemas by using the Process Modeler or the Warehouse Schema Modeler.
Promote steps as necessary.
Importing tag language files again
If you import an existing step again, it must be in development mode, or an error will occur. You must demote steps to development mode before importing them again.
Most tag language files can be imported more than once without errors. If a tag language file includes primary and foreign keys, the keys must be deleted before the file can be imported again.
If the names or order of columns in a table or file have changed (for example, a warehouse source or warehouse target table is redefined), it is a good idea to delete the warehouse source table, file, or step definition (including the target table in the warehouse target) from the target system before importing the same table or step again. At a minimum, you should verify that the column or field information is imported as you expected.
Replication is a process of maintaining a defined set of data in more than one location. It involves copying designated changes from one location (a source) to another (a target), and synchronizing the data in both locations. The source and target can be in logical servers (such as a DB2 database or a DB2 for OS/390 subsystem or data-sharing group) that are on the same machine or on different machines in a distributed network.
You can use the Data Warehouse Center to define a replication step, which will replicate changes between any DB2 relational databases. You can also use other IBM products (such as DB2 DataJoiner and DataPropagator NonRelational) or non-IBM products (such as Microsoft SQL Server and Sybase SQL Server) to replicate data between many database products--both relational and nonrelational.
The replication environment that you need depends on when you want data updated and how you want transactions handled. You might use a log-based or a trigger-based change-capture mechanism, or both. You also have the flexibility to choose the locations of the replication components to maximize the efficiency of your replication environment.
This step is intended for Data Warehouse Center users who previously used Visual Warehouse Version 5.2.
Use this step to run the Visual Warehouse 5.2 DB2 UDB REORG (VWPREORG) program in the Data Warehouse Center. This step runs the DB2 UDB REORG utility on a target table.
You schedule this step to run on the target table of a process, after the process completes.
The Visual Warehouse 5.2 DB2 UDB REORG program extracts the following step and warehouse source parameter values from the Process Modeler and your step definition:
The warehouse target database name
The user ID and password for the warehouse target
The target table that is defined for the step
These parameters are predefined. You do not specify values for these parameters.
The location of the log files for this program is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the REORG program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | REORG error.
20 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
Before you can use the Clean Data transformer, you must create a table that contains rules for your clean type. A rules table designates the values that the Clean Data transformer will use during the find and replace process.
The rules table must be in the same database as the source and target tables.
A rules table must contain at least two columns. One column contains find values, and the other column contains replace values. The rows in each column correspond to each other.
For example, Column 1 and Column 2 in a rules table have the values shown here:
Column 1 | Column 2
Desk | Chair
Table | Lamp
Suppose that Column 1 contains the find values, and Column 2 contains the replace values. When you run the step, the Clean Data transformer searches your source column for the value Desk. Wherever it finds the value Desk, it writes the value Chair in the corresponding field of the target column.
The Clean Data transformer copies values that are not listed in the find column directly to the target table. In the example, the value Stool is not listed in the column that contains the find values. If the selected source column contains the value Stool, the Clean Data transformer will write Stool to the corresponding field in the target column.
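A minimal Python sketch of this find-and-replace behavior, assuming the rules table has already been read into find-and-replace pairs; values with no matching find value are copied through unchanged:

rules = {"Desk": "Chair", "Table": "Lamp"}       # find value -> replace value

source_column = ["Desk", "Stool", "Table"]
target_column = [rules.get(value, value) for value in source_column]
print(target_column)                             # ['Chair', 'Stool', 'Lamp']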
For more information on rules tables, see the Data Warehouse Center Administration Guide.
The following table describes the columns that must be included in the rules table for each clean type:
Clean type | Minimum number of columns | Purpose of columns
Find and replace | 2 - 3 | The first column contains find values.
Discretize | 3 - 4 | The first column contains the low end of the find value range.
Clip | 3 - 5 | The first column contains the low end of the find value range.
Carry over | None allowed | Rule table not used.
You can reorder the output columns using the Step notebook. You can change column names on the Column Mapping page of the Step notebook.
Use the Run FTP command file (VWPFTP) program to run an FTP command file.
The Run FTP command file program can detect only major errors of the FTP command. Confirm the results of the transfer before you use the transferred data.
Supported operating systems: Windows NT, AIX, Sun Solaris, AS/400, and OS/2.
Log information for program processing is contained in the trcppp.log file. The location of this file is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by this program:
Value | Description
4 | Warning.
8 | Parameter error.
12 | FTP error.
16 | Internal error.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
This step is intended for Data Warehouse Center users who previously used Visual Warehouse Version 5.2.
Use this step to run the Visual Warehouse 5.2 DB2 UDB RUNSTATS (VWPSTATS) program in the Data Warehouse Center. This step runs the DB2 UDB RUNSTATS utility on a target table.
You schedule this step to run on the target table of a process, after the process completes.
The Visual Warehouse 5.2 DB2 UDB RUNSTATS program extracts the following step and warehouse source parameter values from the Process Modeler and your step definition:
The warehouse target database name
The user ID and password for the warehouse target
The target table that is defined for the step
These parameters are predefined. You do not specify values for these parameters.
The location of the log files for this program is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the RUNSTATS program.
Value | Description
4 | Warning.
8 | Parameter error.
12 | Connect error.
16 | Runstats error.
20 | Internal error.
28 | VWS_LOGGING error.
32 | Trace file error.
Use the Show Related notebook to show the relationships between steps, sources, targets, and groups in a warehouse.
Only first level dependencies are shown in the table. For example, if you select a step as the target object and the step has a dependency on a source that has a dependency on another source, only the dependency on the step is shown. To see the dependency on the source, click mouse button 2 on that source object and select Show Related from the pop-up menu.
If you want to compare the relationships between several target objects and their related objects, you can open several Show Related notebooks from the Data Warehouse Center object tree.
There are several reasons to show related objects:
You want to know the structure of a process.
You want to drop or recreate an object and need to know all the dependency relationships for the object that is to be dropped or recreated.
Dropping an object that has a dependency might present problems. For example, if you drop a source table with dependent steps, those steps will be marked inoperative.
You use a subject area to logically group the processes (and the steps, warehouse sources, and warehouse targets within the processes) that are related to a particular topic or function.
For example, if you have a series of processes that move and transform sales data, you create a Sales subject area, and create the processes within the subject area. Similarly, you group Marketing processes under a Marketing subject area. You use the subject areas to select the processes that are of interest to you.
Use the Submit JCL Jobstream on MVS (VWPMVS) program to submit a JCL jobstream that resides on MVS to an MVS system for execution. The name of the MVS job must be the name of the MVS user ID followed by one alphanumeric character. This is a restriction of FTP/MVS.
Before you use the Submit JCL Jobstream on MVS program, test your JCL file by running it from TSO under the same user ID that you plan to use with the program.
This program runs successfully if the MVS host name, user ID, and password are correct. If you want to test the validity of the results generated by the JCL, you must write your own testing logic. If the FTP session times out, this program returns an FTP error, even if the JCL ultimately runs successfully on the MVS system.
The Submit JCL Jobstream on MVS program also receives the JES log file on the agent site. It erases the copy of the JES log from any previous jobs on the agent site before submitting a new job for processing. It also verifies that the JES log file is downloaded to the agent site after the job completes.
The Submit JCL Jobstream on MVS program requires TCP/IP 3.2 or later installed on MVS. Verify that the FTP service is enabled before using the program.
Supported operating systems: Windows NT, AIX, Sun Solaris, and AS/400.
Log information for program processing is contained in the trcppp.log file.
Log information for the FTP command result is contained in the ftpppp.log file.
The location of the log files is based on the value of the VWS_LOGGING environment variable.
The following table lists the error codes returned by the Submit JCL Jobstream on MVS program.
Value | Description
8 | Parameter error. Too many or too few parameters were supplied to the program.
12 | FTP error. The Data Warehouse Center program detected an FTP error, such as the inability to create the JES log file.
16 | Internal error. The program detected an internal error, such as the inability to erase a previously existing JES log file.
32 | Open Trace error. The program detected a trace error, such as the inability to create or open the trace file.
128 | Missing DLL. A Windows NT error that can occur when the program was compiled on another operating system.
When you import data from a Sybase Version 10/11 data source into a new warehouse source, the Data Warehouse Center converts the Sybase data types to ODBC data types, as described in the following table:
Sybase 10/11 data type | ODBC data type
CHAR | CHAR
INT | INTEGER
TINYINT | TINYINT
SMALLINT | SMALLINT
FLOAT | FLOAT
DOUBLE | DOUBLE
DECIMAL | DECIMAL
DATETIME | TIMESTAMP
SMALLDATETIME | TIMESTAMP
MONEY | DECIMAL
SMALLMONEY | DECIMAL
VARCHAR | VARCHAR
SYSNAME | VARCHAR
NUMERIC | NUMERIC
TEXT | LONG VARCHAR
BINARY | CHAR (with the Is Text field not checked)
BIT | BIT
IMAGE | LONG VARCHAR
REAL | REAL
TIMESTAMP | VARCHAR
VARBINARY | VARCHAR (with the Is Text field not checked)
After you have selected a source for your step and opened the Column Mapping page of the step notebook for the first time, you will see the ODBC data types used in the Column Grid. See the DB2 data types table to see characteristics of the ODBC data types that will be used for your business view table.
T-value is the observed value of the T-statistic used to test the hypothesis that two attributes are correlated. The T-value can range between -infinity and +infinity. A T-value near 0 is evidence for the null hypothesis that there is no correlation between the attributes. A T-value far from 0 (either positive or negative) is evidence for the alternative hypothesis that there is correlation between the attributes.
The T-statistic is defined as:
T = r * SQRT((n-2) / (1 - r*r))
where r is the correlation coefficient, n is the number of input value pairs, and SQRT is the square root function.
If the correlation coefficient, r, is either -1 or +1, the T-value will be represented by NULL. If the T-value is too small or too large to be represented by a number, the value will be represented by NULL.
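A short Python sketch of these rules, with NULL modeled as Python's None; it illustrates the definition above and is not the transformer's code:

import math

def t_statistic(r, n):
    # T-value is NULL when the correlation coefficient is exactly -1 or +1
    if r in (-1.0, 1.0):
        return None
    try:
        return r * math.sqrt((n - 2) / (1 - r * r))
    except (OverflowError, ZeroDivisionError):
        return None                              # too small or too large to represent

print(t_statistic(0.5, 27))                      # about 2.89: evidence of correlation
print(t_statistic(1.0, 27))                      # None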
A format for defining object types and objects, and actions to be taken on those object types and objects, in the Data Warehouse Center or the information catalog.
A file that contains tag language that describes objects and object types to be added, updated, or deleted in the Data Warehouse Center or the information catalog when the file is imported. A tag language file is produced by exporting objects from the Data Warehouse Center or from the Information Catalog Manager.
In the Information Catalog Manager, you can also produce a tag language file by:
Transferring a delete history log.
Extracting descriptive data from another database system using an extract program.
A three-way ANOVA uses three grouping columns. The data assumptions for the three-way ANOVA are as follows:
There is one data column and three grouping columns, Column A, Column B, and Column C.
The data column is composed of data values (x1, x2, ..., xn).
The data column must contain equal sample sizes in each of the subgroups formed by the unique value triplets in the three grouping columns.
This sample size must be greater than 1. Unequal sample sizes are not allowed for the three-way ANOVA.
The following table is a sample of a three-way ANOVA summary table:
SOURCE | DF | SS | MS | FVALUE | PVALUE
Column A | DFA | SSA | MSA | FValueA | PValueA
Column B | DFB | SSB | MSB | FValueB | PValueB
Column C | DFC | SSC | MSC | FValueC | PValueC
(Column A)x(Column B) | DFAB | SSAB | MSAB | FValueAB | PValueAB
(Column A)x(Column C) | DFAC | SSAC | MSAC | FValueAC | PValueAC
(Column B)x(Column C) | DFBC | SSBC | MSBC | FValueBC | PValueBC
(Column A)x(Column B)x(Column C) | DFABC | SSABC | MSABC | FValueABC | PValueABC
Within Groups | DFE | SSE | MSE | Null | Null
Total | DFT | SST | Null | Null | Null
Assume the values for the following variables:
n = The total number of data values in the data column.
a = The number of unique groups in Column A.
b = The number of unique groups in Column B.
c = The number of unique groups in Column C.
k = The number of unique subgroups (cells) formed by the unique value triplets in the three grouping columns. Note: k = (a*b*c)
m = The number of data values in each subgroup (cell) for all three grouping columns. m must be greater than 1.
x(i) = The ith data item in the data column.
avgGrpA(i) = The average of all the data values in the ith group of Column A.
avgGrpB(i) = The average of all the data values in the ith group of Column B.
avgGrpC(i) = The average of all the data values in the ith group of Column C.
avgGrpAB(ij) = The average of all the data values in the ijth subgroup for the unique value pairs from Column A and Column B.
avgGrpAC(ij) = The average of all the data values in the ijth subgroup for the unique value pairs from Column A and Column C.
avgGrpBC(ij) = The average of all the data values in the ijth subgroup for the unique value pairs from Column B and Column C.
avgGrpABC(ijk) = The average of all the data values in the ijkth subgroup for the unique value triplets from Column A, Column B and Column C.
avgTotal = The average of all the data values in the data column.
The equations for the calculated values are:
DFA = a-1
DFB = b-1
DFC = c-1
DFAB = (a-1)*(b-1)
DFAC = (a-1)*(c-1)
DFBC = (b-1)*(c-1)
DFABC = (a-1)*(b-1)*(c-1)
DFE = n - k
DFT = n - 1
Note: DFT = DFA + DFB + DFC + DFAB + DFAC + DFBC + DFABC + DFE
SSA = (m*b*c) * SUM(i=1 to a) [ (avgGrpA(i) - avgTotal)**2 ]
SSB = (m*a*c) * SUM(i=1 to b) [ (avgGrpB(i) - avgTotal)**2 ]
SSC = (m*a*b) * SUM(i=1 to c) [ (avgGrpC(i) - avgTotal)**2 ]
SSAB = (m*c) * SUM(i=1 to a) SUM(j=1 to b) [ (avgGrpAB(ij) - avgGrpA(i) - avgGrpB(j) + avgTotal)**2 ]
SSAC = (m*b) * SUM(i=1 to a) SUM(j=1 to c) [ (avgGrpAC(ij) - avgGrpA(i) - avgGrpC(j) + avgTotal)**2 ]
SSBC = (m*a) * SUM(i=1 to b) SUM(j=1 to c) [ (avgGrpBC(ij) - avgGrpB(i) - avgGrpC(j) + avgTotal)**2 ]
SSABC = m * SUM(i=1 to a) SUM(j=1 to b) SUM(k=1 to c) [ (avgGrpABC(ijk) - avgGrpAB(ij) - avgGrpAC(ik) - avgGrpBC(jk) + avgGrpA(i) + avgGrpB(j) + avgGrpC(k) - avgTotal)**2 ]
SSE = SST - (SSA + SSB + SSC + SSAB + SSAC + SSBC + SSABC)
SST = SUM(i=1 to n) [ (x(i) - avgTotal)**2 ]
MSA = SSA / DFA
MSB = SSB / DFB
MSC = SSC / DFC
MSAB = SSAB / DFAB
MSAC = SSAC / DFAC
MSBC = SSBC / DFBC
MSABC = SSABC / DFABC
MSE = SSE / DFE
FValueA = MSA / MSE
FValueB = MSB / MSE
FValueC = MSC / MSE
FValueAB = MSAB / MSE
FValueAC = MSAC / MSE
FValueBC = MSBC / MSE
FValueABC = MSABC / MSE
PValueA = PValueFunction( DFA, DFE, FValueA )
PValueB = PValueFunction( DFB, DFE, FValueB )
PValueC = PValueFunction( DFC, DFE, FValueC )
PValueAB = PValueFunction( DFAB, DFE, FValueAB )
PValueAC = PValueFunction( DFAC, DFE, FValueAC )
PValueBC = PValueFunction( DFBC, DFE, FValueBC )
PValueABC = PValueFunction( DFABC, DFE, FValueABC )
Transient steps are temporary steps that process intermediate data within the warehouse. They use data that is output from earlier steps and produce data that is used by later steps. When all dependent steps have run successfully, the data for transient steps is automatically purged.
Transient steps appear in the Work in Progress window, but not in the Run New Step window. You use task flows to connect a transient step between two other steps.
A two-way ANOVA uses two grouping columns. The data assumptions for the two-way ANOVA are:
There is one data column and two grouping columns, Column A and Column B.
The data column is composed of data values (x1, x2, ..., xn).
The data column must contain equal sample sizes in each of the subgroups that are formed by the unique value pairs in the two grouping columns.
The sample size must be greater than 1. Unequal sample sizes are not allowed for the two-way ANOVA.
The following table is a sample of a two-way ANOVA summary table:
SOURCE | DF | SS | MS | FVALUE | PVALUE
Column A | DFA | SSA | MSA | FValueA | PValueA
Column B | DFB | SSB | MSB | FValueB | PValueB
(Column A)x(Column B) | DFAB | SSAB | MSAB | FValueAB | PValueAB
Within Groups | DFE | SSE | MSE | Null | Null
Total | DFT | SST | Null | Null | Null
Assume the values for the following variables:
n = The total number of data values in the data column.
a = The number of unique groups in Column A.
b = The number of unique groups in Column B.
k = The number of unique subgroups (cells) formed by the unique value pairs in the two grouping columns. Note: k = (a*b)
m = The number of data values in each subgroup (cell). m must be greater than 1.
x(i) = The ith data value in the data column.
avgGrpA(i) = The average of all the values in the ith group of Column A.
avgGrpB(i) = The average of all the values in the ith group of Column B.
avgGrpAB(ij) = The average of all the values in the ij-th subgroup (cell) for the unique value pairs from Column A and Column B.
avgTotal = The average of all the data values in the data column.
The equations for the values calculated are:
DFA = a-1
DFB = b-1
DFAB = (a-1)*(b-1)
DFE = n-k
DFT = n-1 (Also note: DFT = DFA + DFB + DFAB + DFE)
SSA = (m*b) * SUM(i=1 to a) [ (avgGrpA(i) - avgTotal)**2 ]
SSB = (m*a) * SUM(i=1 to b) [ (avgGrpB(i) - avgTotal)**2 ]
SSAB = (m) * SUM(i=1 to a)SUM(j= 1 to b) [ (avgGrpAB(ij) - avgGrpA(i) - avgGrpB(j) + avgTotal)**2 ]
SSE = SST - (SSA + SSB + SSAB)
SST = SUM(i=1 to n) [ (x(i) - avgTotal)**2 ]
MSA = SSA / DFA
MSB = SSB / DFB
MSAB = SSAB / DFAB
MSE = SSE / DFE
FValueA = MSA / MSE
FValueB = MSB / MSE
FValueAB = MSAB / MSE
PValueA = PValueFunction( DFA, DFE, FValueA)
PValueB = PValueFunction( DFB, DFE, FValueB)
PValueAB = PValueFunction( DFAB, DFE, FValueAB)
Warehouse primary keys and warehouse foreign keys are definitions that you create to describe the underlying database table keys.
A key for a table is one or more columns that uniquely identify the table. The warehouse primary key for a table is one of the possible keys for the table that you define as the key to use. A warehouse foreign key is a column in a table whose allowable values must exist in the warehouse primary key in another table. The table that contains the warehouse foreign key is called a dependent table; the table that contains the warehouse primary key is called a parent table.
For example, you have a table that defines the departments in your organization and another table that defines the employees in your organization. Each department is identified by a unique department number, and each employee is a member of a department.
You define the Department Number column as the warehouse primary key of the Department table. You define the Department Number column of the Employee table as a warehouse foreign key, which references the Department Number column (warehouse primary key) of the Department table. This structure ensures that any department numbers that are entered in the employee table are defined in the Department table.
For more information about this example, see IBM Universal Database SQL Getting Started.
You can use warehouse primary keys and warehouse foreign keys to support business rules that are helpful in the creation of a warehouse. For example, you can use warehouse primary keys and warehouse foreign keys to predefine a join between two tables. Using the example of the Department table and the Employee table, you could join the Department and Employee tables, using the Department Number column, to obtain a table that contains the description of the department for each employee as well as the employee number.
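The following Python sketch illustrates the Department and Employee join described above; SQLite stands in for the warehouse database, and the table and column names are illustrative only:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, DeptName TEXT)")
con.execute("CREATE TABLE Employee (EmpNo INTEGER PRIMARY KEY, Name TEXT, "
            "DeptNo INTEGER REFERENCES Department(DeptNo))")  # foreign key to the parent table
con.execute("INSERT INTO Department VALUES (10, 'Sales')")
con.execute("INSERT INTO Employee VALUES (1, 'Ann', 10)")

# The predefined join pairs the foreign key with the primary key that it references.
for row in con.execute("SELECT e.EmpNo, e.Name, d.DeptName FROM Employee e "
                       "JOIN Department d ON e.DeptNo = d.DeptNo"):
    print(row)                                   # (1, 'Ann', 'Sales')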
In the Data Warehouse Center, you use warehouse primary keys and warehouse foreign keys to predefine a star join between source tables in a step.
You can import warehouse primary keys and warehouse foreign keys or manually define new warehouse primary keys and warehouse foreign keys to use in the Data Warehouse Center only. The Data Warehouse Center does not alter the underlying database table if you define new keys. The Data Warehouse Center verifies that any keys that you define are consistent within the Data Warehouse Center, but it does not verify consistency with the underlying database tables.
You can also define warehouse primary keys and warehouse foreign keys in a target table, either for the benefit of users or to join the target table with source tables when you use it as a source. When you define warehouse primary keys and warehouse foreign keys for a target table, the Data Warehouse Center alters the target table to include the warehouse primary keys, but not the warehouse foreign keys.
A warehouse schema is a specialized design that consists of multiple dimension tables, which describe aspects of a business, and one fact table, which contains the facts about the business. For example, if you have a mail-order business selling books, some dimension tables are customers, books, catalogs, and fiscal years. The fact table contains information about the books that are ordered from each catalog by each customer during the fiscal year.
A join relationship supports the business rules that you use in creating your warehouse. In the Data Warehouse Center, a join relationship is a query on a warehouse schema, which includes the fact table and some dimension tables. The join relationship is based on the primary and foreign keys that are defined for the source tables, as well as on non-keyed columns. The join relationship joins the columns of the dimension tables with the columns of the fact table.
The primary and foreign keys that can be used in join relationships do not require actual database keys. You can use primary and foreign keys that you imported from DB2 Universal Database or you can define new primary and foreign keys in the Data Warehouse Center. These keys are called warehouse primary and warehouse foreign keys because they are defined in the warehouse metadata, not the physical database. You can use these keys to pre-define joins in the Data Warehouse Center. In the previous example, the customers table has a primary key of customer number, and each book has a primary key of its book number (ISBN). Each order in the fact table includes foreign keys of customer number and book number. The join relationship combines information about the customers and books with the orders.
To define a join relationship in the Data Warehouse Center, first add the tables to the warehouse schema by using the table icon on the palette. You can add tables from more than one database unless the Warehouse Schema Properties notebook indicates that the tables must come from a single database. The autojoin function joins the tables based on their primary and foreign key relationships. The join icon on the palette allows you to define other column joins between the tables.
Warehouse sources are sources that provide data to Data Warehouse Center steps. The sources can be nearly any relational or nonrelational source (table, view, or file) that has connectivity to your warehouse.
To define relational or nonrelational data to your warehouse, you must define a warehouse source using the Warehouse Source notebook. To define a warehouse source, you must:
Acquire the proper password and authority for the source.
Establish connectivity to the agent site with ODBC drivers (unless your source database resides on an AS/400). See the Data Warehouse Center Administration Guide.
The Data Warehouse Center supports warehouse source tables that use ordinary SQL identifiers. An ordinary identifier:
Must start with a letter.
Can include uppercase letters, numbers, and underscore characters.
Cannot be a reserved word.
If a table has a lowercase letter as part of its ordinary identifier, the Data Warehouse Center stores the lowercase letter as an uppercase letter.
The Data Warehouse Center does not support warehouse source tables that use delimited identifiers. A delimited identifier:
Is enclosed within double quotation marks.
Can include uppercase and lowercase letters, numbers, underscore characters, and spaces.
Can contain a double quotation mark, which is represented by two consecutive quotation marks.
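A small Python sketch of the ordinary-identifier rules above; the reserved-word list here is an illustrative subset, not the actual DB2 list, and a delimited identifier (enclosed in double quotation marks) fails the check because it is not supported:

import re

RESERVED = {"SELECT", "TABLE", "ORDER"}          # illustrative subset only

def is_ordinary_identifier(name):
    # starts with a letter; letters, numbers, and underscores only; not reserved
    return (re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name) is not None
            and name.upper() not in RESERVED)

for name in ("SALES_1999", "salesData", "1999_SALES", '"My Table"'):
    # lowercase letters in an ordinary identifier are stored as uppercase letters
    print(name, is_ordinary_identifier(name), name.upper())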
The Data Warehouse Center can use the following kinds of sources as warehouse sources (see the appropriate installation book for the versions of non-IBM databases that are supported):
Relational data
DB2 UDB for AS/400 (CISC) - IBM relational database for CISC-based AS/400 systems
DB2 UDB for AS/400 (RISC) - IBM relational database for RISC-based AS/400 systems
DB2 UDB for OS/390 - IBM relational database for the OS/390 platform
DB2 UDB for AIX - IBM relational database for AIX systems
DB2 UDB for HP-UX - IBM relational database for HP-UX systems
DB2 UDB for Linux - IBM relational database for the Linux platform
DB2 UDB for OS/2 - IBM relational database for the OS/2 platform
DB2 UDB for SCO UnixWare - IBM relational database for SCO UnixWare systems
DB2 UDB for PTX - IBM relational database for PTX systems
DB2 UDB for Sun Solaris - IBM relational database for the Sun Solaris platform
DB2 UDB for Windows NT- IBM relational database for Windows NT
DB2 UDB for VM - IBM relational database for VM systems
DB2 UDB for VSE - IBM relational database for VSE systems
DB2 UDB EEE - the Enterprise-Extended Edition of DB2 UDB
DataJoiner - IBM database middleware that accesses, joins, and updates data from different sources
DB2 Family - includes all non-UDB versions of DB2
Informix - Informix relational database systems
Generic ODBC - applies to ODBC resources not included with the Data Warehouse Center, such as Microsoft Excel and other Microsoft-compliant ODBC drivers.
Oracle - Oracle relational database systems
Sybase - Sybase relational database systems
Microsoft SQL Server - Microsoft SQL Server relational database systems
Nonrelational data
IBM IMS and VSAM databases - IBM IMS and VSAM database systems
Local files - plain text files that contain character-delimited, variable-length data on a local workstation
Remote files - plain text files that contain character-delimited, variable-length data on a remote workstation
Warehouse targets are tables or files that contain the data that results from Data Warehouse Center steps. Like warehouse sources, warehouse targets can also provide data to Data Warehouse Center steps.
A warehouse target can be defined on any of the following databases or files:
DB2 UDB for AS/400 (CISC) - IBM relational database for CISC-based AS/400 systems
DB2 UDB for AS/400 (RISC) - IBM relational database for RISC-based AS/400 systems
DB2 UDB for OS/390 - IBM relational database for the OS/390 platform
DB2 UDB for AIX - IBM relational database for AIX systems
DB2 UDB for HP-UX - IBM relational database for HP-UX systems
DB2 UDB for Linux - IBM relational database for the Linux platform
DB2 UDB for OS/2 - IBM relational database for the OS/2 platform
DB2 UDB for SCO UnixWare - IBM relational database for SCO UnixWare systems
DB2 UDB for PTX - IBM relational database for PTX systems
DB2 UDB for Sun Solaris - IBM relational database for the Sun Solaris platform
DB2 UDB for Windows NT - IBM relational database for Windows NT
DB2 UDB EEE - the Enterprise-Extended Edition of DB2 UDB
DataJoiner - IBM database middleware that accesses, joins, and updates data from different sources
DB2 Family - includes all non-UDB versions of DB2
Local files - plain text files that contain character-delimited, variable-length data residing on a local workstation
Warehouse target databases and files are the portion of your enterprise data warehouse that contains cleansed, transformed data that is ready for business analysis. Data warehouse users must have access to the warehouse target database.
In the Data Warehouse Center, warehouse targets are typically organized by subject area. For example, if your subject area deals with your company's sales results, the warehouse targets within it could contain sales data broken down by region.
A named data object that is associated with a warehouse target. A warehouse target table contains the results of steps after they run. You can create warehouse target tables with either the Control Center or the Data Warehouse Center. You use the Define Warehouse Target Table notebook to define the table to the Data Warehouse Center and to associate it with a particular warehouse target.