Administration Guide

What is Data Warehousing?

The systems that contain operational data--the data that runs the daily transactions of your business--contain information that is useful to business analysts. For example, analysts can use information about which products were sold in which regions at which time of year to look for anomalies or to project future sales.

However, there are several problems with analysts accessing the operational data directly:

Data warehousing solves these problems. In data warehousing, you create stores of informational data--data that is extracted from the operational data, and then transformed for end-user decision making. For example, a data warehousing tool might copy all the sales data from the operational database, perform calculations to summarize the data, and write the summarized data to a target in a separate database from the operational data. End users can query the separate database (the warehouse) without impacting the operational databases.

The following sections describe the objects (subject areas, warehouse sources, warehouse targets, agents, agent sites, steps, and processes) that you will use to create and maintain your data warehouse.

Subject Areas

A subject area identifies and groups the processes that relate to a logical area of the business. For example, if you are building a warehouse of marketing and sales data, you can define a Sales subject area and a Marketing subject area. You can then add the processes that relate to sales underneath the Sales subject area. Similarly, you can add the definitions that relate to the marketing data underneath the Marketing subject area.

Warehouse Sources

Warehouse sources identify the tables and files that will provide data to your warehouse. The Data Warehouse Center uses the specifications in the warehouse sources to access and select the data. The sources can be nearly any relational or nonrelational source (table, view, or file) that has connectivity to your warehouse.

Warehouse Targets

Warehouse targets are database tables or files that contain data that has been transformed so that end users can use it. Like a warehouse source, warehouse targets can also provide data to Data Warehouse Center steps.

Warehouse Agents and Agent Sites

Data Warehouse Center agents manage the flow of data between the data sources and the target warehouses. Agents are available on the Windows NT, AIX, OS/2, OS/390, OS/400, and SUN Solaris operating systems. The agents use Open Database Connectivity (ODBC) drivers or DB2 CLI to communicate with different databases.

Several agents can handle the transfer of data between sources and target warehouses. The number of agents that you use depends on your existing connectivity configuration and the volume of data that you plan to move through your warehouse. Additional instances of an agent can be generated if multiple processes that require the same agent are running simultaneously.

Agents can be local or remote. A local warehouse agent is an agent that is installed on the same machine as the warehouse server. A remote warehouse agent is an agent that is installed on another machine that has connectivity to the warehouse server.

An agent site is a logical name for a workstation where agent software is installed. The agent site name is not the same as the TCP/IP host name. A single physical machine can have only one TCP/IP host name. However, you can define multiple agent sites on a single machine. A logical name identifies each agent site.

The default agent site, named the Default VW AgentSite, is a local agent on Windows NT that Data Warehouse Center defines during initialization of the warehouse control database.

Steps and Processes

A step is a logical entity in the Data Warehouse Center that defines:

Steps move data and transform data by using SQL statements or by calling programs. When you run a step, the transfer of data between the warehouse source and the warehouse target, and any transformation of that data, takes place.

A process contains a series of steps that perform transformation and movement tasks. In general, a process populates a warehouse target in a warehouse database by extracting data from one or more warehouse sources, which can be database tables or files. However, you can also define a process for launching programs that does not specify any warehouse sources or targets.

You can run a step on demand, or you can schedule a step to run at a set time. You can schedule a step to run one time only, or you can schedule it to run repeatedly, such as every Friday. You can also schedule steps to run in sequence, so that when one step finishes running, the next step begins running. You can schedule steps to run upon (successful or unsuccessful) completion of another step. If you schedule a process, the first step in the process runs at the scheduled time.

When a step or a process runs, it can save data by:

Suppose that you want Data Warehouse Center to perform the following tasks:

  1. Extract data from different databases.
  2. Convert the data to a single format.
  3. Write the data to a table in a data warehouse.

You would create a process that contained individual steps. Each step would perform a separate task, such as extracting the data from the databases, or converting it to the correct format. You would then use another step to populate the target table, which contains the transformed data.

The following sections describe the various types of steps that you will find in the Data Warehouse Center. For more information about steps, refer to the Data Warehouse Center Administration Guide.

SQL Steps

An SQL step uses an SQL SELECT statement to extract data from a warehouse source, and generates an INSERT statement to insert the data into the warehouse target table.

Program Steps

There are several types of program steps: DB2 for AS/400 Programs, DB2 for OS/390 Programs, DB2 for UDB Programs, Visual Warehouse 5.2 DB2 Programs, OLAP Server Programs, File Programs, and Replication. These steps run predefined programs and utilities.

Transformer Steps

Transformer steps are stored procedures and user-defined functions that specify statistical or warehouse transformers that you can use to transform data. You can use transformers to clean, invert, and pivot data; generate primary keys and period tables; and calculate various statistics.

In a transformer step, you specify one of the statistical or warehouse transformers. When you run the process, the transformer step writes data to one or more warehouse targets.

User-defined Program Steps

A user-defined program step is a logical entity within the Data Warehouse Center that represents an application that you want the Data Warehouse Center to start. A warehouse agent can start a user-defined program step:

For example, you can write a user-defined program that will perform the following process:

  1. Export data from a table.
  2. Manipulate that data.
  3. Write the data to an interim output resource or a warehouse target.


[ Top of Page | Previous Page | Next Page ]