DB2 Server for VSE & VM: Database Administration

Chapter 2. Implementing Your Design

After determining the design of your database, you can create objects to implement your design. These objects include dbspaces, tables, views, and indexes.

This chapter discusses the following topics:

Database Storage Concepts
This section provides an overview of the physical database and explains the relationships between objects, dbspaces, and storage pools.
Database Generation

When you create a database, its potential storage capacity is defined. You must do some planning to ensure that the database satisfies your data storage requirements.
Defining Dbspaces

The task of defining dbspaces, which contain tables, views, and indexes, involves reserving logical space in the database, assigning the dbspace to a storage pool, and setting usage parameters. You must understand what these parameters are and how to select them so that the dbspace will best accommodate the data to be stored in it.
Creating Tables

Information is stored in a database by placing it in tables. You must know how to create tables and how to define referential constraints.
Creating Views

After you create tables, you can create views. A view is a logical, or virtual, table that is derived from one or more tables or other views. Using views can be advantageous in applications that have specific requirements for data tables.
Creating Indexes

Indexes are optional: they improve the speed with which table rows are accessed.
Using the Catalog in Database Design

The catalog tables contain information about the existing structure of the database, which can be helpful in database design.

Storage Concepts

A DB2 Server for VSE & VM database is a collection of user data objects (tables and indexes) and supporting information maintained by the database manager for that data. The supporting information includes control information (such as how each data table is formatted and where each is located), and data recovery information (restoring data to an earlier state). The database is composed of:

A Directory: In VM this is a minidisk that contains database control information. In VSE it is a VSAM data set. It includes mappings of the dbspaces to their addresses on the DASD (that is, it relates the logical database image to the physical storage used).
One or two Logs: In VM, these are minidisks and in VSE, these are VSAM data sets. These contain information about the changes made to the data. If any changes must be "undone" or "redone", logs can be used to restore the data to its proper state.
One or more Storage Pools: In VSE these are collections of VSAM data sets, and in VM these are collections of minidisks. Each is called a database extent or dbextent.

A dbextent is an allocation of actual DASD space. Storage pools are composed of one or more dbextents. The size of the storage pool can be increased or reduced by:

adding more dbextents
deleting existing dbextents
In VM/ESA, moving dbextents to other devices.

Note:

In VSE, each dbextent is the primary allocation of a VSAM data set (CLUSTER).

Storage pools can be defined to be either recoverable or nonrecoverable. The default is for them to be recoverable, whereby every change made to the pool is logged. For nonrecoverable storage pools, there is limited recovery; the database manager does not log updates, but takes a checkpoint for each logical unit of work (LUW) to ensure that the LUW's changes are written to DASD.

To maintain referential integrity, both tables in any referential constraint must be in either recoverable or in nonrecoverable storage pools: they cannot be spread across both types.

This restriction is necessary because the portion of the relationship in the nonrecoverable pool might be lost, possibly invalidating the information remaining in the recoverable one. For more information about storage pools, refer to either the DB2 Server for VM System Administration or DB2 Server for VSE System Administration manual.

When a table is created, it must be assigned to a logical allocation of storage called a dbspace. The table creator can either do this assignment explicitly, or let the database manager use a default assignment. Any indexes created on that table will be stored in the same dbspace.

Figure 15 shows how tables are stored in the database. It includes two tables and their indexes in dbspace A, two tables and their indexes in dbspace B, and one table with three indexes in dbspace C.

Figure 15. Table Storage in a Database

View figure.

How Information is Stored in Dbspaces

A dbspace is not a real allocation of DASD space: it is a logical allocation of page map tables in the directory that relates logical dbspace pages to DASD locations. It holds data in 4096-byte blocks called pages, and can hold up to 255 tables, and their indexes. As dbspaces are assigned to the storage pool and their pages are filled, the physical DASD pages used are taken from the dbextents of the storage pool.

The database manager dynamically allocates real DASD storage space to support dbspace pages on a demand basis. Unused pages of a dbspace do not occupy DASD space. The potential capacity of a dbspace is fixed when it is defined.

The dbspace used to hold a table is determined when the table is created. A table cannot span (reside in) multiple dbspaces. However, two or more tables in a referential relationship may reside in separate dbspaces.

Figure 16 shows how information is stored in a dbspace.

Figure 16. Table and Index Storage in a Dbspace

View figure.

At the front of every dbspace are one to eight header pages, which contain control information about the tables and indexes stored in it. After the header pages are the data pages, which is where the rows of a table are stored. Index entries are stored in index pages at the back.

When you store multiple tables in the same dbspace, the database manager might store rows from different tables on the same data pages; however, it never puts index entries from different indexes on the same page.

Database Generation

This book does not describe how to create a database. That is the task of the system administrator, and is discussed in the DB2 Server for VM System Administration and DB2 Server for VSE System Administration manuals. Because initial DASD allocations are assigned and the potential capacity for the database is established during that process, it is important that you analyze your storage requirements and inform the person responsible for generating the database. The information you provide should include the:

Number of tables and views (objects) you intend to create
Structure of those objects (such as number of columns, data type)
Storage required for your objects.

Defining Dbspaces

Before defining a new dbspace, check to see if there are any already available having the properties that you require; if there are, you do not need to define a new one.

If you need to define one or more dbspaces, do the following:

Identify your requirements.

Identify the data that the dbspace will contain and the way that it will be used.
Add the dbspace to the database.

Add the dbspace to the database directory (if this has not already been done), using either the SQLADBSP EXEC in VM, or the ADD DBSPACE statement in VSE.
Acquire the dbspace.
After a dbspace is established, enter the ACQUIRE DBSPACE statement to acquire it for your use.

Identifying Dbspace Requirements

To identify dbspace requirements, consider the tables that are to be stored and the way they will be used. If performance is a requirement, you can define a dbspace to support only one table and its indexes; often, however, dbspaces are defined to support several tables. Tables that have common requirements can be stored in the same dbspace.

Mapping Tables to Dbspaces

Table 4 shows the approach you should use for determining the way to map tables to dbspaces.

Table 4. General Approach to Mapping Tables to Dbspaces

Table Access Type of Dbspace Type of Data
Private tables PRIVATE dbspaces (one per user, or user-application area)
End user data
Application development data
Data prototyping tables

Shared tables PUBLIC dbspaces (one per user group, or table group)
Common end user data
Application testing data
Production application data

Table Access	Type of Dbspace	Type of Data
Private tables	PRIVATE dbspaces (one per user, or user-application area)	End user data Application development data Data prototyping tables
Shared tables	PUBLIC dbspaces (one per user group, or table group)	Common end user data Application testing data Production application data

Dbspaces come in two types: PRIVATE and PUBLIC.

For private data, reserve one PRIVATE dbspace for each user. Private data is always locked at the dbspace level to eliminate unnecessary locking overhead when users are accessing their own private data.

Data kept in a PRIVATE dbspace can be shared, and concurrent read-only access to the data is possible.

For most users, one PRIVATE dbspace is sufficient; however, people doing application or data design for different application areas might want one for each area. Others might request additional storage as their data requirements grow. For these users, you can reserve additional PRIVATE dbspaces as needed.

For data that is to be shared, use PUBLIC dbspaces. These can be locked either at the row, page, or dbspace level. Thus, several users can access data at the same time. (See Determining the Lock Size (LOCK).)

PUBLIC dbspaces support tables shared by a group of users. For example, a group of query users may have to share data. Rather than having each user keep a copy of the data, the extracted data could be directed to tables in a PUBLIC dbspace, where it could be accessed by all users.

For production application data, you should define one or more PUBLIC dbspaces, depending on logical groupings of tables. For further information on placing tables into dbspaces, refer to the DB2 Server for VSE & VM Performance Tuning Handbook manual.

Adding Dbspaces to the Database

To add a dbspace to a database you must reserve page tables in the directory, assign the dbspace to a storage pool, and specify the dbspace's type. These functions are described in the DB2 Server for VSE System Administration and DB2 Server for VM System Administration manuals.

Do not use SYS as the first three characters of a dbspace name; SYS denotes a dbspace reserved for database manager use.
Note: When you add dbspaces, you must be in single user mode.

Acquiring Dbspaces

After you have identified the mapping of tables to dbspaces, and the dbspaces have been added to the database, you can acquire them for use. Begin this process by identifying the parameters to be established for each dbspace. Table 5 summarizes these parameters; they are discussed in detail below.

Table 5. Derivation of Dbspace Parameters

Parameter Derivation
Type PUBLIC or PRIVATE, based on expected usage of tables.
SIZE (PAGES) Sum of the potential sizes of each of the tables, plus the sum of the index size requirements, plus free space considerations.
STORPOOL Consider device utilization of other dbspaces in the same pool and the availability of space in the pool. Also consider using nonrecoverable storage pools for read-only data.
NHEADER Set based on the number of tables and indexes to be put in the dbspace.
PCTFREE Set based on growth potential of the tables to be put in the dbspace.
PCTINDEX Set based on the potential indexes to be created and their estimated sizes.
LOCK Set based on the size of tables and the extent of their use.

Parameter	Derivation
Type	PUBLIC or PRIVATE, based on expected usage of tables.
SIZE (PAGES)	Sum of the potential sizes of each of the tables, plus the sum of the index size requirements, plus free space considerations.
STORPOOL	Consider device utilization of other dbspaces in the same pool and the availability of space in the pool. Also consider using nonrecoverable storage pools for read-only data.
NHEADER	Set based on the number of tables and indexes to be put in the dbspace.
PCTFREE	Set based on growth potential of the tables to be put in the dbspace.
PCTINDEX	Set based on the potential indexes to be created and their estimated sizes.
LOCK	Set based on the size of tables and the extent of their use.

Use the ACQUIRE DBSPACE statement to specify the parameters in Table 5. When acquiring a dbspace, you must specify whether it is to be PUBLIC or PRIVATE, and you can optionally set the number of pages in it, the level of recovery, the percentage of space to be reserved for updates and indexes, and the amount to be locked when accessed by users. See the DB2 Server for VSE & VM SQL Reference manual for more information on the ACQUIRE DBSPACE statement.

Determining Dbspace Type (PUBLIC or PRIVATE)

If any table is to be accessed by multiple users at the same time, and any one of the users will be doing UPDATEs, INSERTs, or DELETEs, then it should be placed in a PUBLIC dbspace. You need Database Administrator (DBA) authority to acquire a PUBLIC dbspace.

Only users with DBA or RESOURCE authority can create objects in PUBLIC dbspaces.

To acquire a PUBLIC dbspace, enter the ACQUIRE DBSPACE statement specifying your requirements. For example, to acquire a PUBLIC dbspace named payroll and using the defaults, enter:

   ACQUIRE PUBLIC DBSPACE NAMED PAYROLL

You need DBA or RESOURCE authority to acquire a PRIVATE dbspace.

Only the owner of the PRIVATE dbspace, or a user with DBA authority, can create objects in the dbspace.

Every PRIVATE dbspace has an owner. To acquire the PRIVATE dbspace PERSONAL for user JOHN, enter the following:

   ACQUIRE PRIVATE DBSPACE NAMED JOHN.PERSONAL

You cannot use the ALTER DBSPACE statement to change the type of a dbspace after you acquire it.

Determining the Size of the Dbspace (PAGES)

You need to ensure that the dbspace contains enough pages to hold the tables and associated indexes to be stored there.

The size of the dbspace should be based on the estimated current size of the tables and their indexes, plus an allowance for their expected growth. A dbspace cannot contain less than 128 pages. You must allocate pages in multiples of 128, otherwise the number is rounded up to the next highest multiple of 128. Algorithms for determining the number of pages needed are described in Appendix A, Estimating Your Dbspace Requirements.

Because you cannot extend a dbspace after it is defined, you should overestimate the required number of pages. Unused pages are not stored, so the cost of overestimating is nominal. In contrast, the cost of underestimating pages can be quite expensive because of the reorganization activities required to re-establish the data in a larger dbspace later.
Note: Two directory blocks of 512 bytes each are used for every 128 data pages defined.

Determining the Storage Pool (STORPOOL)

Storage pools come in two types: recoverable and nonrecoverable.

Consider assigning a dbspace to a nonrecoverable storage pool if the data in it will be read-only. Changes made to data in a nonrecoverable storage pool are not logged, which offers the advantages of requiring less log space, elapsed time, and CPU time. (There should be an alternative method of recovery available, such as reloading the storage pool.) The disadvantage is that data cannot be recovered when media failures occur (which may be acceptable for read-only data).

If you are using referential integrity, you must use recoverable storage pools. For nonrecoverable storage pools, ROLLBACK is not performed and no logging is in effect, so that some operations can be neither completed successfully nor rolled back. Each operation containing a referential constraint is verified when it occurs. If a row of a multi-row operation violates the referential constraint, the operation terminates. The rows that were affected prior to the termination cannot be rolled back.

For example, in a multi-row delete of a parent table, if 15 rows are candidates for deletion and the ninth row violates the DELETE RESTRICT rule, then the first eight rows would be deleted and the operation would cease with the ninth row. The integrity of the table would be maintained but the operation would be only partially completed.

Because a unit of work modifying both recoverable and nonrecoverable pools can only ROLLBACK the recoverable pool, referential constraints cannot be created between the two types of pools.

You cannot use the ALTER DBSPACE statement to change the storage pool of a dbspace after you acquire it.

If you do not specify the STORPOOL parameter, a dbspace of the correct size and type will be acquired from any recoverable storage pool.

Storage Device Considerations

The storage pool you select should be chosen to:

Balance device utilization
Exploit device characteristics for data in the dbspace.

A table resides on the devices used to support the storage pool to which the table's dbspace is assigned. Consider storing different tables on different devices based on device characteristics and table usage. To do this, you need multiple storage pools and multiple dbspaces.

For example, if you have two tables that are highly active, you can reduce potential device contention by storing them in different dbspaces that are assigned to different storage pools. The dbextents defined for the two storage pools would be on different devices.

You could use a similar technique for storing selected tables on higher or lower speed devices as appropriate.

For more information about storage pools, refer to either the DB2 Server for VM System Administration or the DB2 Server for VSE System Administration manual.

Determining the Number of Header Pages (NHEADER)

Header pages contain control information on the tables and indexes stored in the dbspace.

The number of header pages required depends on the number of objects to be stored in the dbspace. Generally, taking the default (8 pages) is recommended, as this gives you the most flexibility at nominal cost. However, if you plan to have few tables or indexes in the dbspace, you may allocate fewer. You must allocate at least one.

To estimate your needs, see Appendix A, Estimating Your Dbspace Requirements.

You cannot change the NHEADER parameter with the ALTER DBSPACE statement; after you set it, the only way to change it is to move all the data in the current dbspace to another dbspace having the required NHEADER value (see Altering the Design of a Dbspace).

Determining the Percent Free Space Desired (PCTFREE)

The PCTFREE parameter refers to the percentage of each page that is to be reserved for updates that make the changed row longer than it was before. This free space is not used for inserts. You can reclaim the free space by reducing the PCTFREE value through an ALTER DBSPACE statement.

The PCTFREE value you choose will depend on the type of activity being carried out on the data in the dbspace:

High Insert/Low Update Activity
This is the situation where there will be few updates, or all columns are fixed and non-nullable in the tables. Here, you would set PCTFREE to a high value before loading the data; then lower it to a low value. The difference between the original value and the final value can then be used by insert activity.
Low Insert/High Update Activity
In this situation, PCTFREE should be set to a high value. The space saved by PCTFREE will be used by the update activity only if the update increases the size of the row and the free space will accommodate the new row.
Low Insert/Low Update Activity Or Read-Only Data
Read-only data is data that is loaded into a dbspace and then never modified or updated, only retrieved using query statements. In this situation, set PCTFREE to a low value or zero.
High Insert/High Update Activity
In this situation, set PCTFREE to a high value and then lower it. This would allow space for use by both update and insert activities.

Note: Updating refers to the replacement of a row of data into the same location in a page of a dbspace, unless the row can no longer fit because of an increase in the size of one or more columns. The replacement row is placed on the same page of the dbspace if there is still sufficient space available in the area set aside using PCTFREE.

In situations where there is high insert activity, consider using a clustering index. The first index created on a table is always considered the clustering index. A clustering index determines the placement of rows in pages of a dbspace to minimize DASD I/Os when the table rows are accessed in the index sequence. For more information, see Clustering Rows of a Table on an Index.
Note: Clustering refers to the grouping or gathering of items; in the above case, the grouping of table rows is done according to the indexes.

If an updated row no longer fits on its original page, its contents are moved to the next available page with enough room to accommodate it. Continual movement of row contents to new pages as a result of this overflow may lead to a decrease in performance as the database manager must make one additional page reference before locating a row's contents.

The database manager typically reserves more space than you specify. For an explanation of free space management design, see the DB2 Server for VSE & VM Diagnosis Guide and Reference manual. Calculate PCTFREE using the following formula:

   PCTFREE = (FREEBYTES - AVGROWLEN) / 40

where FREEBYTES is the number of bytes you want reserved on each page, and AVGROWLEN is the average row length for tables in the dbspace. If you have modeled the tables to be stored in the dbspace, you can obtain a value for AVGROWLEN for each of the tables from SYSTEM.SYSCATALOG.

For normal processing, set PCTFREE somewhere between:

   [AVGROWLEN / 40] and [50 - (AVGROWLEN / 40)].

Setting it below the lower limit would mean the unused bytes could not be used (the average row would not fit) and the space set aside for updates would be wasted, while setting it greater than the upper limit may restrict you unnecessarily to one row per page.

For more information on how the PCTFREE parameter determines actual reserved bytes, see Appendix A, Estimating Your Dbspace Requirements.

Determining the Percentage for Index Pages (PCTINDEX)

When you acquire a dbspace, you must reserve some portion of it for holding indexes on the tables in the dbspace. PCTINDEX reserves the amount of space in the directory to be formatted for this purpose. Under most circumstances, you should let this value default to 33 percent. With this default, there are approximately twice as many data pages for holding table rows as there are index pages for holding indexes on the tables. You can create or drop indexes at any time (these functions can be performed online); so do not constrain the potential indexing you might want to do by specifying a lower value for PCTINDEX. There are two cases when you might want to consider overriding the default:

Read-only data
Some data is used exclusively, or primarily, for read-only (SELECT) access. You can create a more than one index on such data to improve the performance of a wide variety of user queries. The indexes are created after the data is loaded and are referenced as required by a query. Because the data is not subject to update operations, you do not have to worry about the performance implications of index maintenance. Thus, you should consider specifying a high value for PCTINDEX. To do this, estimate the number of index pages that would be required for various indexes that might be created on these tables in the dbspace. See Estimating the Number of Index Pages.
Highly tuned operational data
This is data that is subject to frequent updates, and the performance requirements limit the amount of indexing you want to do on the tables. Determine the set of indexes you require for the data and set the PCTINDEX parameter accordingly.

You establish the PCTINDEX parameter with the ACQUIRE DBSPACE statement. You cannot change the PCTINDEX parameter with the ALTER DBSPACE statement; after you set it, the only way to change it is to move all the data in the current dbspace to another dbspace having the required PCTINDEX value (see Altering the Design of a Dbspace).

Determining the Lock Size (LOCK)

When you acquire a PUBLIC dbspace you can specify three levels of locking: DBSPACE, PAGE, or ROW. You can change the lock size later with the ALTER DBSPACE statement.

The lock size can be set for PUBLIC dbspaces only. (PRIVATE dbspaces are always locked at the DBSPACE level.)

The default lock size is PAGE. Select ROW if the dbspace is to contain a small table that will fit on a small number of pages, and it is expected that this table will be frequently updated by multiple users.

Locking the dbspace at the row level also causes indexes in it to be locked at the key level. (Usually indexes are locked at the page level.) Key-level locking for indexes, like row-level locking for tables, reduces contention but adds overhead.

Retrieving Information about Dbspace Parameters

Information about the dbspace parameters is maintained in the SYSTEM.SYSDBSPACES catalog table.

Example

Use the following query to retrieve information about dbspace MYDB:

   SELECT DBSPACENO, DBSPACETYPE, POOL, NPAGES,
          NRHEADER, PCTINDX, FREEPCT, LOCKMODE
     FROM SYSTEM.SYSDBSPACES
     WHERE DBSPACENAME = 'MYDB'

To see how many header, data, and index pages are being used in a given dbspace, issue the SHOW DBSPACE operator statement from either the database console or from ISQL. (Its format is described in the DB2 Server for VSE & VM Operation manual.) This information may be helpful, especially before attempting to load large amounts of data into a dbspace.

Restrictions on the ACQUIRE DBSPACE Statement

To acquire a dbspace, it must have already been added to the database. When you issue the ACQUIRE DBSPACE statement, the database manager searches for a dbspace with the appropriate size (number of PAGES), storage pool assignment, and type (PUBLIC or PRIVATE). If one of the requested size cannot be found, the next largest suitable one will be used. (This could result in a very large dbspace being used to contain a small amount of data.) If no existing dbspace satisfies the requirements, then the ACQUIRE DBSPACE statement will fail, and you will have to add additional dbspaces to the database.

The SYSDBSPACES system catalog table contains information about dbspaces. You can issue an ISQL query to retrieve this information.

The following query yields information on the type and size of all available dbspaces (those that have been added but not yet acquired):

   SELECT DBSPACETYPE, NPAGES
     FROM SYSTEM.SYSDBSPACES
     WHERE DBSPACENAME=''

The value of DBSPACETYPE is 1 for PUBLIC dbspaces and 2 for PRIVATE ones.

Creating Tables

Relational databases use tables to store information. This section explains how to create tables and how to define referential and unique constraints in the DB2 Server for VSE & VM environment.

Controlling Who Creates Tables

Designing tables to be used by many applications is a critical task. Although you can add columns and use views to mask certain changes, generally you cannot change the design of a table after it has been implemented without disrupting applications. Table design is difficult because there are many ways to represent the same information, and often you have to decide between the conflicting objectives of logical design and physical design. (One example of such a conflict is normalization, discussed in Step 7: Normalize Your Tables.)

If you have DBA authority, you will probably want to keep the responsibility for creating tables, and then pass the authorization for their use on to the application developers. However, you can grant authority for creating tables to others; or, if some users want to use the application server with minimum assistance or control, you can acquire PRIVATE dbspaces for them and authorize them to create whatever data objects they need, including tables.

How to Create Tables

After designing a table, issue the CREATE TABLE statement. Creating a table involves:

Naming it
Naming the columns within it
Defining the appropriate data type for each column
Defining primary keys
Defining the relationships between tables
Defining unique constraints.

To create a table, the connected user must have the proper authority (see Chapter 5, Providing Security). Whoever issues the CREATE TABLE statement has complete authority over the table.

When you create a table, a definition of it is recorded in the catalog; no application data is stored. (For a description of how to put data into the table, see Loading Data into Tables.)

Figure 17 shows the statement used to create the sample EMPLOYEE table.

Figure 17. Example of CREATE TABLE. A foreign key cannot be defined unless the corresponding primary key already exists.

CREATE TABLE JOHN.EMPLOYEE (EMPNO CHAR(6) NOT NULL, FIRSTNME VARCHAR(12) NOT NULL, MIDINIT CHAR(1) NOT NULL LASTNAME VARCHAR(15) NOT NULL, WORKDEPT CHAR(3), PHONENO CHAR(4), HIREDATE DATE, JOB CHAR(8), EDLEVEL SMALLINT NOT NULL, SEX CHAR(2), BIRTHDATE DATE, SALARY DECIMAL(9,2), BONUS DECIMAL(9,2), COMM DECIMAL(9,2), PRIMARY KEY (EMPNO), FOREIGN KEY EMPFKEY (WORKDEPT) REFERENCES DEPARTMENT ON DELETE SET NULL) IN PUBLIC.SAMPLE

This example creates a table called EMPLOYEE, which has 14 columns, by a creator with the ID JOHN. The table uses the column EMPNO as the primary key, and the column WORKDEPT as a foreign key called EMPFKEY, which references WORKDEPT in the DEPARTMENT table. The delete rule is SET NULL, and the table resides in the "PUBLIC".SAMPLE dbspace.

Naming Tables

A table name can be up to 18 characters long (18 bytes). Table names that are not explicitly qualified by the creator name in the CREATE TABLE statement are qualified by the database manager. For example, assume that a user with an ID of SMITH is entering SQL statements interactively. If SMITH creates a table named ABC, with no qualifier, the table name becomes SMITH.ABC. SMITH can own only one table, view, or synonym called ABC. A different user ID, JONES, can create another table, view, or synonym called ABC, which will become JONES.ABC.

If the DBCS option is enabled, you can use DBCS characters in table names (the 18-byte length restriction still applies). Enabling the DBCS option is discussed in the DB2 Server for VM System Administration and DB2 Server for VSE System Administration manuals.

Choosing Columns

You implement your database design primarily by choosing the columns that make up each table. Almost inevitably, there is some conflict between the theoretical design and the most practical implementation, as described in the following sections.

Considerations for Normalization

In Step 7: Normalize Your Tables normalization was discussed only from the viewpoint of logical database design, without considering performance. Consider the example there of the column that contains the addresses of warehouses. The column is first shown as part of a table that contains information about parts and warehouses; then, to further normalize the design, it is removed from that table and defined as part of a table that contains information only about warehouses. The other possible design (in which the column is part of both tables) was not considered.

Some applications might require information about both parts and warehouses, including the addresses of warehouses. With normalization, information can be retrieved by joining tables. The problem is that a join operation can be very time-consuming, even for only two tables, and as the number of tables increases the access costs can increase enormously, depending on the size of the tables and the available indexes. If indexes are not available, the join of many large tables can take hours. Furthermore, the number of tables that can be joined is at most 15 and, depending on the complexity of the statement, can be significantly less. Thus, an unnormalized design may be absolutely necessary.

Consider making both tables have a column that contains the addresses of warehouses. If this design makes join operations unnecessary, it could be a worthwhile redundancy. Warehouse addresses do not change often, and if one does change, DB2 Server for VSE & VM makes it easy to update all occurrences.

Considerations for Row Size

Rows are stored within pages. A single row cannot occupy more than one page, and you cannot create a table with a maximum row size that is greater than the page size. One exception is that columns of type LONG VARCHAR or LONG VARGRAPHIC can be longer than one page; therefore, the rows that contain them can occupy more than one page. There is no other absolute limit, but if you ignore row size in favor of implementing a good theoretical design, you may waste storage.

Row Length--Fixed or Varying

Table rows may be of fixed or varying lengths. Two considerations apply:

The presence of any columns with varying-length data types will result in a varying-length row.
If the rightmost columns of the row are defined as allowing nulls, and if no values for those columns are supplied when a row is inserted, storage is not allocated for those columns. If those columns in the inserted row are subsequently updated, the row length will be increased to accommodate the non-null column values.

The disadvantage of varying-length rows is that if the row length is increased, the row may have to be repositioned. If the row is repositioned and there is not enough free space on the current page to accommodate the row, then the row will be moved to another page. In this case, whenever that moved row is accessed, an additional page reference is required.

Row Lengths and Pages

Along with the bytes of actual data, each row has:

A 6-byte prefix
A 2-byte slot for each row stored in the page
1 additional byte for each column that may contain null values
1 additional byte for each varying-length column.

In addition, every data page has a 16-byte header.

This overhead affects the amount of data that can be stored on each page in your dbspace. In designing your table, consider your design needs while looking for ways to store your data as efficiently as possible.

Some Space-Wasting Designs

Space is wasted in a dbspace if all its rows are slightly longer than half a page, because then only one row can fit in each page. If you can reduce the row lengths to just under half a page, you will need only half as many pages. Similar considerations apply to rows that are just over a third of a page, a quarter of a page, and so on.

It is particularly important to minimize the number of pages in a dbspace because if an index is not used, the database manager will read every active page of the dbspace.

For example, suppose you design a table to hold a large array of floating-point numbers. If you define each column as FLOAT and use the maximum number of columns (255), the row length is 2048 and only one row fits on each page. If you use 240 columns, two rows could fit on each page, and a page would contain 480 floating-point numbers, rather than only 255.

Specifying Columns

A column contains all occurrences of one of the entities in a table. (You can think of it as a field in a row.) In Figure 17, the lines immediately following the table name contain the names of the columns within the table. In the sample EMPLOYEE table, the HIREDATE column contains all the hire dates for all employees represented by EMPNO. You cannot redefine or overlap columns and, after you have implemented the design of your database, you usually cannot change a column definition without disrupting applications. Therefore, consider carefully the decisions you make about column definitions. (However, you can add columns to an existing table. See Altering the Design of a Table.)

For each column, you must specify a name and a data type.

For each column, you may specify:

A length (of values in the column, not the number of values) and whether null values are permitted. For a column containing character data, you can also specify the subtype. For further information, refer to Specifying SBCS, Mixed, or Bit Subtypes.
A CCSID for a column with character or graphic data, if you want to override the default CCSID. For further information, refer to Specifying a CCSID.
Whether you plan to run a user-written exit routine whenever a program enters or retrieves data in the column. This type of routine, called a field procedure, can be used, for example, to alter the sorting sequence of values entered in the column. Field procedures are assigned to specific columns when the table is created or altered. For further information on field procedures, see Specifying a FIELDPROC.

Column Names

Column names must be unique within a table, but you can use the same name in different tables. The maximum length is 18 bytes.

If the DBCS option is enabled, you can use DBCS characters in the column names. See the DB2 Server for VM System Administration or DB2 Server for VSE System Administration manual.

Nulls

As mentioned under Step 3: Provide Column Definitions for Tables, some columns cannot have a meaningful value in every row.

A special value indicator, called the null value, represents an unknown or missing value. It should not be confused with a zero value, a blank, or an empty string: it is a special value interpreted by the database manager to mean that no data has been supplied.

Unless you specify otherwise, any column you define can contain null values, and rows can be created in the table without providing a value for the column. Avoid using nulls for columns that will be used as indexes. To disallow null values, use the NOT NULL clause, and provide a non-null value for that column whenever you store data. Columns that will be referenced in a primary key or unique constraint must be defined as NOT NULL.

If you add a column to an existing table, it contains no data and so cannot be defined as NOT NULL.

In the example in Figure 17, nulls are acceptable for certain columns and prohibited for others.

Before you decide whether to allow nulls for unknown values in a column, be aware of how nulls can affect the result of a query.

Nulls in predicates
Nulls do not satisfy any condition in an SQL statement other than the special NULL predicate. Null values do not act like other values. For instance, if you try to determine whether a null value is larger or smaller than a given known value, you get an answer of UNKNOWN.
Nulls in quantified predicates
If either the left side or the subselect list of a quantified predicate is null, the quantified predicate is residual. Residual predicates require more processing because of the communication between the Relational Data System (RDS) and the Database Storage Subsystem (DBSS). Predicate processing is described in the DB2 Server for VSE & VM Performance Tuning Handbook manual.
Nulls with Field Procedures

If you allow nulls in a column with a field procedure, that field procedure is not invoked when you access a null value: the database manager returns the null value.

Specifying Data Types

You must give a data type for each column of a table, to specify the type of data the column will contain and the length of the data field.

The first thing you must decide when defining a column is what kind of data the column will contain--string, numeric, or date/time. The decision is often obvious because only a string column can contain letters or special characters. If the data consists solely of digits, however, you have to decide whether to specify it as string or numeric data. And if the values represent dates, times, or timestamps, you will want to consider the data types DATE, TIME, and TIMESTAMP.

Numeric Data Types

The data types for numbers are shown in Table 6.

Table 6. Numeric Data Types

Data Type Denotes a column of...
SMALLINT Small integers. A small integer is an IBM System/370* halfword signed binary integer of 16 bits; the range is -32,768 to +32,767.

INTEGER or INT Large integers. A large integer is an IBM System/370 fullword signed binary integer of 32 bits; the range is -2,147,483,648 to +2,147,483,647.
REAL or FLOAT(n) Single precision floating-point numbers. n must be in the range 1 through 21. There is no default; if you omit n when declaring a data type of FLOAT, the column has double precision. A single precision floating-point number is an IBM System/370 short floating-point number of 32 bits.
FLOAT, FLOAT(n), or DOUBLE PRECISION Double precision floating-point numbers. n must be in the range 22 through 53; its default is 53. A double precision floating-point number is an IBM System/370 long floating-point number of 64 bits. The range of magnitudes for floating-point numbers of either type is approximately ±5.4E-79 to ±7.2E+75.
DECIMAL(p,s), DEC(p,s) or NUMERIC (p,s) IBM System/370 packed decimal numbers with precision p and scale s. The precision p, which is the total number of digits, must be greater than 0 and less than 32. The scale s, which is the number of digits in the fractional part of the number, must be greater than or equal to 0 and less than or equal to the precision. s may be omitted; its default is 0. And if s is omitted, p may also be omitted; its default is 5. The range of decimal values is 31 digits, and these values can be positive or negative. NUMERIC and DEC are synonymous with DECIMAL.

Data Type	Denotes a column of...
SMALLINT	Small integers. A small integer is an IBM System/370* halfword signed binary integer of 16 bits; the range is -32,768 to +32,767.
INTEGER or INT	Large integers. A large integer is an IBM System/370 fullword signed binary integer of 32 bits; the range is -2,147,483,648 to +2,147,483,647.
REAL or FLOAT(`n`)	Single precision floating-point numbers. `n` must be in the range 1 through 21. There is no default; if you omit `n` when declaring a data type of FLOAT, the column has double precision. A single precision floating-point number is an IBM System/370 short floating-point number of 32 bits.
FLOAT, FLOAT(`n`), or DOUBLE PRECISION	Double precision floating-point numbers. `n` must be in the range 22 through 53; its default is 53. A double precision floating-point number is an IBM System/370 long floating-point number of 64 bits. The range of magnitudes for floating-point numbers of either type is approximately ±5.4E-79 to ±7.2E+75.
DECIMAL(`p,s`), DEC(`p,s`) or NUMERIC (`p,s`)	IBM System/370 packed decimal numbers with precision `p` and scale `s.` The precision `p,` which is the total number of digits, must be greater than 0 and less than 32. The scale `s`, which is the number of digits in the fractional part of the number, must be greater than or equal to 0 and less than or equal to the precision. `s` may be omitted; its default is 0. And if `s` is omitted, `p` may also be omitted; its default is 5. The range of decimal values is 31 digits, and these values can be positive or negative. NUMERIC and DEC are synonymous with DECIMAL.

For integer values, SMALLINT or INTEGER (depending on the range of the values) are preferable to DECIMAL or FLOAT.

For real numbers with a small precision and scale, DECIMAL is preferable to FLOAT.

For numeric data, use numeric rather than string columns for the following reasons:

They require less space.
They permit arithmetic operations.
They are accessed more efficiently. For example, if numbers are represented as strings, when the database manager calculates a range, the optimizer takes into consideration all possible bit patterns and cannot calculate an appropriate filter factor. Because of this, a much higher number of rows is returned. For further information on filter factors, refer to the DB2 Server for VSE & VM Performance Tuning Handbook manual.

String Data Types

The data types for strings are shown in Table 7.

Table 7. String Data Types

Data Type Denotes a column of...
CHAR(n) or CHARACTER(n) Fixed-length character strings with a length of n bytes. n must be greater than 0 and less than 255.
VARCHAR(n) Varying-length character strings with a maximum length of n bytes. n must be greater than 0. If n is greater than 254, certain restrictions apply to the use of the columns in SQL statements. The upper limit on the value of n is 16,383.
LONG VARCHAR Varying-length character strings with a maximum length of 32,767 bytes. The restrictions that apply to VARCHAR columns where n>254 also apply to LONG VARCHAR columns.
GRAPHIC(n) Fixed-length graphic strings containing n double-byte characters. n must be greater than 0 and less than 128.
VARGRAPHIC(n) Varying-length graphic strings. The maximum length, n, must be greater than 0. If n is greater than 127, certain restrictions apply to the use of the column in SQL statements. The upper limit on the value of n is 16,383.
LONG VARGRAPHIC Varying length graphic strings with a maximum length of 16,383 bytes. The restrictions that apply to the use of a VARGRAPHIC column where n>127 also apply to a LONG VARGRAPHIC column.

Data Type	Denotes a column of...
CHAR(`n`) or CHARACTER(`n`)	Fixed-length character strings with a length of `n` bytes. `n` must be greater than 0 and less than 255.
VARCHAR(`n`)	Varying-length character strings with a maximum length of `n` bytes. `n` must be greater than 0. If `n` is greater than 254, certain restrictions apply to the use of the columns in SQL statements. The upper limit on the value of `n` is 16,383.
LONG VARCHAR	Varying-length character strings with a maximum length of 32,767 bytes. The restrictions that apply to VARCHAR columns where n>254 also apply to LONG VARCHAR columns.
GRAPHIC(`n`)	Fixed-length graphic strings containing `n` double-byte characters. `n` must be greater than 0 and less than 128.
VARGRAPHIC(`n`)	Varying-length graphic strings. The maximum length, `n`, must be greater than 0. If `n` is greater than 127, certain restrictions apply to the use of the column in SQL statements. The upper limit on the value of `n` is 16,383.
LONG VARGRAPHIC	Varying length graphic strings with a maximum length of 16,383 bytes. The restrictions that apply to the use of a VARGRAPHIC column where n>127 also apply to a LONG VARGRAPHIC column.

If you want to use a field procedure with a column, the column must have a short string data type. You can also use string columns to specify binary (bit) data or character data for exchange with other application servers.

Choosing Fixed-Length or Varying-Length Data Types

VARCHAR saves DASD space. The saving is at the cost of a 1-byte overhead for each value and the additional processing required for varying-length rows. Thus, CHAR is preferable to VARCHAR, unless the space saved by the use of VARCHAR is significant. The saving is not significant if the maximum length is small or the lengths of the values do not have a significant variation.

If you use VARCHAR, do not specify a maximum length greater than necessary. In particular, note the restrictions on columns of strings longer than 254 bytes; for example, they cannot be indexed.

The database manager will not use index-only access to retrieve the data if the index is created on a VARCHAR column. For information on index-only access, refer to the DB2 Server for VSE & VM Performance Tuning Handbook manual.

Do not use LONG VARCHAR unless you really want the maximum row length to be as large as possible, because there is a higher cost associated with accessing long fields.

In most cases, the content of the data intended for a column dictates the data type you choose. For example, the data type selected for the department name (DEPTNAME) of the DEPARTMENT table is VARCHAR(36). Because department names normally vary considerably in length, the choice of a varying-length data type seems appropriate. Choosing a data type of CHAR(36), for example, would result in much wasted space, because all department names, regardless of their length, would be assigned the same amount of space (36 bytes).

The foregoing considerations about CHAR, VARCHAR, and LONG VARCHAR columns apply in the same way to GRAPHIC, VARGRAPHIC, and LONG VARGRAPHIC columns. The one exception is that the length (n) of a GRAPHIC or VARGRAPHIC column is given as a number of double-byte characters; hence, the length in bytes is twice n.

Specifying SBCS, Mixed, or Bit Subtypes

The use of subtypes applies only to character data such as CHAR, VARCHAR, and LONG VARCHAR. A default subtype for character columns is set at installation time. You can override this default for any column in a table when the table is created (or when a column is added to an existing table).

Choose the SBCS subtype when the data in the column is single-byte character data and the default is not.

Choosing FOR MIXED DATA lets you store (and to have the column flagged as storing) both single- and double-byte characters. The database manager ensures the integrity of valid mixed data during truncation.

For columns that contain binary data that should not be modified when moved between different environments (such as from ASCII to EBCDIC), specify FOR BIT DATA.
Note: When specifying a subtype, you are also implicitly specifying the CCSID for the subtype.

Specifying a CCSID

Default CCSID values for character and graphic data are specified during installation. To override the CCSID used for a column containing any of these data types, specify one of your own.

Each CCSID is associated with either graphic data or a specific subtype of character data. Query the SYSTEM.SYSCCSIDS system catalog table to determine the CCSID values for each of these.

If you compare data from two columns or move data between two columns having different CCSIDs, and if a conversion selection table exists, the data in one of the columns is converted to ensure a consistent comparison. Query the SYSTEM.SYSSTRINGS catalog table for a list of valid conversion selection tables. (In VM you can also look at the ARISSTR MACRO on the production minidisk for a list of valid conversion selection tables.) Consider your users' environments and needs when specifying a CCSID for a particular column. When you override the default CCSID for a column of data, you can minimize the amount of converting done on tables that are accessed primarily by users requiring different CCSIDs.
Note: Converting from one CCSID to another, then another, and then returning to the original CCSID, can result in the misinterpretation of data if there is not a one-to-one correspondence between the two sets of characters.

See the DB2 Server for VM System Administration or DB2 Server for VSE System Administration manual for more information about specifying CCSIDs.

Specifying a FIELDPROC

A field procedure (FIELDPROC) is a user-written exit routine used to encode and decode values in a character string. Field procedures can only be used on short character strings (CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC).

A field procedure can be used to alter the sorting sequences of a short character string column. It is assigned to a column during execution of the CREATE TABLE or ALTER TABLE statement, and is called whenever values in the column are changed, inserted, or retrieved. To specify that a column use a field procedure, use the FIELDPROC option followed by the program name of the procedure and, optionally, a list of parameters.

For example, to specify a field procedure for the column LASTNAME of the EMPLOYEE sample table, change one line of Figure 17 to look like this:

   LASTNAME VARCHAR(15) NOT NULL FIELDPROC MYPROG (4, 3, 7),

In the example, the name of the field procedure is chosen as MYPROG. The parameters 4, 3, and 7 are passed to the procedure when it is invoked by the CREATE TABLE or ALTER TABLE statement.

For more information about field procedures, see the DB2 Server for VM System Administration or DB2 Server for VSE System Administration manual.

Data Types for Dates, Times, and Timestamps

The data types for dates, times, and timestamps are shown in Table 8.

Table 8. Date, Time, and Timestamp Data Types

Data Type Denotes a column of...
DATE Dates. A date is a three-part value representing a year, month, and day in the range 0001-01-01 to 9999-12-31.
TIME Times. A time is a three-part value representing a time of day in hours, minutes, and seconds, in the range 00.00.00 to 24.00.00.
TIMESTAMP Timestamps. A timestamp is a seven-part value representing a date and time by year, month, day, hour, minute, second, and microsecond, in the range 0001-01-01-00.00.00.000000 to 9999-12-31-24.00.00.000000.

Data Type	Denotes a column of...
DATE	Dates. A date is a three-part value representing a year, month, and day in the range 0001-01-01 to 9999-12-31.
TIME	Times. A time is a three-part value representing a time of day in hours, minutes, and seconds, in the range 00.00.00 to 24.00.00.
TIMESTAMP	Timestamps. A timestamp is a seven-part value representing a date and time by year, month, day, hour, minute, second, and microsecond, in the range 0001-01-01-00.00.00.000000 to 9999-12-31-24.00.00.000000.

For a detailed description of Date/Time characteristics, see the DB2 Server for VSE & VM SQL Reference manual.

Advantages of Date/Time Data Types

Numbers representing dates and times can, of course, be stored in columns with numeric data types; if they include special characters as separators, they can be stored in string columns. But neither of these options provides the advantages of the DATE, TIME, and TIMESTAMP data types, as described below.

Variable Input and Output Format

Date/time values are stored in a special internal format, which is freely convertible on output or input to or from any of the formats in Table 9.

Table 9. Date Formats

Format Name Abbreviation Typical Date Typical Time
International Standards Organization ISO 1992-12-25 13.30.05
IBM USA standard USA 12/25/1992 1:30 PM
IBM European standard EUR 25.12.1992 13.30.05
Japanese Industrial Standard (Christian Era) JIS 1992-12-25 13:30:05

Format Name	Abbreviation	Typical Date	Typical Time
International Standards Organization	ISO	1992-12-25	13.30.05
IBM USA standard	USA	12/25/1992	1:30 PM
IBM European standard	EUR	25.12.1992	13.30.05
Japanese Industrial Standard (Christian Era)	JIS	1992-12-25	13:30:05

You also have the option of supplying an exit routine to make conversions to and from any local standard. For instructions about writing and using a date or time exit routine, see the DB2 Server for VM System Administration or DB2 Server for VSE System Administration manual.

When loading date or time values from an outside source, the database manager accepts any of these formats, and convert valid input values to the internal format. For retrieval, there is a default format that you select at the time of installation. You can change the default at any time by updating the SYSOPTIONS catalog; you can override it for every statement in a program by a precompiler option, or for particular instances by the CHAR scalar function. For example, whatever your local default, the following statement displays employees' birth dates in IBM USA standard form:

   SELECT EMPNO, CHAR(BIRTHDATE, USA) FROM EMPLOYEE

Date/Time Arithmetic and Durations

Date/time arithmetic involves intervals of time that are represented by numbers called durations. A duration is an interpretation of a number, not a data type.

A labeled duration is any number of years, months, days, hours, minutes, seconds, or microseconds. A date duration is a number of years, months, and days. A time duration is a number of hours, minutes, or seconds. A timestamp duration is a number of years, months, days, hours, minutes, seconds, and microseconds. For a further discussion of durations, see Date/Time Arithmetic, or the DB2 Server for VSE & VM SQL Reference manual.

The only arithmetic operators that can be applied to date/time values are addition and subtraction. If a date/time value is the operand of addition, the other operand must be a duration.

For example, the following statement lists employees who have been hired after the age of 40:

   SELECT * FROM EMPLOYEE
     WHERE HIREDATE > BIRTHDATE + 40 YEARS

This statement lists employees who have been hired in the last 3 months:

   SELECT * FROM EMPLOYEE
     WHERE HIREDATE > CURRENT DATE - 3 MONTHS

Date/Time Functions

There are functions to extract the years, months, days, hours, minutes, seconds, and microseconds of dates, times, and timestamps. For example, this statement lists all employees who have a service anniversary on June 21:

   SELECT * FROM EMPLOYEE
     WHERE MONTH(HIREDATE) = 6 AND DAY(HIREDATE) = 21

There are also functions to convert dates, times, and timestamps to character or integer representations.

String Representations of Date/Time Values

In the following example, 07/28/1971 is interpreted as a date because it is compared to a date; in other contexts (a SELECT list, for example) 07/28/1971 is merely a character string.

   SELECT * FROM EMPLOYEE
     WHERE HIREDATE = '07/28/1971'

Date/Time Comparisons

All comparison operators are allowed. The statement below lists all employees hired after October 31, 1979. To show another of the recognized date formats, we have arbitrarily chosen to write the date in the IBM European standard.

   SELECT * FROM EMPLOYEE
     WHERE HIREDATE > '31.10.1979'

Comparing Data Types

You can compare values of different types and lengths provided that both values are numeric, both are character strings, or both are graphic strings.

Date and time comparisons cannot be made with values of different types: a date can be compared only with a date, a time with a time, and a timestamp with a timestamp (or, in each case, with a valid string representation of a date, time, or timestamp).

If a column uses a field procedure, values to be compared to it are first encoded by the field procedure. If a column with a field procedure is compared to another column, both columns must have the same field procedure and data type.

Columns do not have to have the same CCSID to be compared. When two columns with differing CCSIDs are compared, and a conversion selection table exists, the data in one of the columns is converted to ensure a consistent comparison. For further information, refer to the DB2 Server for VSE & VM SQL Reference manual.

Specifying a PRIMARY KEY

The primary key of a table, if one has been created, consists of one or more columns that uniquely identify each row in the table. In the example in Figure 17, the employee number is the primary key of the employee table, and the PRIMARY KEY clause identifies the column of employee numbers (EMPNO).

A table that is to be a parent of dependent tables must have a primary key--the foreign keys of the dependent tables refer to it. Otherwise, a primary key is optional. If you are defining referential constraints, read Considerations for Referential Integrity when Creating Tables before creating or altering any of the tables involved.

If you specify a primary key, a unique index is automatically defined on the same set of columns, in the same order as those columns. The primary key values must then be unique and cannot be null. Their uniqueness cannot depend upon trailing blanks in columns containing VARCHAR or VARGRAPHIC data. Automatic enforcement of these restrictions can be useful even if the table is not involved in referential constraints. If each row of your table does relate to a unique occurrence of some entity, then consider creating a primary key.

If the primary key is created on a VARCHAR or VARGRAPHIC column, index-only access is not used to retrieve the data. For information on index-only access, refer to the DB2 Server for VSE & VM Performance Tuning Handbook manual.

Specifying a UNIQUE Constraint

The unique constraints on a table ensure the uniqueness of values in columns making up each constraint. Although functionally similar to a unique index, a unique constraint can be defined when the table is created, deactivated, and then reactivated to enforce the uniqueness of values in its key. This simplifies administration when you load data or perform operations that could temporarily violate the unique constraint. For this reason, unique constraints are preferable to unique indexes, which must be individually and explicitly dropped and recreated to suspend or enforce uniqueness.

A unique constraint is also similar to a primary key in that:

It consists of one or more columns
The columns are not nullable
The database manager enforces uniqueness by creating a unique index.

It differs from a primary key in that:

It cannot be referenced by a foreign key
You can define more than one on any table
It can be given a name.

Considerations in Defining Unique Constraints

The columns in a unique constraint cannot allow null values.
You cannot duplicate a unique constraint on a table.
The columns of a unique constraint should not be the same as columns in a primary key. The converse is also true.
A unique constraint can be added after the table is created through the ALTER TABLE statement.
Like primary keys and unique indexes, the uniqueness of values in a unique constraint cannot depend upon trailing blanks in columns with VARCHAR or VARGRAPHIC data.

Considerations for Referential Integrity when Creating Tables

For any table, you can define one primary key using the primary key clause, and any number of foreign keys using the referential constraint clauses. In a referential constraint, the table that has the foreign key definition is the dependent table and the table that is referenced by the foreign key is the parent.

The constraint-name identifies the key being specified.

It is optional.

The database manager generates a constraint-name if one is not provided; however, you should create your own for foreign keys. Constraint-names should be symbolic and indicate the parent and foreign key names, which will make working with the keys much easier. Working with keys is discussed in Altering Referential and Unique Constraints.

A referential constraint is defined by creating or altering tables to have a parent/dependent relationship between them. A referential constraint can span dbspaces. A referential structure is a set of tables that are related to each other by referential constraints. A dbspace may have more than one referential structure but that is generally not desirable.

Primary Key Index

When a primary key is defined, a unique index is created automatically to enforce its uniqueness. If you have not specified information such as index order and percent free space on the key definition, the index is created using default values.

When a primary key is defined by the CREATE TABLE statement, the CLUSTERING index is the one associated with the primary key. If you want to have this index on columns other than those comprising the primary key, create the table without a primary key, then create an index on the desired columns, and then use the ALTER TABLE statement to add the primary key.

If the primary key is dropped, either implicitly (when the table or dbspace is dropped) or explicitly (with the ALTER TABLE statement), the system-generated index is automatically dropped. You cannot use the DROP INDEX statement to explicitly drop an index that was created to support a primary key.

Use the ALTER TABLE ACTIVATE PRIMARY KEY statement to reorganize the primary key index if the primary key is active, or to recreate the index if the primary key is inactive. For more information about this statement, see Altering the Design of a Table.

Usage Notes

The primary key columns must not allow null values, and the primary key clause must not be used more than once.
Corresponding columns in primary and foreign keys of the same referential constraint must have the same data type.
The columns in a key must exist in the table, and may not be used more than once.
If the same referential constraint is defined more than once, a warning is issued, and a new foreign key is added.
The parent table referenced by a foreign key must already exist. It must not be a view, and it must have an active primary key.
The delete rule, if specified, must be one of RESTRICT, SET NULL, or CASCADE.
IF SET NULL is used, at least one foreign key column must be nullable.
When defining foreign keys, you must have REFERENCES privilege on the parent table and ALTER privilege on the dependent table.
When defining referential constraints, if a primary key has a field procedure, then the foreign key must have the same field procedure.

Restrictions on Keys and Referential Constraints:

Keys cannot be added to or dropped from the system catalog tables, and a system catalog cannot be referenced in any referential constraint.
No table in a referential cycle with two or more tables may be delete-connected to itself. This ensures that the result of a delete from a table does not depend upon the sequence when the database manager accesses the table. In a referential cycle of two tables, neither delete rule can be CASCADE. For a referential cycle of more than two tables, two or more delete rules must not be CASCADE.
A table is delete-connected to another table if deletion of rows from one table affects the other table. The implications are:
- A dependent table is always delete-connected to its parents, whatever the delete rule is.
- A descendent table is delete-connected to a table higher than it in the hierarchy if a delete of rows in the higher-level table can cause a delete of rows in the descendent's parent table.
For a descendent table to be delete-connected to the same higher-level table through more than one path, all delete rules on each path must be CASCADE, except possibly the delete rule between the descendent and its immediate parent on each path. The delete rules of the descendent with its parent table on each path must be the same and must not be SET NULL. This ensures that the order in which the delete rules are applied has no effect on the result of an operation.
For further information on tables that are delete-connected through multiple paths, refer to the DB2 Server for VSE & VM SQL Reference manual.
Self-referencing tables are not supported.

For further information on referential integrity, refer to Elements of Referential Integrity.

Integrity Rules for DELETE

There are no rules for the deletion of rows from dependent tables. The deletion rule specified in the referential constraint clause defines what action should be taken by the database manager when a row in the parent table is to be deleted. See DELETE Rules.

Integrity Rules for INSERT

Insert rules always apply when primary and foreign keys are defined. See INSERT Rules.

Integrity Rules for UPDATE

Update rules always apply when primary and foreign keys are defined. See UPDATE Rules.
Note: If a table is a parent in one relationship and a dependent in another, integrity rules for DELETE, INSERT, or UPDATE must be satisfied for both relationships.

To determine the delete rule of an existing foreign key, access the SYSKEYS catalog table as follows:

   SELECT KEYTYPE, KEYNAME, DELETERULE FROM SYSTEM.SYSKEYS
     WHERE TNAME='table-name'

Placing Tables in Dbspaces

When creating a table, you can specify the dbspace in which it is to reside. If you do not, it is put in the creator's PRIVATE dbspace. If the creator does not have a PRIVATE dbspace, then the CREATE TABLE statement fails.

If you specify the name of the dbspace but not the name of the owner, the database manager searches for a PRIVATE dbspace of the specified name that is owned by the creator of the table. If this does not exist, the database manager then looks for a PUBLIC dbspace with the specified name. If that does not exist, then the CREATE TABLE statement fails. Refer to the DB2 Server for VSE & VM SQL Reference manual for more information about the CREATE TABLE statement.

Table placement under the various possible default conditions is illustrated in Figure 18.

Figure 18. Default Placement of Tables in Dbspaces

View figure.

Notes for Figure 18:

A user with DBA authority can create tables for any user in any dbspace.
Users with RESOURCE authority can create tables for themselves only, and then only in their own dbspaces or in any PUBLIC dbspaces.
If the dbspace name is specified but not qualified (just XX), the database manager first looks for a PRIVATE dbspace owned by the creator. If this is not found, then the database manager looks for PUBLIC.XX.
If the dbspace is defaulted, the required default PRIVATE DBSPACE (CC.ZZ, DD.YY, or RR.SS) must exist.
If you omit the dbspace name, the database manager will not select a dbspace that resides in a nonrecoverable storage pool. If you want to create a table in a nonrecoverable dbspace, you must specify the dbspace name.

You can easily avoid confusion by fully qualifying both the table name and the dbspace name.

Creating Views

Some of your users may find that no single table contains all the data they need; rather, the data might be scattered among several tables. Or one table might contain more data than they want to see or are authorized to see. For those situations, you can create views. A view is an alternative way of describing data that exists in one or more tables.

You can create a view any time after creating the underlying tables. The owner of a set of tables implicitly has the authority to create a view on them, and someone with DBA authority can create a view for any owner on any set of tables.

Use the CREATE VIEW statement to define a view and give it a name. Unless you specifically list different column names after the view name, the column names of the view will be the same as those of the underlying table. ( Table 11 shows an example of this.) When creating different column names for your view, remember the naming conventions you established when designing the database.

As Table 11 illustrates, the information in the view is described by a SELECT statement. This statement can name other views as well as tables, and can use WHERE, WITH CHECK OPTION, GROUP BY, and HAVING clauses. It cannot use ORDER BY, name a host variable, or contain the UNION operator.

By specifying a WHERE clause in the subquery of a view definition, you can limit the rows addressed through a view. If an application (or user) deals with a specific set of rows in a table, you can create a view to limit the rows addressed to only those required. If a view is created using the WHERE and WITH CHECK OPTION clauses, all subsequent UPDATEs and INSERTs will prevent changes to rows that fall outside the set of rows defined by the view. Refer to the DB2 Server for VSE & VM SQL Reference manual for more information about creating views.

Reasons for Using Views

Some reasons you might want to use views are:

To provide a customized table for a specific user
Some tables may have a large number of columns, not all of which are of interest to all users or are named or ordered appropriately. You can, in effect, create a smaller table for certain users by defining a view that contains only the columns of interest. You can rename columns and reorder the column sequence to tailor the view to the user's needs.
To limit access to certain kinds of data
You can create a view containing only selected columns and rows from a table or tables. Users with the SELECT privilege on the view see only the information you describe. For example, a view could be defined that showed only the FIRSTNME, LASTNAME, WORKDEPT, and EDLEVEL columns for employees in Department D11.
To alter tables without affecting application programs
For example, a program that uses an INSERT into T1 without a specified list of column names will cause an error after you add a column to table T1. The error is generated because the number of values being inserted into the table is different than the number of columns in the table. If T1 is a view, you will be protected from that error because adding a column to the table does not affect the view definition and, therefore, does not affect the program.

Creating a View on a Table

The example below illustrates creating a view on a single table, the DEPARTMENT table. Of the four columns in the table, only three are required for the view: DEPTNO, DEPTNAME, and MGRNO. The order of the columns in the SELECT clause is the order in which they appear in the view.

   CREATE VIEW VDEPT3 AS
     SELECT DEPTNO,DEPTNAME,MGRNO
     FROM DEPARTMENT

In this example, no column list follows the view name, VDEPT3. Hence, the columns of the view have the same names as those of the table on which it is based (DEPTNO, DEPTNAME, MGRNO). Table 10 shows the result of executing the following SQL statement:

   SELECT * FROM VDEPT3

Table 10. View of a Table

DEPTNO DEPTNAME MGRNO
A00 SPIFFY COMPUTER SERVICE DIV. 000010
B01 PLANNING 000020
C01 INFORMATION CENTER 000030
D01 DEVELOPMENT CENTER ?
D11 MANUFACTURING SYSTEMS 000060
D21 ADMINISTRATION SYSTEMS 000070
E01 SUPPORT SERVICES 000050
E11 OPERATIONS 000090
E21 SOFTWARE SUPPORT 000100

DEPTNO	DEPTNAME	MGRNO
A00	SPIFFY COMPUTER SERVICE DIV.	000010
B01	PLANNING	000020
C01	INFORMATION CENTER	000030
D01	DEVELOPMENT CENTER	?
D11	MANUFACTURING SYSTEMS	000060
D21	ADMINISTRATION SYSTEMS	000070
E01	SUPPORT SERVICES	000050
E11	OPERATIONS	000090
E21	SOFTWARE SUPPORT	000100

Creating a View from Several Tables

Name more than one table in the FROM clause to create a view that combines information from two or more tables. This operation is called a join, and is shown in the following example, which includes the manager's name (from the EMPLOYEE table) and information from the DEPARTMENT table.

   CREATE VIEW SMITH.VDEPTM  AS
     SELECT DEPTNO, MGRNO, LASTNAME, ADMRDEPT
     FROM DEPARTMENT, EMPLOYEE
     WHERE EMPLOYEE.EMPNO = DEPARTMENT.MGRNO

Table 11 shows the result of executing the following SQL statement:

   SELECT * FROM SMITH.VDEPTM

Table 11. View of Two Tables

DEPTNO MGRNO LASTNAME ADMRDEPT
A00 000010 HAAS A00
B01 000020 THOMPSON A00
C01 000030 KWAN A00
D11 000060 STERN D01
D21 000070 PULASKI D01
E01 000050 GEYER A00
E11 000090 HENDERSON E01
E21 000100 SPENSER E01

DEPTNO	MGRNO	LASTNAME	ADMRDEPT
A00	000010	HAAS	A00
B01	000020	THOMPSON	A00
C01	000030	KWAN	A00
D11	000060	STERN	D01
D21	000070	PULASKI	D01
E01	000050	GEYER	A00
E11	000090	HENDERSON	E01
E21	000100	SPENSER	E01

Now, suppose you want to create a similar view that includes only the departments that report administratively to Department A00. Suppose also that you want a different set of column names. The appropriate CREATE statement is as follows:

   CREATE VIEW SMITH.VDEPTMA00
     (DEPT, MGR, NAME, REPORTTO)
     AS
     SELECT DEPTNO, MGRNO, LASTNAME, ADMRDEPT
     FROM EMPLOYEE, DEPARTMENT
     WHERE EMPLOYEE.EMPNO = DEPARTMENT.MGRNO
     AND ADMRDEPT = 'A00'

Table 12 shows the result of executing the following SQL statement:

   SELECT * FROM SMITH.VDEPTMA00

Table 12. View Created with New Column Names

DEPT MGR NAME REPORTTO
A00 000010 HAAS A00
B01 000020 THOMPSON A00
C01 000030 KWAN A00
E01 000050 GEYER A00

DEPT	MGR	NAME	REPORTTO
A00	000010	HAAS	A00
B01	000020	THOMPSON	A00
C01	000030	KWAN	A00
E01	000050	GEYER	A00

Things You Cannot Do with a View

When designing views, consider the following restrictions:

You cannot update, insert, or delete through a view if it involves any of the following:
- SQL column functions (SUM, MAX, MIN, AVG, COUNT)
- Elimination of duplicate rows (DISTINCT)
- Grouping (GROUP BY), or HAVING clause
- A FROM clause that uses more than one table (that is, a join).
In the above cases, you can retrieve data from the views by means of the SQL SELECT statement, but you cannot use INSERT, UPDATE, or DELETE statements.
You cannot insert a row through a view if the view has a column derived from an arithmetic or string expression, a scalar function, or a constant.
You cannot update a column of a view that is derived from an arithmetic or string expression, a scalar function, or a constant (for example, a column that is defined as 1.6 x SALARY).

For more detailed information about view restrictions, see the DB2 Server for VSE & VM SQL Reference manual.

You can make changes to a table through a view when the view does not contain the same number of columns or the same number of rows as the table on which it is based. Table 13 summarizes the restrictions on accessing views.

Table 13. Restrictions on View Access

STATEMENT RESTRICTIONS
UPDATE You cannot update a view defined as the join of multiple tables. This includes views defined on views defined as the join of multiple tables.

You cannot update a view that is defined using DISTINCT, GROUP BY, or column functions.
You cannot update rows of a view that is defined using WITH CHECK OPTION, if the updated rows fall outside the set of rows defined by the view.
You cannot update virtual columns. (A virtual column is a column on a view that is not derived directly from a column of a stored table. For example, view columns defined by expressions such as MAX(SALARY), SALARY+BONUS, or AVG(PRSTAFF) are all virtual columns).
INSERT You cannot insert into a view defined as the join of multiple tables. This includes views defined on views defined as the join of multiple tables.

You cannot insert into a view that contains a column of the underlying table which allows nulls.
You cannot insert into a view that is defined using DISTINCT, GROUP BY, or column functions.
You cannot insert into a view that is defined using WITH CHECK OPTION, if the inserted rows fall outside the set of rows defined by the view.
You cannot insert into the virtual columns of a view.
DELETE You cannot delete from a view defined as the join of multiple tables. This includes views defined on views defined as the join of multiple tables.

You cannot delete from a view that is defined using DISTINCT, GROUP BY, column functions.
INDEX You cannot create an index on a view.
ALTER You cannot alter a view (for example, add columns, or keys).
DBS UNLOAD Unloading a view will sequence the unloaded rows in an arbitrary order chosen by the database manager. Rows may not be in sequence of any index on an underlying table.
DBS RELOAD

RELOADing through a view will not drop and recreate indexes on the underlying table. You must do this yourself using SQL statements that precede and follow the RELOAD statement.
The INSERT restrictions shown above also apply.
DBS DATALOAD, ISQL INPUT, SQL PUT The INSERT restrictions shown above also apply.

STATEMENT	RESTRICTIONS
UPDATE	You cannot update a view defined as the join of multiple tables. This includes views defined on views defined as the join of multiple tables. You cannot update a view that is defined using DISTINCT, GROUP BY, or column functions. You cannot update rows of a view that is defined using WITH CHECK OPTION, if the updated rows fall outside the set of rows defined by the view. You cannot update virtual columns. (A virtual column is a column on a view that is not derived directly from a column of a stored table. For example, view columns defined by expressions such as MAX(SALARY), SALARY+BONUS, or AVG(PRSTAFF) are all virtual columns).
INSERT	You cannot insert into a view defined as the join of multiple tables. This includes views defined on views defined as the join of multiple tables. You cannot insert into a view that contains a column of the underlying table which allows nulls. You cannot insert into a view that is defined using DISTINCT, GROUP BY, or column functions. You cannot insert into a view that is defined using WITH CHECK OPTION, if the inserted rows fall outside the set of rows defined by the view. You cannot insert into the virtual columns of a view.
DELETE	You cannot delete from a view defined as the join of multiple tables. This includes views defined on views defined as the join of multiple tables. You cannot delete from a view that is defined using DISTINCT, GROUP BY, column functions.
INDEX	You cannot create an index on a view.
ALTER	You cannot alter a view (for example, add columns, or keys).
DBS UNLOAD	Unloading a view will sequence the unloaded rows in an arbitrary order chosen by the database manager. Rows may not be in sequence of any index on an underlying table.
DBS RELOAD	RELOADing through a view will not drop and recreate indexes on the underlying table. You must do this yourself using SQL statements that precede and follow the RELOAD statement. The INSERT restrictions shown above also apply.
DBS DATALOAD, ISQL INPUT, SQL PUT	The INSERT restrictions shown above also apply.

Materializing a View

When designing views, you should be aware of the view-processing techniques used by the database manager and the circumstances in which each is used. Two view-processing techniques are used: view merge and view materialization. This section describes the circumstances in which a view is materialized.

When a view is referenced in an SQL statement, the view definition is merged with the SQL statement and a new statement is created that references only base tables and columns and that contains only added or modified WHERE predicates, and an added or modified GROUP BY clause. The new statement is then processed. This process is view merge. Some view-referencing statements cannot be processed using the view merge technique. The database manager uses the view materialization technique to process these statements.

With view materialization, a temporary table is created internally and a view (that could not otherwise be accessed) is materialized into the table at run time. The database manager then performs the statement on the materialized view. A materialized view is read-only, because queries on the view are on a temporary table. Each view that is materialized in an SQL statement is materialized in a temporary dbspace.

Because view merge is more efficient than view materialization, view materialization is used only if view merge cannot be used.

A view is materialized if it is created with:

A GROUP BY or HAVING clause, and is accessed by a statement that requires the view to be joined or that specifies column functions on the view
Column functions, and is accessed by a statement that requires the view to be joined
One or more DISTINCT columns, and is accessed by a statement that requires the view to be joined
Multiple DISTINCT columns, and is accessed by a statement that specifies column functions on the view
One DISTINCT column, and is accessed by a statement that specifies multiple column functions on the view
A column defined with column functions, and that column is accessed by a statement that specifies column functions
One or more DISTINCT columns, and a DISTINCT column is accessed by a statement that includes arithmetic expressions with column functions
Multiple DISTINCT columns, and is accessed by a statement that does not specify all columns in the select-list of the SELECT statement
A column defined with built-in functions, expressions, or literals, and that column is referenced in the GROUP BY or HAVING clause of a SELECT statement accessing the view
A column function with a DISTINCT specification, and is accessed by a SELECT statement with a DISTINCT specification
A column defined with a column function, and that column is referenced in the WHERE clause of a statement accessing the view
A column derived from an expression, function or constant, and that column is accessed by a statement containing a WHERE clause with a LIKE predicate
A virtual column, and that column is referenced in a DISTINCT column function of statement accessing the view.

Note:

If the SELECT list of the view definition statement contains a long field, the view cannot be materialized because of long-field restrictions.

For information on determining if view materialization occurs, refer to the DB2 Server for VSE & VM Performance Tuning Handbook manual.

Creating Indexes

The purpose of nonunique indexes is to provide efficient access to data. Unique indexes have the additional purpose of ensuring that key values are unique.

Even when present, the index is not always used: the database manager selects an access path to the data based on a combination of factors. To see whether an index is used in processing a particular SQL statement, use the EXPLAIN statement. For information on using the EXPLAIN statement and on explanation tables, refer to the DB2 Server for VSE & VM Performance Tuning Handbook manual.

Indexes can improve performance of table access; however, this is at the expense of the DASD storage required for them, and the performance of INSERT, UPDATE, and DELETE operations. Thus, while you will want to create indexes on your tables, some judgement is advised. For information about the storage required by an index, see Estimating the Number of Index Pages.

To create an index on a table, use the SQL CREATE INDEX statement. You must have DBA authority or the INDEX privilege on the table. An index may be defined on 1 to 16 columns.

Index Key

The columns identified in the CREATE INDEX statement build a key. An index key is a column or an ordered collection of columns on which an index is defined. A multicolumn key is a key built on two or more columns.

The usefulness of an index depends on its key. Columns that you use frequently in performing selection, join, projection, grouping, and ordering operations are good candidates for use as keys. See Estimating the Number of Index Pages for information on calculating the size of index keys. For columns with a field procedure, use the number of bytes in the encoded field, not the number in the decoded column.

For information about restrictions on key length, see the description of the CREATE INDEX statement in the DB2 Server for VSE & VM SQL Reference manual.

The ordering of the columns specified in the CREATE INDEX statement is important to the definition of the key sequence. The major order determinant columns must be specified first.

For example, an index on the PROJ_ACT table, defined over the PROJNO, ACTNO, and ACSTDATE columns sequences activity numbers within project numbers, and estimated activity start date within activity numbers, if the columns are specified in this order for the index.

For each column participating in the key, you can specify whether its order in the key sequence is ascending or descending. The default is ascending. When creating a unique index, the uniqueness of each value in the index key cannot depend upon trailing blanks. The database manager also ignores trailing blanks when sequencing indexes made up of VARCHAR or VARGRAPHIC values.

UNIQUE Indexes

You can enter duplicate values in a key. If you do not want duplicate values, use CREATE UNIQUE INDEX.

For example, in the sample database, it is important that there be no duplicate activity keywords in the ACTIVITY table. Creating a unique index, as in the following example, prevents duplicates.

   CREATE UNIQUE INDEX XACT1
     ON ACTIVITY (ACTKWD)

The index name is XACT1 and the indexed column is ACTKWD.

If you are planning to use referential integrity or unique constraints, described in Creating Tables, it may be unnecessary to explicitly create unique indexes. When using the primary key or unique constraint clause, the database manager automatically creates a unique index on the table. However, you may want additional indexes for other columns and foreign keys.

The PCTFREE Clause

The PCTFREE clause specifies how much space is to be reserved for future index entries, which allows index maintenance to take place without splitting of index pages. Its default is 10 percent, which is a good value for most purposes. If you expect much insert or update activity after the creation of the index, you might want to override the default by setting PCTFREE to a higher value. If you expect no insert or update activity after the creation of the index, you might want to override the default by setting PCTFREE to zero.

Usually, a low PCTFREE value, 5-10 percent, is a good choice when creating an index, as it provides enough room to accommodate a low level of maintenance. It also provides extra room at localized key ranges where high update activity is taking place by splitting a full index page into two half-empty pages when an insertion or update must go into that page.

Clustering Rows of a Table on an Index

A CLUSTERING index is used by the database manager to determine placement of rows in pages of a dbspace. The first index created on a table is, by default, the CLUSTERING index. The database manager tries to place rows with the same or similar keys on the same dbspace page.

A CLUSTERED index is an index whose sequence of key values corresponds closely to the sequence in which the table rows are actually stored in the database. It can be effectively used to minimize DASD input/output whenever the table rows are accessed in the index sequence of a CLUSTERED index. A CLUSTERING index should always be made a CLUSTERED index. This is done by loading the table rows in the key sequence of the CLUSTERING index.

To establish a CLUSTERING index that also has the property of being a CLUSTERED index, do the following:

Load the table in the index sequence (key sequence) of the CLUSTERING index.
This establishes the initial clustering of rows with similar keys. For the load operation, set PCTFREE for the dbspace to a high enough value to allow space on pages for future clustered insertion of rows.
Create the indexes on the table.
After loading the table, create the indexes on the table. The first index you create will be the CLUSTERING index. Any index having an order that matches the load sequence of the rows will be marked as a CLUSTERED index.
The CLUSTERING index will be a CLUSTERED index because you have loaded the table rows in the sequence of this index. In the SYSINDEXES catalog table, the CLUSTER column value for this index is F, indicating that it is the first index created by the table, and that it is currently a CLUSTERED index. If, after many INSERTs of new rows into the table, the order in which the rows are stored in the database no longer closely match the index key sequence, the CLUSTER column value is changed to W (the next time UPDATE STATISTICS is performed). This indicates that the index is the first index created on the table, and it is currently not a CLUSTERED index. You can reorganize the table. Refer to the DB2 Server for VSE & VM Performance Tuning Handbook manual for information on reorganizing tables. The database manager will continue to use this index to decide where new rows should be stored, because it is still the CLUSTERING index for the table.
One or more of the other indexes created on the table may also happen to have an index sequence that closely matches the sequence in which the table rows are stored. Although this is fortuitous and cannot be directly controlled by the user, the database manager will record these indexes as CLUSTERED by setting their CLUSTER column in SYSINDEXES to C. Such indexes can be exploited as efficient access paths by the database manager. When one of these indexes is no longer CLUSTERED, its CLUSTER column is changed to N the next time UPDATE STATISTICS is performed.
Reduce the PCTFREE value for the dbspace.
This is necessary to make the free space reserved during the load operation available for use on normal INSERT activity. On an INSERT or ISQL INPUT, the database manager attempts to place the inserted row on the same page as a row with the same or similar key.

You can define the key ordering of the CLUSTERING index to be any you wish. However, the primary considerations would be frequently used table orderings (that is, frequently used ORDER BYs) and joins.

If you cluster a table on an index that has a key ordering that matches the most common ORDER BY clauses for queries against the table, you can avoid internal sorting of the query results. A related consideration is the size of an ordered query result. Internal sorting of a small query result is not expensive. However, if you have a large, ordered query result (for a batch job or a comprehensive report), the internal sort could be quite time-consuming. You should consider clustering a table to support your most frequent, large sequential access orderings.

If you have a table that is frequently referenced by a join on a particular column (or set of columns), you may want to consider clustering it on an index on the join column(s). For example, between the DEPARTMENT and EMPLOYEE tables there are two likely join candidates (referential constraints are defined): one between the EMPNO column in EMPLOYEE and the MGRNO column in DEPARTMENT, and the other between the DEPTNO column in DEPARTMENT and the WORKDEPT column in EMPLOYEE. In this case you could choose to cluster both tables on either employee numbers or department numbers, depending on which join is expected more frequently.
Note: You can change the clustering that you initially define for a table. Refer to the DB2 Server for VSE & VM Performance Tuning Handbook manual for information on reorganizing tables.

Figure 19 illustrates both clustered and nonclustered indexes.

Figure 19. Clustered and Nonclustered Indexes

View figure.

Some Things to Remember When Defining Keys

Column updates require index updates. Define as few indexes as possible on a column that is updated frequently, because every change must be reflected in each index. For more information about potential problems with indexes and performance, refer to the DB2 Server for VSE & VM Performance Tuning Handbook manual.

A multicolumn key may be more useful than a key on a single column when the comparison is for equality. A single multicolumn index is more efficient when the comparison is for equality and the initial columns are provided. For example, if an index is composed of columns A, B, and C, a SELECT statement with a WHERE clause of the form WHERE A = value AND B = value may be processed more efficiently than if there are separate indexes on A and on B. Additional columns may also improve performance by allowing index-only access scanning. Refer to the DB2 Server for VSE & VM Performance Tuning Handbook manual for information on index-only access scanning.

Indexes are important tools for improving performance. Suggestions for using indexes effectively are in the DB2 Server for VSE & VM Performance Tuning Handbook manual.

An index cannot be defined over multiple tables. Furthermore, an index key cannot include any columns defined as long fields.

Avoid using VARCHAR or VARGRAPHIC columns in an index. Fixed-length indexes perform better than variable-length indexes. Data pages as well as index pages must be read when VARCHAR or VARGRAPHIC columns are included in an index. The variable-length fields have trailing blanks removed before being put into the index. This may result in the data page values differing from the index page values, and necessitates that both index and data pages be read when using the index as an access path for data retrieved.
Note: Long fields include the following data types: VARCHAR(n) with n>254, VARGRAPHIC(n) with n>127, LONG VARCHAR, or LONG VARGRAPHIC.

General Performance Considerations on the Use of Indexes

It is good practice to create a unique index on the column or set of columns that uniquely define each record in the table (its key). A unique index can easily be created by specifying a primary key or a unique constraint when you create the table. A primary key can be used as an index even if automatic referential integrity is not being used. Using a unique constraint or primary key helps data integrity because the database manager enforces this uniqueness.

Consider creating additional indexes on other columns based on how often you expect the column to be used in search criteria. Once you have identified all the desired indexes, decide which column is apt to be used most often in search criteria. Then load the table in that column's sequence, thus making the column's corresponding index a CLUSTERED index.

If the table is to have a CLUSTERING index, be sure to create that index first after initial table loading. You should do this because the database manager tries to place inserted records so that the physical sequence of the table's records is the same as the sequence defined by the first index created on that table.

It is more efficient to first load a table and then create the indexes on it, rather than the other way around.

Usually, each table should have at least one index. Part of the decision of whether to create an index on a specific column should be based on the trade-off between the faster access achieved, versus the index maintenance processing that the database manager must do whenever that column is modified. A column is an ideal candidate for being indexed if it is likely to be a frequent search argument on SQL statements, but not likely to be changed. Avoid creating indexes on frequently updated columns.

Indexes can be created and dropped. If high query activity is anticipated, temporarily create indexes on the columns that are likely access paths for those queries.

Migration Considerations for Indexes

The SQL/DS Version 2 Release 2 product introduced a new index structure for nonunique indexes. This format requires more space than in earlier releases, but it allows nonunique indexes to perform almost as well as unique ones.

The new format requires more space. The number of additional bytes required for each nonunique index in the new format is:

   4 x (number of index pages) - 4

If you are migrating from Version 2 Release 1 or earlier, some of your dbspaces may not have room for indexes in the new format. Before deleting the old indexes, determine if there are sufficient index pages available to create the index in the new format. Nonunique indexes created before Version 2 Release 2 can coexist with the new type of nonunique index, so you do not have to drop and re-create indexes.
Note: You cannot migrate from Version 2 directly to Version 7. You must migrate to Version 3 first and then migrate from Version 3 to Version 7.

Using the Catalog in Database Design

The catalog tables contain information that can be helpful in designing your database. The DB2 Server for VSE & VM SQL Reference manual lists these tables and what is stored in them.

You can also use the catalog to verify the accuracy of your database definition process. After you have created the objects in your database, display selected information from the catalog to check that there were no errors in your CREATE statements, and to verify that you have the correct tables in each dbspace.

The information in the catalog is vital to normal database system operation. As the following examples show, you can retrieve catalog information, but changing it could have serious consequences. Thus, you cannot process INSERT or DELETE statements against the catalog, and you can update only a few of the columns in selected catalog tables.

To run the following examples, you need at least the SELECT privilege on the appropriate catalog tables. Be careful with your own examples: querying the catalog can result in a long dbspace scan.

Retrieving Catalog Information about a Table

The SYSTEM.SYSCATALOG table contains a row for each table and view in your database. For each, it tells you whether the object is a table or view, its name, who created it, what dbspace contains it, and other information. It also has a REMARKS column in which you can store your own information about the table in question. See The COMMENT ON Statement for information about how to do this.

Enter the following statement to display all the information for the project activity sample table:

   SELECT *
     FROM SYSTEM.SYSCATALOG
     WHERE TNAME = 'PROJ_ACT'
     AND CREATOR = 'SQLDBA'

Retrieving Catalog Information about Columns

The SYSTEM.SYSCOLUMNS table has one row for each column of each table and view. You can query it, for example, if you cannot remember a particular column name.

The following statement retrieves information about columns in the sample department table:

   SELECT CNAME, TNAME, COLTYPE, LENGTH, NULLS
     FROM SYSTEM.SYSCOLUMNS
     WHERE TNAME='DEPARTMENT'
     AND CREATOR = 'SQLDBA'

As shown in Table 14, for each column it displays:

The column name
The name of the table that contains it
Its data type
Its length attribute
Whether or not it allows nulls.

Table 14. Retrieving Information about Columns from SYSCOLUMNS

CNAME TNAME COLTYPE LENGTH NULLS
DEPTNO DEPARTMENT CHAR 3 N
DEPTNAME DEPARTMENT VARCHAR 36 N
MGRNO DEPARTMENT CHAR 6 Y
ADMRDEPT DEPARTMENT CHAR 3 N

CNAME	TNAME	COLTYPE	LENGTH	NULLS
DEPTNO	DEPARTMENT	CHAR	3	N
DEPTNAME	DEPARTMENT	VARCHAR	36	N
MGRNO	DEPARTMENT	CHAR	6	Y
ADMRDEPT	DEPARTMENT	CHAR	3	N

Retrieving Catalog Information about Indexes

The SYSTEM.SYSINDEXES table contains a row for each index, including indexes on catalog tables.

The following query retrieves information about the index XEMPL2:

   SELECT *
     FROM SYSTEM.SYSINDEXES
     WHERE INAME = 'XEMPL2'
     AND ICREATOR = 'SQLDBA'

A table can have more than one index. The following query retrieves information about all the indexes of a table:

   SELECT *
     FROM SYSTEM.SYSINDEXES
     WHERE TNAME = 'EMPLOYEE'
     AND CREATOR = 'SQLDBA'

Retrieving Catalog Information about Views

The SYSTEM.SYSVIEWS table contains a row for each view.

The following query retrieves information about the view SYSUSERLIST:

   SELECT *
     FROM SYSTEM.SYSVIEWS
     WHERE VIEWNAME = 'SYSUSERLIST'
     AND VCREATOR = 'SQLDBA'

Retrieving Catalog Information about Authorization

The following 4 tables contain information about the privileges held over tables and views:

SYSCOLAUTH
Contains information regarding grants of the UPDATE privilege on columns of tables or views.
SYSPROGAUTH
Details privileges regarding who can run packages.
SYSTABAUTH
Contains information about the privileges held by authorization IDs and packages on tables and views.
SYSUSERAUTH
Records special privileges held by authorization IDs (for example, DBA, CONNECT authority).

Only users with DBA authority can access SYSUSERAUTH. Other users can access this information using a view called SYSUSERLIST, which contains all the columns of SYSUSERAUTH except the PASSWORD column.

Query these tables to learn who can access data in your application server. For example, the following query retrieves the names of all users who have been granted access to the SQLDBA.DEPARTMENT table, as well as any views on that table:

   SELECT GRANTEE
     FROM SYSTEM.SYSTABAUTH
     WHERE TTNAME = 'DEPARTMENT' AND GRANTEETYPE = ' '
     AND TCREATOR = 'SQLDBA'

GRANTEE is the name of the column that contains authorization IDs and package names for users of tables. TTNAME and TCREATOR specify the SQLDBA.DEPARTMENT table. The clause GRANTEETYPE = ' ' ensures that you retrieve the names only of users (not packages) that have authority to access the table.

The COMMENT ON Statement

After you create a table or view, you can provide explanatory information about it for future reference--for example, the purpose of the table, who uses it, and anything unusual about it. To do this, use the COMMENT ON statement. You can both store comments about the table or view as a whole, and include one for each column. A comment must not exceed 254 bytes.

Comments are especially useful if your names do not clearly indicate the contents of columns or tables.

Below are two examples of COMMENT ON:

   COMMENT ON TABLE SQLDBA.EMPLOYEE IS
     'Employee table. Each row in this table represents one
      employee of the company.'

   COMMENT ON COLUMN SQLDBA.PROJECT.PRSTDATE IS
     'Estimated project start date. The format is DATE.'

Retrieving Comments

When you process a COMMENT ON statement, your comments are stored in the REMARKS column of SYSTEM.SYSCATALOG or SYSTEM.SYSCOLUMNS.

Any comment already present in the row is replaced by the new one. The following queries retrieve the comments added by the two COMMENT ON statements above:

   SELECT REMARKS
     FROM SYSTEM.SYSCATALOG
     WHERE TNAME = 'EMPLOYEE'
     AND CREATOR = 'SQLDBA'

   SELECT REMARKS
     FROM SYSTEM.SYSCOLUMNS
     WHERE CNAME = 'PRSTDATE' AND TNAME = 'PROJECT'
     AND CREATOR = 'SQLDBA'

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]