Administration Guide

Creating an Index, Index Extension, or an Index Specification

An index is a list of the locations of rows, sorted by the contents of one or more specified columns. Indexes are typically used to speed up access to a table. However, they can also serve a logical data design purpose. For example, a unique index does not allow entry of duplicate values in the indexed columns, thereby guaranteeing that no two rows of a table have the same key value. Indexes can also be created to specify ascending or descending order of the values in a column.
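As a conceptual illustration only, the following Python sketch models an index as a sorted list of (key, row location) pairs and uses binary search to find matching rows. The list position of each row stands in for its physical row location, and the class name and sample rows are hypothetical. DB2 itself stores indexes in a B+ tree, as described later in this section.

```python
from bisect import bisect_left, bisect_right


class SortedIndex:
    """Toy index: row locations ordered by the value of one column."""

    def __init__(self, rows, column):
        # Pair each key value with its row location (here: list position),
        # then sort by key so that equal keys are adjacent.
        entries = sorted((row[column], rid) for rid, row in enumerate(rows))
        self.keys = [key for key, _ in entries]
        self.rids = [rid for _, rid in entries]

    def lookup(self, key):
        """Return the row locations whose indexed value equals `key`."""
        lo = bisect_left(self.keys, key)   # first entry >= key
        hi = bisect_right(self.keys, key)  # first entry > key
        return self.rids[lo:hi]


# Hypothetical sample rows for an EMPLOYEE-like table.
employee = [
    {"EMPNO": "000010", "LASTNAME": "HAAS"},
    {"EMPNO": "000020", "LASTNAME": "THOMPSON"},
    {"EMPNO": "000030", "LASTNAME": "KWAN"},
    {"EMPNO": "000040", "LASTNAME": "HAAS"},
]

lname = SortedIndex(employee, "LASTNAME")
print(lname.lookup("HAAS"))  # [0, 3]
```

A lookup examines only the entries located by the binary search, rather than every row of the table.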

An index extension is an index object for use with indexes that have structured type or distinct type columns.

An index specification is a metadata construct. It tells the optimizer that an index exists for a data source object (table or view) referenced by a nickname. An index specification does not contain lists of row locations; it is just a description of an index. The optimizer uses the index specification to improve access to the object referenced by the nickname. When a nickname is first created, an index specification is generated if an index exists for the underlying table at the data source in a format DB2 can recognize.
Note: If needed, create index specifications on table nicknames or view nicknames where the view is over one table.

Manually create an index or an index specification when:

It would improve performance. For example, if you want to encourage the optimizer to use a particular table or nickname as the inner table of a nested loop join, create an index specification on the joining column if no index exists. See Indexing Impact on Query Optimization for more information about when you would want an index or an index specification.
An index for a base table was added after the nickname for that table was created.

Index specifications can be created when no index exists on the base table (DB2 will not check for the remote index when you issue the CREATE INDEX statement). An index specification does not enforce uniqueness of rows even when the UNIQUE keyword is specified.

The DB2 Index Advisor is a wizard that assists you in choosing an optimal set of indexes. You can access this wizard through the Control Center. The equivalent command-line utility is db2advis.

An index is defined by columns in the base table. It can be defined by the creator of a table, or by a user who knows that certain columns require direct access. A primary index is automatically created on the primary key, unless a suitable user-defined index already exists.

Any number of indexes can be defined on a particular base table, and they can have a beneficial effect on the performance of queries. However, the more indexes there are, the more the database manager must modify during update, delete, and insert operations. Creating a large number of indexes for a table that receives many updates can slow down processing of requests. Therefore, use indexes only where a clear advantage for frequent access exists.

Any column that is part of an index key is limited to 255 bytes. The maximum number of columns in an index is 16. If you are indexing a typed table, the maximum number of columns is 15. The maximum length of an index key is 1024 bytes. As previously mentioned, many indexes on a table can slow down processing of requests. Similarly, large index keys can also slow down processing of requests.
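The limits above can be expressed as a simple pre-flight check. This Python sketch is hypothetical: the function name and the representation of a key as a list of per-column byte lengths are assumptions, it uses only the numbers stated in this section, and it does not model how DB2 computes a column's length for this purpose.

```python
MAX_COLUMN_BYTES = 255    # any single key column
MAX_KEY_BYTES = 1024      # the whole index key
MAX_COLUMNS = 16          # columns per index
MAX_COLUMNS_TYPED = 15    # columns per index on a typed table


def check_index_key(column_lengths, typed_table=False):
    """Return a list of limit violations for a proposed index key.

    `column_lengths` holds the byte length of each key column, in key
    order. An empty list means the key fits the limits stated above.
    """
    problems = []
    max_cols = MAX_COLUMNS_TYPED if typed_table else MAX_COLUMNS
    if len(column_lengths) > max_cols:
        problems.append(f"{len(column_lengths)} columns exceeds limit of {max_cols}")
    for pos, length in enumerate(column_lengths, start=1):
        if length > MAX_COLUMN_BYTES:
            problems.append(f"column {pos} is {length} bytes (limit {MAX_COLUMN_BYTES})")
    total = sum(column_lengths)
    if total > MAX_KEY_BYTES:
        problems.append(f"key is {total} bytes (limit {MAX_KEY_BYTES})")
    return problems


print(check_index_key([15, 12]))   # []
print(check_index_key([255] * 5))  # ['key is 1275 bytes (limit 1024)']
print(check_index_key([4] * 16, typed_table=True))  # ['16 columns exceeds limit of 15']
```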

An index key is a column or collection of columns on which an index is defined, and determines the usefulness of an index. Although the order of the columns making up an index key does not make a difference to index key creation, it may make a difference to the optimizer when it is deciding whether or not to use an index.
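One reason column order matters can be illustrated with Python tuples, which compare column by column much as a multi-column index key is ordered. In this hypothetical sketch, an index on (WORKDEPT, LASTNAME) keeps all entries for a department adjacent, so a search on the leading column is one contiguous range, while entries for a given LASTNAME alone are scattered. Column names and values are made up, and how the DB2 optimizer actually weighs such an index is not modeled here.

```python
from bisect import bisect_left


def prefix_range(sorted_keys, prefix):
    """Return (start, end) bounds of the entries whose leading columns equal `prefix`.

    Tuples compare column by column, so every key sharing the same
    leading value(s) sits in one contiguous run of the sorted list.
    """
    n = len(prefix)
    start = bisect_left(sorted_keys, prefix)  # a shorter tuple sorts first
    end = start
    while end < len(sorted_keys) and sorted_keys[end][:n] == prefix:
        end += 1
    return start, end


# Hypothetical index on (WORKDEPT, LASTNAME), kept in key order.
keys = sorted([
    ("D11", "HAAS"), ("A00", "HAAS"), ("E21", "ADAMSON"),
    ("A00", "LUCCHESI"), ("D11", "BROWN"), ("E21", "HAAS"),
])

print(prefix_range(keys, ("D11",)))  # (2, 4): one contiguous range
# Searching on LASTNAME alone finds scattered entries, so every entry must be examined.
print([i for i, k in enumerate(keys) if k[1] == "HAAS"])  # [0, 3, 5]
```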

If the table being indexed is empty, an index is still created, but no index entries are made until the table is loaded or rows are inserted. If the table is not empty, the database manager makes the index entries while processing the CREATE INDEX statement.

For a clustering index, new rows are inserted physically close to existing rows with similar key values. This yields a performance benefit during queries because it results in a more linear access pattern to data pages and more effective pre-fetching.

If you want a primary key index to be a clustering index, do not specify a primary key on the CREATE TABLE statement: once a primary key is created, the associated index cannot be modified. Instead, issue a CREATE TABLE statement without a primary key clause. Then issue a CREATE UNIQUE INDEX statement on the intended primary key columns, specifying the CLUSTER option. Finally, use the ALTER TABLE statement to add a primary key on the same columns. The index you just created is then used as the primary key index.

Generally, clustering is more effectively maintained if the clustering index is unique.

Column data that is not part of the unique index key but is stored and maintained in the index is called an include column. Include columns can be specified for unique indexes only. When creating an index with include columns, only the unique key columns are sorted and considered for uniqueness. Include columns can improve the performance of data retrieval when index access is involved, because their values can be read from the index without accessing the table row.
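The following Python sketch models the behavior described above: ordering and uniqueness involve only the key columns, while include-column values are stored with each entry and can be returned without reading the table. The class and column choices are hypothetical. In DB2 SQL, include columns are named in the INCLUDE clause of CREATE UNIQUE INDEX; refer to the SQL Reference for the exact syntax.

```python
class UniqueIndexWithInclude:
    """Toy unique index with include columns.

    Only the key columns are sorted and checked for uniqueness; the
    include-column values are simply stored with each entry.
    """

    def __init__(self, key_cols, include_cols):
        self.key_cols = key_cols
        self.include_cols = include_cols
        self.entries = {}  # key tuple -> (row location, include tuple)

    def insert(self, rid, row):
        key = tuple(row[c] for c in self.key_cols)
        if key in self.entries:
            raise ValueError(f"duplicate key {key}")
        included = tuple(row[c] for c in self.include_cols)
        self.entries[key] = (rid, included)

    def sorted_keys(self):
        """Index order depends on the key columns only."""
        return sorted(self.entries)

    def index_only_fetch(self, key):
        """Return include-column values without touching the table row."""
        _, included = self.entries[key]
        return included


# Hypothetical: unique on EMPNO, with LASTNAME as an include column.
idx = UniqueIndexWithInclude(["EMPNO"], ["LASTNAME"])
idx.insert(0, {"EMPNO": "000020", "LASTNAME": "HAAS"})
idx.insert(1, {"EMPNO": "000010", "LASTNAME": "HAAS"})  # same LASTNAME is fine
print(idx.index_only_fetch(("000020",)))  # ('HAAS',)
print(idx.sorted_keys())                  # [('000010',), ('000020',)]
```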

The database manager uses a B+ tree structure for storing indexes, where the bottom level consists of leaf nodes. The leaf nodes, or pages, are where the actual index key values are stored. When creating an index, you can enable those index leaf pages to be merged or reorganized online. Online index reorganization is used to prevent the situation where, after much delete and update activity, many leaf pages of an index have only a few index keys left on them. In such a situation, without online reorganization, space could be reclaimed only by an off-line reorganization of the data and index. When deciding whether to create an index that can reorganize its pages online, weigh costs against benefits: does the added cost of checking for merge space each time a key is deleted, plus the cost of completing a merge when there is enough space, outweigh the better space utilization of the index and the reduced need for off-line reorganization to reclaim space?
Note: Pages freed after an online reorganization merge are available for re-use only by other indexes in the same table. With a full reorganization, the freed pages are available to other objects (when working with Database Managed Storage) or are returned as free disk space (when working with System Managed Storage). In addition, an online reorganization does not free any non-leaf pages of the index, whereas a full reorganization makes the index as small as possible by reducing the number of non-leaf and leaf pages as well as the number of levels in the index.

See Using the CREATE INDEX Statement for more information on how to implement an index that will reorganize online.

Indexes for tables in a partitioned database are built using the same CREATE INDEX statement. They are partitioned based on the partitioning key of the table. An index on a table consists of the local indexes in that table on each node in the nodegroup. Note that unique indexes defined in a multiple partition environment must be supersets of the partitioning key.
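The reason for the superset rule can be seen in a small Python model. Each partition checks uniqueness only against its own local index, so two rows with the same key are caught only if they land on the same partition. When the index key contains the partitioning key, equal index keys imply equal partitioning keys and therefore the same partition. The sum-modulo partitioning function, table, and column names below are hypothetical simplifications, not DB2's actual partitioning map.

```python
def partition_of(row, partitioning_key, num_partitions):
    # Simplified stand-in for DB2's hashing of the partitioning key:
    # integer key values summed, modulo the number of partitions.
    return sum(row[c] for c in partitioning_key) % num_partitions


def unique_index_allowed(index_cols, partitioning_key):
    """The rule above: a unique index must include every partitioning-key column."""
    return set(partitioning_key) <= set(index_cols)


def global_duplicates(rows, index_cols):
    """Duplicate keys across the whole table."""
    seen, dups = set(), set()
    for row in rows:
        key = tuple(row[c] for c in index_cols)
        if key in seen:
            dups.add(key)
        seen.add(key)
    return dups


def local_duplicates(rows, index_cols, partitioning_key, num_partitions):
    """Duplicate keys that some partition can see in its own local index."""
    seen, dups = set(), set()
    for row in rows:
        part = partition_of(row, partitioning_key, num_partitions)
        key = tuple(row[c] for c in index_cols)
        if (part, key) in seen:
            dups.add(key)
        seen.add((part, key))
    return dups


rows = [{"ORDERNO": 7, "STORE": 1}, {"ORDERNO": 7, "STORE": 2}]
# Unique on ORDERNO only, partitioned on STORE: the two rows land on
# different partitions, so no local index notices the duplicate.
print(unique_index_allowed(["ORDERNO"], ["STORE"]))      # False
print(global_duplicates(rows, ["ORDERNO"]))              # {(7,)}
print(local_duplicates(rows, ["ORDERNO"], ["STORE"], 2))  # set()
```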

Performance Tip: Create your indexes before using the LOAD utility if you are going to carry out the following series of tasks:

Create Table
Load Table
Create Index
Perform RUNSTATS

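You should consider ordering the tasks as follows, so that the LOAD utility builds the indexes and collects statistics as part of the load rather than in separate passes afterward:

Create the table
Create the index
Load the table with the STATISTICS YES option requested.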

For more information on LOAD performance improvements, see System Catalog Tables.

Indexes are maintained after they are created. Subsequently, when application programs use a key value to randomly access and process rows in a table, the index based on that key value can be used to access rows directly. This is important, because the physical storage of rows in a base table is not ordered. When a row is inserted, unless there is a clustering index defined, the row is placed in the most convenient storage location that can accommodate it. When searching for rows of a table that meet a particular selection condition and the table has no indexes, the entire table is scanned. An index optimizes data retrieval without performing a lengthy sequential search.

The data for your indexes can be stored in the same table space as your table data, or in a separate table space containing index data. The table space used to store the index data is determined when the table is created (see Creating a Table in Multiple Table Spaces).

To create an index using the Control Center:

Expand the object tree until you see the Indexes folder.
Right-click the Indexes folder, and select Create --> Index Using Wizard from the pop-up menu.
Follow the steps in the wizard to complete your task.

To create an index using the command line, enter:

   CREATE INDEX <name> ON <table_name> (<column_name>)

The following two sections, Using an Index and Using the CREATE INDEX Statement, provide more information on creating an index.

Using an Index

An index is never directly used by an application program. The decision on whether to use an index and which of the potentially available indexes to use is the responsibility of the optimizer.

The best index on a table is one that:

Uses high-speed disks
Is highly clustered
Is made up of only a few narrow columns

For a detailed discussion of how an index can be beneficial, see Index Scan Concepts.

Using the CREATE INDEX Statement

You can create a non-unique index, which allows duplicate values to exist in the indexed column or columns, to enable efficient retrieval by columns other than the primary key.

The following SQL statement creates a non-unique index called LNAME on the LASTNAME column of the EMPLOYEE table, sorted in ascending order:

   CREATE INDEX LNAME ON EMPLOYEE (LASTNAME ASC)

The following SQL statement creates a unique index called PH on the phone number column (PHONENO), sorted in descending order:

   CREATE UNIQUE INDEX PH ON EMPLOYEE (PHONENO DESC)

A unique index ensures that no duplicate values exist in the indexed column or columns. The constraint is enforced at the end of the SQL statement that updates rows or inserts new rows. This type of index cannot be created if the set of one or more columns already has duplicate values.
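The two rules in the previous paragraph (a unique index cannot be built over existing duplicates, and the constraint is checked at the end of the statement rather than row by row) can be modeled in a few lines of Python. The table representation, function names, and SEQ column are hypothetical. The sketch shows why end-of-statement checking matters: an UPDATE that shifts every value up by one passes through a temporary duplicate.

```python
def has_duplicates(table, column):
    values = [row[column] for row in table]
    return len(values) != len(set(values))


def create_unique_index(table, column):
    """Refuse to build a unique index over values that already repeat."""
    if has_duplicates(table, column):
        raise ValueError(f"duplicate values already exist in {column}")
    return {column}  # hypothetical: the set of uniquely indexed columns


def update_statement(table, column, fn, unique_columns, check_each_row=False):
    """Apply fn to `column` in every row as a single statement.

    By default uniqueness is checked once, after every row has been
    changed (end of statement). check_each_row=True shows what a
    row-at-a-time check would do instead. On a violation the whole
    statement is undone.
    """
    snapshot = [dict(row) for row in table]
    try:
        for row in table:
            row[column] = fn(row[column])
            if check_each_row and column in unique_columns and has_duplicates(table, column):
                raise ValueError("duplicate key found during the statement")
        if column in unique_columns and has_duplicates(table, column):
            raise ValueError("duplicate key at end of statement")
    except ValueError:
        table[:] = snapshot  # roll back every row the statement changed
        raise


t = [{"SEQ": 1}, {"SEQ": 2}, {"SEQ": 3}]
unique = create_unique_index(t, "SEQ")
# Midway through, SEQ holds 2, 2, 3; only the final state 2, 3, 4 is checked.
update_statement(t, "SEQ", lambda v: v + 1, unique)
print([row["SEQ"] for row in t])  # [2, 3, 4]
```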

The keyword ASC puts the index entries in ascending order by column, while DESC puts them in descending order by column. The default is ascending order.

When working with a structured type, it might be necessary to create user-defined index types. This requires a means of defining index maintenance, index search, and index exploitation functions. Refer to the SQL Reference for information on the requirements for creating an index type.

The following SQL statement creates a clustering index called INDEX1 on the LASTNAME column of the EMPLOYEE table:

   CREATE INDEX INDEX1 ON EMPLOYEE (LASTNAME) CLUSTER

To be effective, clustering indexes should be used together with the PCTFREE clause of the ALTER TABLE statement, which leaves free space on data pages so that new rows can be inserted on the correct pages and the clustering order is maintained. Typically, the greater the INSERT activity on the table, the larger the PCTFREE value (on the table) needed to maintain clustering. Because this index determines the order in which the data is laid out on physical pages, only one clustering index can be defined for any particular table.

If, on the other hand, the index key values of these new rows are, for example, always new high key values, then the clustering attribute of the table will try to place them at the end of the table. Having free space in other pages will do little to preserve clustering. In this case, placing the table in append mode may be a better choice than a clustering index combined with a large PCTFREE value on the table. You can place the table in append mode by issuing ALTER TABLE <table_name> APPEND ON. See Changing Table Attributes for additional overview information on ALTER TABLE. Refer to the SQL Reference for additional detailed information on ALTER TABLE.

The above discussion also applies to new "overflow" rows that result from UPDATEs which increase the size of a row.
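The interaction between a clustering index and table free space can be seen in a toy Python simulation. Pages are lists of key values with a fixed row capacity, PCTFREE is applied when the rows are first laid out, and a row that cannot go on the page holding neighboring keys is sent to an overflow list standing in for "some other page, out of key order". These are simplifying assumptions for illustration; DB2's actual free-space search is not modeled, and all function names are hypothetical.

```python
from bisect import bisect_right, insort


def load_pages(sorted_keys, capacity, pctfree):
    """Lay rows out in key order, leaving pctfree percent of each page empty."""
    per_page = max(1, capacity * (100 - pctfree) // 100)
    return [sorted_keys[i:i + per_page] for i in range(0, len(sorted_keys), per_page)]


def clustered_insert(pages, overflow, key, capacity):
    """Place `key` on the page holding neighboring key values, if it has room.

    Returns True when clustering is preserved; otherwise the row goes to
    `overflow`, standing in for some other page out of key order.
    """
    first_keys = [page[0] for page in pages]
    target = max(bisect_right(first_keys, key) - 1, 0)
    if len(pages[target]) < capacity:
        insort(pages[target], key)
        return True
    overflow.append(key)
    return False


def simulate(pctfree, new_keys, capacity=8):
    pages = load_pages(list(range(0, 80, 2)), capacity, pctfree)  # 40 existing rows
    overflow = []
    kept = sum(clustered_insert(pages, overflow, k, capacity) for k in new_keys)
    return kept, len(overflow)


print(simulate(0, [1, 11, 21, 31, 41]))        # (0, 5): no free space, clustering lost
print(simulate(25, [1, 11, 21, 31, 41]))       # (5, 0): free space absorbs the inserts
print(simulate(25, [80, 82, 84, 86, 88, 90]))  # (4, 2): only the last page helps
```

The third call shows the ever-increasing key case: only the last page's free space is used, while the space reserved on the other pages sits idle, which is why append mode can be a better fit there.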

The MINPCTUSED clause of the CREATE INDEX statement specifies the threshold for the minimum amount of used space on an index leaf page. If this clause is used, online index reorganization is enabled for the index. Once it is enabled, an online reorganization is considered as follows: after a key is deleted from a leaf page of the index, if the percentage of used space on the page is at or below the specified threshold, the neighboring index leaf pages are checked to determine whether the keys on the two leaf pages can be merged into a single index leaf page.

For example, the following SQL statement creates an index with online index reorganization enabled:

   CREATE INDEX LASTN ON EMPLOYEE (LASTNAME) MINPCTUSED 20

When a key is deleted from this index, if the remaining keys on the index page take up twenty percent or less space on the index page, then an attempt is made to delete an index page by merging the keys of this index page with those of a neighboring index page. If the combined keys can all fit on a single page, this merge is performed and one of the index pages is deleted.
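The merge rule described above can be sketched in Python. The assumptions are deliberate simplifications: every index entry has the same size, page header overhead is ignored, only one neighboring page is examined, and the 4000-byte page is a round number chosen for easy arithmetic rather than a DB2 page size. The function name is hypothetical.

```python
def delete_and_maybe_merge(page, neighbor, key, page_size, entry_bytes, minpctused):
    """Delete `key` from a leaf page, then apply the MINPCTUSED rule.

    Pages are lists of keys; every entry is assumed to take entry_bytes.
    Returns (page, neighbor). After a merge, page is None (freed) and
    neighbor holds all the remaining keys.
    """
    page = [k for k in page if k != key]
    used_pct = 100 * len(page) * entry_bytes / page_size
    if used_pct > minpctused:
        return page, neighbor             # above threshold: no merge attempt
    if (len(page) + len(neighbor)) * entry_bytes > page_size:
        return page, neighbor             # attempt made, but keys do not fit
    return None, sorted(page + neighbor)  # merged; one leaf page freed


# 4000-byte pages with 100-byte entries (40 per page) and MINPCTUSED 20.
page, nbr = delete_and_maybe_merge(list(range(9)), list(range(100, 130)), 0, 4000, 100, 20)
print(page, len(nbr))  # None 38
```

Here the deletion leaves 8 entries (20 percent of the page), which triggers a merge attempt, and the 38 combined entries fit on one 40-entry page, so the merge succeeds.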

The PCTFREE clause of the CREATE INDEX statement specifies the percentage of each index page to leave as free space when the index is built. Leaving more free space on the index pages results in fewer page splits. Fewer page splits reduce the need to reorganize the table to regain sequential index pages, and sequential index pages make prefetching more effective, which is one important factor in performance. Again, if new keys are always high key values, consider lowering the value of the PCTFREE clause of the CREATE INDEX statement so that little space is wasted on each index page.
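The trade-off that PCTFREE controls is simple arithmetic. This Python sketch assumes fixed-size index entries and ignores page overhead; the 4000-byte page, 40-byte entry, and 100,000-key figures are made up for illustration, and the function names are hypothetical.

```python
def entries_per_leaf_at_build(page_bytes, entry_bytes, pctfree):
    """Entries written to each leaf page at build time, leaving pctfree percent free."""
    usable = page_bytes * (100 - pctfree) // 100
    return usable // entry_bytes


def spare_slots(page_bytes, entry_bytes, pctfree):
    """Inserts a freshly built leaf page can absorb before it must split."""
    return page_bytes // entry_bytes - entries_per_leaf_at_build(page_bytes, entry_bytes, pctfree)


def leaf_pages_needed(num_keys, page_bytes, entry_bytes, pctfree):
    per_page = entries_per_leaf_at_build(page_bytes, entry_bytes, pctfree)
    return -(-num_keys // per_page)  # ceiling division


for pct in (10, 30):
    print(pct, leaf_pages_needed(100_000, 4000, 40, pct), spare_slots(4000, 40, pct))
# 10 1112 10
# 30 1429 30
```

Under these assumptions, raising PCTFREE from 10 to 30 triples the room for inserts on each leaf page before a split, at the cost of 317 more leaf pages (1429 versus 1112) when the index is built.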

In multiple partition databases, unique indexes must be defined as supersets of the partitioning key.

If you have a replicated summary table, its base table (or tables) must have a unique index, and the index key columns must be used in the query that defines the replicated summary table. For more information, see Replicated Summary Tables.

For intra-partition parallelism, index creation performance is improved by using multiple processors for the scanning and sorting of data that is performed during index creation. The use of multiple processors is enabled by setting intra_parallel to YES (1) or ANY (-1). The number of processors used during index creation is determined by the system and is not affected by the configuration parameters dft_degree or max_querydegree, by the application runtime degree, or by the SQL statement compilation degree. If the indexsort database configuration parameter is set to NO, index creation does not use multiple processors.
