DB2 Server for VSE & VM: Performance Tuning Handbook

Access Paths and Indexes

To evaluate a query, the database manager determines an access plan that consists of a set of access paths (one for each table listed in the query) and other actions (for example, a sort). Five types of access paths are described here:

Dbspace scans
Nonselective index scans
Selective index scans
Index-only access scans
Unique index with key matching predicate(s).

For each method, a model query is given that refers to a generic table T, with columns C1, C2, C3, and so on.

Notes:

The examples below use SELECT * because they are modeling arbitrary queries. For actual queries, the use of SELECT * is not recommended: all queries should only select the columns that are required in the answer set, in order to reduce the cost of processing the query, and to provide additional access path opportunities.
Considerations regarding the number of data pages read may not apply to tables with long fields.

Dbspace Scans

Assuming that T has no indexes, the model query is:

   SELECT * FROM T WHERE C1 = 42

Because a page can contain rows from any table in a dbspace, the database manager must read every active data page in the dbspace to locate every row of T and to determine whether its value of C1 matches the given value. If there are other tables besides T in the dbspace, they will have to be read as well. If the fraction of the dbspace occupied by T is small, then most of the pages read will contain few or no rows from T.

It is a good idea to make a dbspace scan as inexpensive as possible. This can be accomplished by having one table in a dbspace and reorganizing its rows so that there are none that overflow from their original page onto another page. For information, see Reorganizing a Single Table. Another factor is the amount of free space left on each page. For information, see the DB2 Server for VSE & VM Database Administration manual.

Index Scans

For an index scan, the model query is:

   SELECT * FROM T WHERE C2 = 42

An index scan improves performance by enabling the database manager to avoid the following:

Reading all of the active data pages in the dbspace.
Reading data pages that do not contain desired rows.
Sorting the result.

An index scan performs better than a dbspace scan in many situations. However, it has the following drawbacks:

If VARCHAR or VARGRAPHIC columns are selected, or if not all columns referenced are in the index, then the index scan must read the index pages as well as the data pages. (A dbspace scan reads only data pages.)
If the index is not clustered (even if the index is a clustering index), some data pages may be read more than once. Refer to Clustered Indexes.

There are two types of index scans: nonselective and selective. An index scan on T is selective if C2 is the first column of the index key. All other index scans are nonselective.

Nonselective Index Scans

If T has an index on C1, the database manager can use the index to pick out only those pages that contain rows of table T.

Be aware that if T is the only table in its dbspace, this method may be no better than a dbspace scan. It is only more efficient in those cases where only a portion of the pages in the dbspace contain rows from table T, the result needs to be sorted on the index key, or index sargs can be applied to the index keys.

Selective Index Scans

If T has an index on C2, the database manager will be able to use the index to pick out only those rows from table T where C2 = 42. That is, the only pages that will be accessed are index leaf pages that contain keys where C2 = 42, and non-leaf pages that must be traversed to navigate to these leaf pages.

A selective index scan is generally the most efficient access path. This is true even in the case where T is the only table in the dbspace, if only a portion of the data pages contain rows where C2 = 42. (If all or nearly all pages contained rows where C2 = 42, then a dbspace scan would likely be more efficient).

Index-Only Access Scans

Although in general the database manager has to read data pages for a table to evaluate a query, there are cases where all the columns referenced are present in the index and the predicates do not require the data page. If these conditions are met, then only index pages will be read. This is called index-only access, and is possible for both selective and nonselective index scans. The model query is:

   SELECT C2, C3 FROM T WHERE C2 = 50

(It is assumed here that an index exists on columns C2 and C3.)

Clustering Index

There is no advantage to using a clustered index with index-only access, because clustered indexes are only valuable when the database manager uses an index to access data pages.

Examples of Index only Access

The following are examples of queries that use index-only access. It is assumed that a multicolumn index exists on columns C1, C2, C3, and C4.

SELECT COUNT(*) FROM T WHERE C2 = 5

The database manager scans the entire index looking for C2=5, but no data pages are read.

SELECT C2 FROM T

The database manager scans the entire index, but no data pages are read.

SELECT MIN(C1) FROM T

The database manager does not read the entire index; just a single value.

SELECT MAX(C1) FROM T

For MAX column functions, C1 must be defined as NOT NULL so that a single value is read rather than the entire index.

SELECT C1 FROM T WHERE C1 = 42 AND C4 = 100

The database manager reads only the index entries where C1 = 42 and then scans for C4=100, but no data pages are read.

Index-only access is not possible when a VARCHAR or VARGRAPHIC column appears in the SELECT list or in a residual predicate.

Creating Indexes

In some cases, it may be reasonable to create an index that includes data just to improve the performance of certain common queries. For example, the sample ACTIVITY table identifies each activity by an activity number (ACTNO). It also contains a 6-character activity keyword (ACTKWD). If the table were often used to decode activity numbers by retrieving the corresponding keywords, it might be useful to have an index on both columns. The model query would be:

   SELECT ACTKWD FROM ACTIVITY WHERE ACTNO = 42

Another case where index-only access is beneficial is a table with very long rows, where the portion of the row retrieved is small compared to the size of the row. If a query needs only three or four relatively short columns of that data, an index on those columns might be worthwhile merely to avoid the cost of scanning all data pages and extracting the useful data.

Unique Index with Key Matching Predicate(s)

The model query is:

   SELECT * FROM T WHERE C1 = 42

Here, access is most direct if there is a unique index on column C1. In this case, the database manager reads only as much of the index as needed to locate one entry, and then at most one data page. Furthermore, instead of using a scan, it uses a more efficient operation to return a single row. Refer to Key-matching Predicates.

Indexes for Sorting

The primary use of indexes is to provide selective access to data, but they are also used to sort data in a specified way. Consider this query:

   SELECT * FROM EMPLOYEE
     WHERE WORKDEPT LIKE 'A%'
       ORDER BY EMPNO

The database manager can access the rows needed through an index on WORKDEPT, but then it would have to sort all of those rows by EMPNO. It might estimate that it would be more efficient to access all rows in order by an index on EMPNO, then check the value of WORKDEPT in each one, but eliminate the sort.

The database manager can use indexes for ORDER BY and GROUP BY, but not always for SELECT DISTINCT. It can avoid a sort for SELECT DISTINCT if a unique index is used, or if there is a GROUP BY list that is a subset of the SELECT list.
Note: If an application program contains a SELECT DISTINCT statement that is preprocessed using a unique index, the preprocessor records that the package has a dependency on the unique index. If the unique index ever becomes invalid, the entire package will be invalidated and it will be dynamically repreprocessed the next time it is executed.

Recommendations for Indexes

The nature and purpose of your data will determine what indexes you should create, but the following very general guidelines may be of some help:

If you delete rows from or update the primary key on parent tables, define indexes on foreign keys.
Define primary keys or unique constraints wherever they apply. The database manager automatically defines unique indexes for these.
Use indexes to speed up the most frequent queries to tables with more than 15 data pages; and for tables with more than 10 pages that are primarily accessed for reading only.
Create indexes on fixed-length rather than varying length columns.
Create indexes to include columns frequently queried to allow for index-only access.
Whenever possible, create unique indexes. If you cannot create a unique index on a single column, create a unique multicolumn index. For example if you want to sort the EMPLOYEE table by the JOB column, create an index on JOB and EMPNO (JOB,EMPNO). The index will still sort by JOB, but it can also be a UNIQUE index.
In a multicolumn index, place the "most" unique column first.

Disadvantages of Indexes

The above descriptions of the various types of access paths should suggest to you that indexes can reduce access time significantly. But before you begin creating them, carefully consider their costs:

They require storage space.
It takes time to create and maintain them.
There is overhead associated with keeping them synchronized with the tables they index. It takes more time for the database manager to update a table that uses an index or insert new data into it.
They may increase locking contention.
They increase the time required for recovery.

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]