To evaluate a query, the database manager determines an access plan that consists of a set of access paths (one for each table listed in the query) and other actions (for example, a sort). Five types of access paths are described here:
For each method, a model query is given that refers to a generic table T, with columns C1, C2, C3, and so on.
Notes:
Assuming that T has no indexes, the model query is:
SELECT * FROM T WHERE C1 = 42
Because a page can contain rows from any table in a dbspace, the database manager must read every active data page in the dbspace to locate every row of T and to determine whether its value of C1 matches the given value. If there are other tables besides T in the dbspace, they will have to be read as well. If the fraction of the dbspace occupied by T is small, then most of the pages read will contain few or no rows from T.
It is a good idea to make a dbspace scan as inexpensive as possible. This can be accomplished by having one table in a dbspace and reorganizing its rows so that there are none that overflow from their original page onto another page. For information, see Reorganizing a Single Table. Another factor is the amount of free space left on each page. For information, see the DB2 Server for VSE & VM Database Administration manual.
For an index scan, the model query is:
SELECT * FROM T WHERE C2 = 42
An index scan improves performance by enabling the database manager to avoid the following:
An index scan performs better than a dbspace scan in many situations. However, it has the following drawbacks:
There are two types of index scans: nonselective and selective. An index scan on T is selective if C2 is the first column of the index key. All other index scans are nonselective.
If T has an index on C1, the database manager can use the index to pick out only those pages that contain rows of table T.
Be aware that if T is the only table in its dbspace, this method may be no better than a dbspace scan. It is only more efficient in those cases where only a portion of the pages in the dbspace contain rows from table T, the result needs to be sorted on the index key, or index sargs can be applied to the index keys.
If T has an index on C2, the database manager will be able to use the index to pick out only those rows from table T where C2 = 42. That is, the only pages that will be accessed are index leaf pages that contain keys where C2 = 42, and non-leaf pages that must be traversed to navigate to these leaf pages.
A selective index scan is generally the most efficient access path. This is true even in the case where T is the only table in the dbspace, if only a portion of the data pages contain rows where C2 = 42. (If all or nearly all pages contained rows where C2 = 42, then a dbspace scan would likely be more efficient).
Although in general the database manager has to read data pages for a table to evaluate a query, there are cases where all the columns referenced are present in the index and the predicates do not require the data page. If these conditions are met, then only index pages will be read. This is called index-only access, and is possible for both selective and nonselective index scans. The model query is:
SELECT C2, C3 FROM T WHERE C2 = 50
(It is assumed here that an index exists on columns C2 and C3.)
There is no advantage to using a clustered index with index-only access, because clustered indexes are only valuable when the database manager uses an index to access data pages.
The following are examples of queries that use index-only access. It is assumed that a multicolumn index exists on columns C1, C2, C3, and C4.
SELECT COUNT(*) FROM T WHERE C2 = 5
The database manager scans the entire index looking for C2=5, but no data pages are read.
SELECT C2 FROM T
The database manager scans the entire index, but no data pages are read.
The database manager does not read the entire index; just a single value.
SELECT MAX(C1) FROM T
For MAX column functions, C1 must be defined as NOT NULL so that a single value is read rather than the entire index.
SELECT C1 FROM T WHERE C1 = 42 AND C4 = 100
The database manager reads only the index entries where C1 = 42 and then scans for C4=100, but no data pages are read.
Index-only access is not possible when a VARCHAR or VARGRAPHIC column appears in the SELECT list or in a residual predicate.
In some cases, it may be reasonable to create an index that includes data just to improve the performance of certain common queries. For example, the sample ACTIVITY table identifies each activity by an activity number (ACTNO). It also contains a 6-character activity keyword (ACTKWD). If the table were often used to decode activity numbers by retrieving the corresponding keywords, it might be useful to have an index on both columns. The model query would be:
SELECT ACTKWD FROM ACTIVITY WHERE ACTNO = 42
Another case where index-only access is beneficial is a table with very long rows, where the portion of the row retrieved is small compared to the size of the row. If a query needs only three or four relatively short columns of that data, an index on those columns might be worthwhile merely to avoid the cost of scanning all data pages and extracting the useful data.
The model query is:
SELECT * FROM T WHERE C1 = 42
Here, access is most direct if there is a unique index on column C1. In this case, the database manager reads only as much of the index as needed to locate one entry, and then at most one data page. Furthermore, instead of using a scan, it uses a more efficient operation to return a single row. Refer to Key-matching Predicates.
The primary use of indexes is to provide selective access to data, but they are also used to sort data in a specified way. Consider this query:
SELECT * FROM EMPLOYEE WHERE WORKDEPT LIKE 'A%' ORDER BY EMPNO
The database manager can access the rows needed through an index on WORKDEPT, but then it would have to sort all of those rows by EMPNO. It might estimate that it would be more efficient to access all rows in order by an index on EMPNO, then check the value of WORKDEPT in each one, but eliminate the sort.
The database manager can use indexes for ORDER BY and GROUP BY, but not always for SELECT DISTINCT. It can avoid a sort for SELECT DISTINCT if a unique index is used, or if there is a GROUP BY list that is a subset of the SELECT list.
Note: | If an application program contains a SELECT DISTINCT statement that is preprocessed using a unique index, the preprocessor records that the package has a dependency on the unique index. If the unique index ever becomes invalid, the entire package will be invalidated and it will be dynamically repreprocessed the next time it is executed. |
The nature and purpose of your data will determine what indexes you should create, but the following very general guidelines may be of some help:
The above descriptions of the various types of access paths should suggest to you that indexes can reduce access time significantly. But before you begin creating them, carefully consider their costs: