The RUNSTATS utility updates statistics in the system catalog tables to help with the query optimization process. Without these statistics, the database manager could make a decision that would adversely affect the performance of an SQL statement. The RUNSTATS utility allows you to collect statistics on the data contained in the tables, indexes, or both tables and indexes.
Use the RUNSTATS utility to collect statistics based on both the table and the index data to provide accurate information to the access plan selection process in the following situations:
When you are working in a partitioned database, collect the statistics related to a table and its indexes by executing the RUNSTATS operation at a single node. (The node at which the utility executes is determined by whether the node at which you issue the command contains table data or not. See The Database Partition Where RUNSTATS is Executed for details.) Because the statistics stored in the catalogs are supposed to represent table-level information, the node-level statistics collected by the database manager are multiplied where appropriate by the number of nodes across which the table is partitioned. This provides an approximation of the actual statistics that would be collected by executing RUNSTATS at every node and aggregating these statistics.
Note: | The DB2 query optimizer assumes that attribute values (data) are placed equally and evenly across the database partitions of the system. If the placement of data is not equal, you should run this command on a database partition that you think has a representative table distribution. |
When you invoke RUNSTATS on a table, you must be connected to the database in which the table is stored, but the database partition from which you issue the command does not have to contain a partition for this table:
Analyzing the statistics can indicate when reorganization is necessary. Some of these indications are:
If cluster ratio statistics are collected, their value will be in the range from 0 to 100. If cluster factor statistics are collected, their value will be a number between 0 and 1. Only one of these two clustering statistics will be recorded in the SYSCAT.INDEXES catalog. In general, only one of the indexes in a table can have a high degree of clustering. A value of -1 is used to indicate that no statistics are available.
If you wish to compare ratio values, multiply the cluster factor by 100 to obtain a percentage value for the amount of clustering.
Index scans that are not index-only accesses might perform better with higher cluster ratios. A low cluster ratio leads to more I/O for this type of scan, since after the first access of each data page, it is less likely that the page is still in the buffer pool the next time it is accessed. Increasing the buffer size can improve the performance of an unclustered index. (See Understanding List Prefetching for information about how the database manager can improve index scan performance for indexes with low cluster ratios and see Clustered Indexes for information about how the optimizer uses index statistics.)
If the table data was initially clustered with respect to a certain index, and the above clustering information indicates that the data is now poorly clustered for that same index, you may wish to reorganize the table to re-cluster the data with respect to that index.
The overflow number indicates the number of rows that do not fit on their original pages. This can occur when VARCHAR columns are updated with longer values. In such cases, a pointer is kept at the row's original location. This can hurt performance, because the database manager must follow the pointer to find the row's contents, which increases the processing time and may also increase the number of I/Os.
As the number of overflow rows grows higher, the potential benefit of reorganizing your table data also increases. Reorganizing the table data will eliminate the overflowing of rows.
The number of pages with rows can be compared with the total number of pages that a table contains. Empty pages will be read for a table scan. Empty pages can occur when entire ranges of rows are deleted.
As the number of empty pages grows higher, so does the need for a table reorganization. Reorganizing the table can compress the amount of space used by a table, by reclaiming these empty pages. In addition to more efficient use of disk space, reclaiming unused pages can also improve the performance of a table scan, since fewer pages will be read into the buffer pool.
The number of leaf pages predicts how many index page I/Os are needed for a complete scan of an index.
Random update activity can cause page splits to occur that increase the size of the index beyond the minimum amount of space required. When indexes are rebuilt during the reorganization of a table, it is possible to build each index with the minimum amount of space possible. For more information on the minimum space requirements for an index, see Indexing Impact on Query Optimization or refer to "Creating an Index or an Index Specification" section in the Administration Guide: Planning.
Note: | A default of ten percent free space is left on each index page when the indexes are rebuilt. You can increase the free space amount by using the PCTFREE parameter when first creating the index. Then, whenever you reorganize the index, the PCTFREE value is used. Having a free space larger than ten percent may be important if you wish to reduce the number of times you need to reorganize the index. The free space is used to accommodate additional index inserts. |
RUNSTATS can also help you determine how performance is related to changes in your database. The statistics show the data distribution within a table. When used routinely, RUNSTATS provides data about tables and indexes over a period of time, thereby allowing performance trends to be identified for your data model as it evolves over time.
Ideally, you should rebind application programs after running statistics, because the query optimizer may choose a different access plan given the new statistics.
If you do not have enough time available to collect all of the statistics at one time, you may choose to periodically run RUNSTATS to update only a portion of the statistics that could be gathered. If inconsistencies are found as a result of activity on the table between the periods where you run RUNSTATS with a selective partial update, then a warning message (SQL0437W, reason code 6) is issued. For example, you first use RUNSTATS to gather table distribution statistics. Subsequently, you use RUNSTATS to gather index statistics. If inconsistencies are detected as a result of activity on the table, then the table distribution statistics are dropped and the warning message is issued. It is recommended that you run RUNSTATS to gather table distribution statistics when this happens.
You should periodically use RUNSTATS to gather both table and index statistics at once, to ensure that the index statistics are synchronized with the table statistics. Index statistics retain most of the table and column statistics collected from the last run of RUNSTATS. If the table has been modified extensively since the last time its table statistics were gathered, gathering only the index statistics for that table will leave the two sets of statistics out of synchronization.
You may wish to collect statistics based only on index data in the following situations:
The RUNSTATS utility allows you to collect varying levels of statistics. For tables, you can collect basic level statistics or you can also collect distribution statistics for the column values within a table (see Collecting and Using Distribution Statistics). For indexes, you can collect basic level statistics or you can also collect detailed statistics which can help the optimizer better estimate the I/O cost of an index scan. (See Clustered Indexes for information about these "detailed" statistics).
Note: | Statistics are not collected for LONG, large object (LOB), or structured type columns. For row types, the table level statistics NPAGES, FPAGES, and OVERFLOW are not collected for a sub-table. Statistics are not collected for extended indexes, nor for declared temporary tables. |
The following tables show the catalog statistics that are updated by the RUNSTATS utility:
Table 44. Table Statistics (SYSCAT.TABLES and SYSSTAT.TABLES)
Statistic | Description | RUNSTATS Option | |||
---|---|---|---|---|---|
Table | Indexes | ||||
FPAGES | number of pages being used by a table | Yes | Yes | ||
NPAGES | number of pages containing rows | Yes | Yes | ||
OVERFLOW | number of rows that overflow | Yes | No | ||
CARD | number of rows in table (cardinality) | Yes | Yes (Note 2) | ||
|
Table 45. Column Statistics (SYSCAT.COLUMNS and SYSSTAT.COLUMNS)
Statistic | Description | RUNSTATS Option | |||
---|---|---|---|---|---|
Table | Indexes | ||||
COLCARD | column cardinality | Yes (Note 1) | Yes (Note 2) | ||
AVGCOLLEN | average length of column | Yes | Yes (Note 2) | ||
HIGH2KEY | second highest value in column | Yes | Yes (Note 2) | ||
LOW2KEY | second lowest value in column | Yes | Yes (Note 2) | ||
NUMNULLS | the number of NULLs in a column | Yes | Yes (Note 2) | ||
|
Table 46. Index Statistics (SYSCAT.INDEXES and SYSSTAT.INDEXES)
Statistic | Description | RUNSTATS Option | |||
---|---|---|---|---|---|
Table | Indexes | ||||
NLEAF | number of index leaf pages | No | Yes (Note 3) | ||
NLEVELS | number of index levels | No | Yes | ||
CLUSTERRATIO | degree of clustering of table data | No | Yes (Note 2) | ||
CLUSTERFACTOR | finer degree of clustering | No | Detailed (Notes 1,2) | ||
DENSITY | Ratio (percentage) of SEQUENTIAL_PAGES to number of pages in the range of pages occupied by the index (Note 4) | No | Yes | ||
FIRSTKEYCARD | number of distinct values in first column of the index | No | Yes (Note 3) | ||
FIRST2KEYCARD | number of distinct values in first two columns of the index | No | Yes (Note 3) | ||
FIRST3KEYCARD | number of distinct values in first three columns of the index | No | Yes (Note 3) | ||
FIRST4KEYCARD | number of distinct values in first four columns of the index | No | Yes (Note 3) | ||
FULLKEYCARD | number of distinct values in all columns of the index | No | Yes (Note 3) | ||
PAGE_FETCH_PAIRS | page fetch estimates for different buffer sizes | No | Detailed (Notes 1,2) | ||
SEQUENTIAL_PAGES | number of leaf pages located on disk in index key order, with few or no large gaps between them | No | Yes | ||
|
Table 47. Column Distribution Statistics (SYSCAT.COLDIST and SYSSTAT.COLDIST)
Statistic | Description | RUNSTATS Option | |||
---|---|---|---|---|---|
Table | Indexes | ||||
DISTCOUNT | If TYPE is Q, the number of distinct values that are less than or equal to COLVALUE statistics | Distribution (Note 2) | No | ||
TYPE | Indicator of whether row provides frequent-value or quantile statistics | Distribution | No | ||
SEQNO | Frequency ranking of a sequence number to help uniquely identify the row in the table | Distribution | No | ||
COLVALUE | Data value for which frequency or quantile statistic is collected | Distribution | No | ||
VALCOUNT | Frequency with which the data value occurs in column, or for quantiles, the number of values less than or equal to the data value (COLVALUE) | Distribution | No | ||
|
For more information about column distribution statistics, see Collecting and Using Distribution Statistics.
Statistics for user-defined functions are not collected by the RUNSTATS utility. You must manually update the statistics for these functions. See User Update-Capable Catalog Statistics and Updating Statistics for User-Defined Functions.