DB2 Server for VSE & VM: Performance Tuning Handbook


Using Explanation Tables to Evaluate Performance

The explanation tables produced by the EXPLAIN statement allow you to get information about the structure and execution performance of SQL statements. This information can help you analyze how existing database designs perform, or how future designs will perform. Specifically, you can use explanation tables to:

After you complete your design, and construct a prototype, you can use explanation tables to see how well real queries will work against the design. (You can select explain processing explicitly using the EXPLAIN statement, or implicitly using the EXPLAIN(YES) preprocessing parameter or the USING EXPLAIN(YES) option of the CREATE PACKAGE statement.)

Explain Processing

Explain processing accepts an SQL statement as an argument, analyzes it, and inserts information about the structure and execution of that statement into the explanation tables, which you must create. (This includes the cost of internally generated statements.) You can then query the explanation tables.

You can select explain processing explicitly by using the EXPLAIN statement, or implicitly using the EXPLAIN(YES) preprocessor parameter or the EXPLAIN(YES) USING OPTION of the CREATE PACKAGE statement.
Note:Explain processing does not execute the SQL statement. It only explains how the statement will work when you actually execute it.

Using the EXPLAIN Statement

You can use the SQL EXPLAIN statement in an application program, the DBS utility, or ISQL to estimate execution performance. For the syntax of the EXPLAIN statement and the structure of the explanation tables produced, refer to the DB2 Server for VSE & VM SQL Reference manual.

To explicitly process the EXPLAIN statement, you must either:

Each time the EXPLAIN statement is executed, rows are appended to the specified explanation tables. Any existing rows are not affected.

When the EXPLAIN statement is issued for INSERT, UPDATE, and DELETE statements that change tables in a referential structure, information is returned not only on the INSERT, UPDATE, and DELETE statements, but also on internally generated statements. Refer to the DB2 Server for VSE & VM Application Programming manuals for more information on internally generated statements.

Each of the explanation tables has a column called QUERYNO (query number). The QUERYNO column has a data type of INTEGER. With the SET QUERYNO clause, you can place an integer value in the QUERYNO columns of the rows inserted by the EXPLAIN statement. Thus, you can use QUERYNO to identify new rows, and to mark them as corresponding to a particular statement.

For integer in SET QUERYNO, you must specify an integer constant that is not preceded by a sign. You cannot use a host variable in the SET QUERYNO clause, even in application programs. However, you can use the "&n" place-holder variables in ISQL.

The SET QUERYNO clause is optional. If you omit it, a NULL value is placed in the fields of the rows inserted by the EXPLAIN statement. If you set the QUERYNO to some initial value, this value identifies the query for which the EXPLAIN is issued. Because QUERYNO is an INTEGER field, an error is returned if its value exceeds 2 147 483 647.

You can enter the EXPLAIN statement from ISQL. When you enter EXPLAIN from ISQL, you must use a character constant if you execute the statement immediately. Alternatively, you can store the EXPLAIN statement by placing it in a routine or by using an ISQL STORE statement. This lets you use a place-holder (for example, &1) for explainable_sql_statement (refer to the DB2 Server for VSE & VM SQL Reference manual for the syntax of the EXPLAIN statement, including explainable_sql_statement). Thus, you can execute EXPLAIN for different SQL statements without having to key in the entire EXPLAIN statement each time. When using this technique, however, you should keep ISQL limits in mind. For example, when the place-holder is in a routine table, the length of input to a parameter is limited by the length of the COMMAND column of the routine table. At most, input to a place-holder can be 254 characters. This number is further reduced if you do not put the "&n" place-holder on a line by itself.

When you enter EXPLAIN from the DBS utility, you must use a character constant for explainable_sql_statement. The utility does not allow the use of host variables or place-holders in any SQL statement.

Using the EXPLAIN Option

You can select explain processing implicitly using the EXPLAIN(YES) preprocessing parameter or the EXPLAIN(YES) using option of the CREATE PACKAGE statement. If you select explain processing implicitly, explanatory information is provided for all SQL statements in a package that can be explained and for all internally generated statements. The name of the package and the name of the owner of the package are stored in the explanation tables.

Because you cannot assign a QUERYNO when you select explain processing implicitly, the section number assigned to the statement being preprocessed is used as the query number. This number corresponds to the position of the query in the application. Using the preprocessor listing file, you can determine the section number assigned to a statement and use it to determine the corresponding rows in the explanation tables. The following variable names are used as section numbers for the languages specified.

Table 9. Variable Names for Section Number (Query Number)
Language Structure Name Variable Name
ASM RDIIN RDISECT#
COBOL RDIIN SQL-SECTION-NUM
C RDIIN SECTION_NUM
PLI RDIIN SECTION_NUM
FORTRAN SQLCTL SQLSTMT

For further information, see the DB2 Server for VSE & VM SQL Reference and DB2 Server for VSE & VM Application Programming manuals.

Comparing Implicit and Explicit Explain Processing

Implicit and explicit explain processing insert the same kind of information into the explanation tables during explain processing. The package name and package owner columns of the explanation tables, however, contain information only if implicit explain processing is used.

There is also a difference between implicit and explicit explain processing for the query number of an application. For explicit explain processing, if you do not supply the query number, it is set to NULL. For implicit explain processing, you cannot provide query numbers for SQL statements in the middle of an application, so the section number assigned to the statement when it is processed is used as the query number. You can then use the preprocessor listing file to determine the section number assigned to each statement and the corresponding rows in the explanation tables.

During explicit explain processing, rows are added to the explanation tables when a program is preprocessed, or dynamically repreprocessed. During implicit explain processing, rows are added when a program is preprocessed, but not when it is dynamically repreprocessed. In all situations, explicit explain processing overrides implicit explain processing.

When you processes an application program using the implicit EXPLAIN(YES) option, the preprocessor checks for the existence of the EXPLAIN tables once. If it does not find them, processing is terminated and SQLCODE -649 (SQLSTATE = 42704) is issued.

Each time the preprocessor encounters an explicit EXPLAIN statement, it checks for the existence of the explain tables. If its does not find them the explicit EXPLAIN is not processed and, SQLCODE -204 (SQLSTATE = 42704) or SQLWARNING +204 (SQLSTATE = 01532) is issued. This check is repeated for every explicit EXPLAIN statement that the preprocessor encounters.

Using Explanation Tables

There are four explanation tables: REFERENCE_TABLE, STRUCTURE_TABLE, COST_TABLE, and PLAN_TABLE. The definitions of these tables and the EXPLAIN statement syntax are in the DB2 Server for VSE & VM SQL Reference manual. A DBS utility job file, ARISEXP, to generate explanation tables, indexes, and views is shipped with the DB2 Server for VSE & VM product. Instructions for generating the tables using the ARISEXP file are provided at the top of the file. For further information on the contents of the explanation tables, refer to the DB2 Server for VSE & VM SQL Reference manual.

When you execute an EXPLAIN statement, information is placed in your tables, and is independent of any other user's explanation tables. You can review and summarize the information placed in your explanation tables just as you can other tables. However, because explanation tables only insert rows, you also have the responsibility to delete unnecessary information yourself.

In the following descriptions of each table, the term query block is used. A query block is a part of a query. Query blocks are used to distinguish the parts of a subquery. For example, when a query does not involve a subquery, there is only one query block: query block 1. When there is a subquery, there are two query blocks, the outer-level query and the subquery. They are referred to as query block 1 and query block 2, respectively. Because subqueries may be nested within each other, there may be many query blocks in a statement; each query block corresponds to separate (but interacting) parts of the statement.

The SELECT statement in Figure 17 is used in the following descriptions of the explanation tables. This SELECT statement has only one query block.

Figure 17. SELECT Statement for Explanation Table Descriptions

   SELECT X.DEPTNAME, Y.FIRSTNME, Y.MIDINIT, Y.LASTNAME, Y.PHONENO
   FROM DEPARTMENT X, EMPLOYEE Y
   WHERE X.MGRNO = Y.EMPNO
     AND X.DEPTNO = Y.WORKDEPT

Assume that user Smith owns tables DEPARTMENT and EMPLOYEE where:

Using the Cost Table

This table is updated by EXPLAIN COST or EXPLAIN ALL. The information in this table provides the cost estimate of the statement for which the EXPLAIN is issued and for any internally generated statement used to enforce referential integrity. In addition, you can compute the contribution of each subquery (if any) to the total cost estimate of the statement. (To compute the subquery cost estimates, you will need to use information provided by the EXPLAIN STRUCTURE statement.)

For each query block in the statement, EXPLAIN inserts one row into COST_TABLE. The information depends on the existing indexes and catalog statistics. If indexes are added or dropped after you issue EXPLAIN for the statement, then the COST_TABLE entry for the command is not valid.

If you need a description of the columns in the COST_TABLE, refer to the DB2 Server for VSE & VM SQL Reference manual.

The value in COST is referred to as the cost estimate (occasionally referred to as the optimizer cost estimate or the resource cost estimate). All of these terms refer to the same thing: the internal value that the optimizer uses to represent the resource cost of executing an SQL statement for which the EXPLAIN is issued and for any internally generated statement used to enforce referential integrity. The value is a relative value that incorporates I/O requirements with a weighted factor of processor requirements for a query.

Aside from the COST column, there are two other ways to get this value for a given SQL statement. One way is to examine the SQLCA after preparing a dynamically defined SQL statement. The cost estimate is kept in the SQLERRD(4) field. A second way to see the cost estimate is using ISQL. ISQL displays the query cost estimate integer. The integer results from dividing the real internal value by 1000 and adding 1 to it. This produces a number that is easier to grasp. This is a valid technique because the numbers are relative to each other; they do not represent real physical consumption directly.

It follows, then, that it is futile to try to develop an algorithm that directly maps the cost estimate to a real physical unit such as time. Too many other factors are involved (for example, overall system workload). It is best to use the cost estimate as a general indicator.

For the SELECT statement in Figure 17, there would be only one row entered into COST_TABLE, because there is only one query block:

Figure 18. Results of COST_TABLE Query

   QUERYNO  RINO  QBLOCKNO  PKGNAME  PKGOWNER  COST
   -------  ----  --------  -------  --------  ------------------
         1     0         1                     1.4388885498046E+01
 
   TIMESTAMP
   --------------------------
   1999-08-26-09.49.25.601721
 

Using the Plan Table

This table is updated by EXPLAIN PLAN or EXPLAIN ALL. The information in this table describes the order in which tables are accessed by the statement for which the EXPLAIN is issued and by any internally generated statement used to enforce referential integrity. In addition, the PLAN_TABLE table describes the indexes used to access the tables, and specifies whether indexes alone were used, the methods that the database manager used to do joins, the sorts done as part of runtime processing, and the reasons for the sorts.

As with the COST_TABLE, the PLAN_TABLE results depend on the existing indexes and catalog statistics at the time the EXPLAIN statement is executed. If indexes are added or deleted, then the PLAN_TABLE entry for the statement is not valid.

For each step in the plan determined by the database manager for processing the query, EXPLAIN inserts one row into the PLAN_TABLE. There is one step for each table reference in a query block. There are additional steps if the database manager must perform additional sorts at the end of processing for the query block, or if any internally generated statements are to be processed.

The steps in the plan are ordered by the value of the PLANNO column of PLAN_TABLE, and for each step, the TNAME column identifies the table accessed. The phrase "previous steps of the plan" refers to PLAN_TABLE rows with smaller values of PLANNO. The action described in a step is either a join of a table to those previously joined, or it is a sort. (Joins themselves may involve performing sorts.) The term "composite" refers to the result of all previous steps; the term "new" refers to the new table that is being accessed and joined as part of a particular plan step.

If you need a description of the columns in the PLAN_TABLE, refer to the DB2 Server for VSE & VM SQL Reference manual.

The PLAN_TABLE for the query in Figure 17 is shown below. Because there are many columns in PLAN_TABLE, the display of the table is split to fit on the page:

Figure 19. Results of PLAN_TABLE Query

 
  QUERYNO  RINO  QBLOCKNO  PKGNAME  PKGOWNER  PLANNO  METHOD  CREATOR
  -------  ----  --------  -------  --------  ------  ------  -------
        1     0         1                          1       0  SMITH
        1     0         1                          2       1  SMITH
 
  TNAME       TABNO  ACCESSTYPE  MATCHCOLS  ACCESSCREATOR
  ----------  -----  ----------  ---------  -------------
  DEPARTMENT      1  W                   0  SMITH
  EMPLOYEE        2  I                   0  SMITH
 
  ACCESSNAME        INDEXONLY  SORTNEW  SORTCOMP  SORTN_UNIQ  SORTN_JOIN
  ----------        ---------  -------  --------  ----------  ----------
  MGRNOI            N          N        N         N           N
  PKEYB1H9ZR8CD51W  N          N        N         N           N
 
  SORTN_ORDERBY  SORTN_GROUPBY  SORTC_UNIQ  SORTC_JOIN  SORTC_ORDERBY
  -------------  -------------  ----------  ----------  -------------
  N              N              N           N           N
  N              N              N           N           N
 
  SORTC_GROUPBY  TIMESTAMP                   REMARKS
  -------------  --------------------------  -------
  N              1999-08-26-09.49.25.601721
  N              1999-08-26-09.49.25.601721

A Type 1 (or nested loop) join is performed on tables EMPLOYEE and DEPARTMENT. The database manager accesses DEPARTMENT as the outer table of the join (the first table accessed), and EMPLOYEE as the inner table of the join.

The row with PLANNO=1 indicates that the database manager accesses DEPARTMENT using the index MGRNOI (which, as it happens, was created on the MGRNO column).

The entry with PLANNO=2 indicates that the database manager has performed an action on the EMPLOYEE table, based upon the conditions included in the query. An index has been generated internally on the primary key EMPNO of the EMPLOYEE table. This index performs the matching of EMPNO (in the EMPLOYEE table) to MGRNO (in the DEPARTMENT table). This index can be used because the value in MGRNO of DEPARTMENT, which must be matched by the EMPNO value of EMPLOYEE. Retrieval of rows from an inner table of a join will often, though not always, use an index on a join column of the inner table.

No sorts are used in the plan for this query. However, if the query had demanded SELECT DISTINCT, instead of SELECT, the plan would have an additional row, with PLANNO=3, which would have TABNO=0, METHOD=3, SORTC_UNIQ='Y' and SORTCOMP='U'.

Using the Reference Table

This table is updated by EXPLAIN REFERENCE or EXPLAIN ALL. The database manager inserts one row in REFERENCE_TABLE for each column referenced in the statement (in certain ways, as explained below) and for any internally generated statement used to enforce referential integrity. Even if the column is referenced more than once for a table, there is still only one row inserted for the column and that row is for the most selective predicate. However, multiple appearances of a table in a query (as when a table is joined to itself) can lead to multiple descriptions of their columns.

One row is entered for each table reference, one for the statement as a whole, and one that indicates the way in which the column is used in the query. For a description of the columns in the REFERENCE_TABLE, refer to the DB2 Server for VSE & VM SQL Reference manual.

For the example statement presented in Figure 17, the new rows entered into REFERENCE_TABLE by EXPLAIN REFERENCE might be:

Figure 20. Results of the REFERENCE_TABLE Query

 
  QUERYNO  RINO  QBLOCKNO  PKGNAME  PKGOWNER  REFTYPE  CREATOR  TNAME
  -------  ----  --------  -------  --------  -------  -------  ----------
        1     0         1                     SELECT
        1     0         1                     TABLE    SMITH    DEPARTMENT
        1     0         1                     TABLE    SMITH    EMPLOYEE
        1     0         1                     COLUMN   SMITH    DEPARTMENT
        1     0         1                     COLUMN   SMITH    DEPARTMENT
        1     0         1                     COLUMN   SMITH    DEPARTMENT
        1     0         1                     COLUMN   SMITH    EMPLOYEE
        1     0         1                     COLUMN   SMITH    EMPLOYEE
        1     0         1                     COLUMN   SMITH    EMPLOYEE
        1     0         1                     COLUMN   SMITH    EMPLOYEE
        1     0         1                     COLUMN   SMITH    EMPLOYEE
        1     0         1                     COLUMN   SMITH    EMPLOYEE
 
  TABNO  CNAME  COLNO  FILTER               DBSSPRED  JOINPRED  ORDERCOL
  -----  -----  -----  -------------------  --------  --------  --------
      0             0                0.0E0                             0
      1             0                0.0E0                             0
      2             0                0.0E0                             0
      1  DEPTNAME   2              1.0E+00  N         N                0
      1  DEPTNO     1   1.111111448837E-01  Y         Y                0
      1  MGRNO      3            3.125E-02  Y         Y                0
      2  EMPNO      1            3.125E-02  Y         Y                0
      2  FIRSTNME   2              1.0E+00  N         N                0
      2  LASTNAME   4              1.0E+00  N         N                0
      2  MIDINIT    3              1.0E+00  N         N                0
      2  PHONENO    6              1.0E+00  N         N                0
      2  WORKDEPT   5  1.1111110448837E-01  Y         Y                0
 
  GROUPCOL  UPDATECOL  TIMESTAMP
  --------  ---------  -------------------------
         0             1999-08-26-09.49.25.601721
         0             1999-08-26-09.49.25.601721
         0             1999-08-26-09.49.25.601721
         0             1999-08-26-09.49.25.601721
         0             1999-08-26-09.49.25.601721
         0             1999-08-26-09.49.25.601721
         0             1999-08-26-09.49.25.601721
         0             1999-08-26-09.49.25.601721
         0             1999-08-26-09.49.25.601721
         0             1999-08-26-09.49.25.601721
         0             1999-08-26-09.49.25.601721
         0             1999-08-26-09.49.25.601721
 

These rows indicate that the statement is a SELECT statement with no subqueries, joining two tables, SMITH.DEPARTMENT and SMITH.EMPLOYEE. The columns MGRNO, DEPTNO from DEPARTMENT, and the columns EMPNO, WORKDEPT from EMPLOYEE appear together in the 'WHERE' clause (identified by a Y in the DBSSPRED column), permitting indexes to be used. These columns are the JOIN columns (identified by a Y in the JOINPRED column). FILTER may be misleading, because the filtering depends on the order in which tables are processed.



Referential Integrity (RINO Value)

RINO is set to zero for the original statement and is automatically
incremented by one for each internally-generated statement that is processed
for referential integrity or cascade delete. For example if you perform
an EXPLAIN against a statement that deletes a department from the DEPARTMENT
table, the following REFERENCE table is generated.

+--------------------------------------------------------------------------------+
|        QNO   RINO QBLOCK REFTYPE TNAME              CNAME             ...      |
|----------- ------ ------ ------- ------------------ ------------------...      |
|          1      0      0 DELETE                                       ...      |
|          1      0      1 TABLE   DEPARTMENT                           ...      |
|          1      1      0 UPDATE                                       ...      |
|          1      1      1 TABLE   EMPLOYEE                             ...      |
|          1      1      1 COLUMN  EMPLOYEE           WORKDEPT          ...      |
|          1      2      0 SELECT                                       ...      |
|          1      2      1 TABLE   PROJECT                              ...      |
|          1      2      1 COLUMN  PROJECT            DEPTNO            ...      |
+--------------------------------------------------------------------------------+

Notice that the DELETE statement that was written (RINO=0) produces two
other statements: First an UPDATE that changes the WORKDEPT column for
any employee in the deleted department to NULL (RINO=1), and second a SELECT
that checks that any departments to be deleted do not have any projects
assigned to them (RINO=2).

The REFERENCE_TABLE and the PLAN-TABLE can be used together to indicate
whether materialization was used to generate a view. View
materialization lifts a number of restrictions on the use of views, including
the use of column functions operating on the column of a view when the
definition of the view already contains a column function. For
example:

   CREATE VIEW V1(DPT,MAXSAL) AS
       SELECT   WORKDEPT, MAX(SALARY)
       FROM     EMPLOYEE
       GROUP BY WORKDEPT
   EXPLAIN ALL FOR SELECT DPT FROM V1

Because view materialization is used for view V1, the TNAME column in the
REFERENCE_TABLE and PLAN_TABLE will contain the name of the view.
Keeping in mind that view materialization is generally more expensive than
merging the SELECT statement of the view with that of the query, the
information on the EXPLAIN tables can be helpful in performance
tuning.


Using the Structure Table

This table is updated by EXPLAIN STRUCTURE or EXPLAIN ALL. The database manager inserts one row in STRUCTURE_TABLE for each query block in the statement.

If you need a description of the columns in the STRUCTURE_TABLE, refer to the DB2 Server for VSE & VM SQL Reference manual.

If the following SELECT statement is issued, only one row is entered in STRUCTURE_TABLE, as shown in Figure 21, because there is only one query block.

   EXPLAIN ALL FOR SELECT * FROM EMPLOYEE

Figure 21. Results of the STRUCTURE_TABLE Query

   QUERYNO  RINO  QBLOCKNO  PKGNAME  PKGOWNER  ROWCOUNT      TIMES
   -------  ----  --------  -------  --------  --------  ---------
               0         1                           32      0.0E0
 
   PARENT  ATOPEN  TIMESTAMP
   ------  ------  --------------------------
        0  N       1999-08-26-09.49.26.001720

A more complicated example is provided in the following sections, where we show how to separate the costs for individual query blocks using STRUCTURE_TABLE and COST_TABLE together.

Using Subquery Blocks

A query may have subqueries, which in turn may have subqueries. The database manager separates this tree of subqueries into pieces, called query blocks. Each query block has its own tables, columns, and rowcount. EXPLAIN STRUCTURE, COST lets you look at combined information, or to separately examine the information for each query block.

The PARENT field gives the logical parent for each query block, which is not always obvious from the query itself. Sometimes, a query block has no correlation to the query where it immediately appears, so it is executed only once, when some ancestor query block is first entered, rather than many times.

The following example has a large number of query blocks, but references only one table (many times). It illustrates the meanings of PARENT and ATOPEN, as well as the method for decomposing COST values into separate costs for query blocks.



                                           QBLOCKNO
                                          ---------
   SELECT * FROM DEPT X                   *** 1 ***
   WHERE DNAME > ALL
     (SELECT DNAME FROM DEPT              *** 2 ***
      WHERE X.DNO = DNO
      AND LOC = 32)
   AND DNO =
     (SELECT DNO FROM DEPT Y              *** 3 ***
      WHERE MGR =
        (SELECT MGR FROM DEPT Z           *** 4 ***
         WHERE DNAME IN
           (SELECT DNAME FROM DEPT        *** 5 ***
            WHERE NEMP = X.NEMP)
         AND DNO =
           (SELECT DNO FROM DEPT W        *** 6 ***
            WHERE NEMP > Z.NEMP
            AND LOC IN
              (SELECT LOC FROM DEPT       *** 7 ***
               WHERE DNAME = Y.DNAME))
          AND LOC = 32 )
      AND Y.NEMP <
        (SELECT AVG(NEMP) FROM DEPT       *** 8 ***
         WHERE Y.MGR = MGR ));

Here are results from STRUCTURE_TABLE for this query. (ROWCOUNT is not shown.)

Figure 22. Example STRUCTURE_TABLE

   QUERYNO  RINO  QBLOCKNO  PKGNAME  PKGOWNER   TIMES  PARENT
   -------  ----  --------  -------  --------  ------  ------
         ?     0         1                      9.000       0
         ?     0         2                      0.500       1
         ?     0         3                      9.000       1
         ?     0         4                      0.500       3
         ?     0         5                      2.000       3
         ?     0         6                      1.500       4
         ?     0         7                      1.000       4
         ?     0         8                      1.000       3
 
   ATOPEN  TIMESTAMP
   ------  --------------------------
        N  1999-08-26-09.49.27.011719
        N  1999-08-26-09.49.27.011719
        N  1999-08-26-09.49.27.011719
        N  1999-08-26-09.49.27.011719
        Y  1999-08-26-09.49.27.011719
        N  1999-08-26-09.49.27.011719
        Y  1999-08-26-09.49.27.011719
        N  1999-08-26-09.49.27.011719

Note:A ? indicates a NULL value.

Although query block 5 is physically nested within query block 4, it references neither Y nor Z. Hence, its value can be computed once each time query block 3 is first entered, with a particular X row.

Query block 5 is executed once, at open time of query block 3.

Query block 7 is physically within query block 6, but can be evaluated once when the database manager first enters query block 4. Query block 7 does not need to be re-executed again until new values for rows outside query block 4 are required. This is different from query block 2, for example, which must be executed once for each row of query block 1, rather than just once when execution of query block 1 begins.

TIMES may be a fraction (and may be less than 1) because it represents the estimated number of times, per execution of a query block, that its dependent query blocks will be executed.

The most important use of this query block breakdown involves computation of costs for individual query blocks, instead of costs for the query as a whole. This is described next.

Computing Block Costs

Here are the COST_TABLE entries for the query in the preceding section:

Figure 23. Results of COST_TABLE Query

   QUERYNO  RINO  QBLOCKNO  PKGNAME  PKGOWNER       COST
   -------  ----  --------  -------  --------  ---------
         ?     0         1                      3021.000
         ?     0         2                        10.000
         ?     0         3                       324.000
         ?     0         4                        24.000
         ?     0         5                         4.000
         ?     0         6                         6.000
         ?     0         7                        10.000
         ?     0         8                        10.000
 
   TIMESTAMP
   --------------------------
   1999-08-26-09.49.27.011719
   1999-08-26-09.49.27.011719
   1999-08-26-09.49.27.011719
   1999-08-26-09.49.27.011719
   1999-08-26-09.49.27.011719
   1999-08-26-09.49.27.011719
   1999-08-26-09.49.27.011719
   1999-08-26-09.49.27.011719

The cost displayed is the total cost for each query block, including costs associated with all query blocks that are below it in the logical tree of query blocks. Thus, the cost of executing the statement is approximately 3021. (For simplicity in the calculations that follow, the values are shown as integers, but they need not be.)

The cost for executing the entire statement helps you understand the effect of the statement on system load, but it hides the blocks of the query that are contributing the most to the cost of the query.

A more useful set of figures might be those listed in the following table:
Note:The following table is only an example. It is not stored in the database, and INDIVCOST and MULTCOST are not columns in COST_TABLE.

Figure 24. Example COST_TABLE

   QBLOCKNO  INDIVCOST     MULTCOST
   --------  ---------   ----------
          1     15.000       15.000
          2     10.000       90.000
          3     14.000      126.000
          4     11.000      891.000
          5      4.000       36.000
          6      6.000      243.000
          7     10.000      810.000
          8     10.000      810.000

INDIVCOST represents the cost of one execution of the individual query block itself, not including the costs of any of its subqueries. MULTCOST not only counts the cost of the individual query block, but multiplies INDIVCOST by the number of times the query block is expected to be executed in the query. This is a better measure of the cost importance of the query block than either COST or INDIVCOST. Notice that the MULTCOST column adds up to COST(1), the total cost of the entire query.

Thus, the following formula can be used to derive INDIVCOST:




 
COST(I) = INDIVCOST(I) + the sum, over all blocks J that have
I as logical parent, of either:
 
TIMES(I) * COST(J)
 
if J is not done AT OPEN of I, or
 
COST(J)
 
if J is done AT OPEN of I.

For example:

   COST(3) = 324 = INDIVCOST(3) + 9*24 + 4 + 9*10,

so INDIVCOST(3) is 14.

Here is another example:

   COST(4) = 24 = INDIVCOST(4) + 0.5*6 + 10,

so INDIVCOST(4) is 11.

This formula can be used to derive MULTCOST from INDIVCOST:




 
MULTCOST(I) =
INDIVCOST(I) * the product of TIMES(I) for
all logical ancestors of I,
if I is not done AT OPEN of its parent,
or
INDIVCOST(I) * the product of TIMES(I) for
all logical ancestors of I EXCEPT its parent,
if I is done AT OPEN of its parent.

For example:

   MULTCOST(7) = 10 * TIMES(3) * TIMES(1) = 810.

We do not multiply TIMES(4) into that product, because 4 is 7's immediate parent, and 7 is done AT OPEN of 4.

This demonstrates that the most important component of the estimated cost comes from block 4, and you might choose indexes that make processing this query block cheaper. By inspecting the query, or from the rows in REFERENCE_TABLE, you might decide that indexes on one of DNAME, DNO or LOC might reduce the cost of processing.

Estimating Sizes of Responses

Because ROWCOUNT (estimated number of rows in result, for queries, or of affected rows, for updates and deletes) is stored in STRUCTURE_TABLE, it is easy to gauge the estimated size of responses. If your structure includes a DELETE CASCADE rule, EXPLAIN will include the cost of the cascading effects of a DELETE. This can help you understand whether requests are reasonable, and whether the statistics in the database catalog tables that estimate ROWCOUNT seem up-to-date.

ROWCOUNT can also help determine space requirements when responses are being stored in program data structures. However, ROWCOUNT, like all estimates made by the system, is neither precisely accurate, nor even an upper bound on the actual number of rows in the response.

Using EXPLAIN for Database Design

A systematic analysis of many statements in the workload of the system can help the administrator plan the access paths for the database. The analysis should consider costs when different combinations of indexes exist. It should also consider the costs of performing updates, which are not reflected in the COST column of COST_TABLE, and limits on space for indexes.

You may load the database with tables ordered on certain columns. Indexes on such columns (called CLUSTERING indexes) enable the database manager to maintain this ordering. Because the ordering of rows within tables strongly affects the costs of execution, it may be worthwhile to reload the database to improve performance. For each statement, join, ORDER BY, and GROUP BY columns may be good candidates for ordering. Also, tables that are often joined might be interleaved on join columns when the database is loaded.

If the cost of executing a statement (as determined by EXPLAIN COST, or by running the statement) is higher than expected, a user or administrator may want to look at the procedure that the database manager chose to execute that statement. Building additional access paths or altering the layout of tables may be necessary to achieve good performance for the statement. For example, if a relation scan, that is ACCESS TYPE='R', is performed on a large table, it may be better to build an index on some column of that table; EXPLAIN REFERENCE provides hints about which indexes might help. Adding new indexes makes updates more expensive, so this decision must be considered carefully.

The PLAN_TABLE can also help the administrator determine which indexes are not being used, so that he or she may decide which indexes might be dropped. This assumes that the administrator knows not only the significant statements in programs, but also the significant statements issued by users directly at their terminals.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]