DB2 Server for VSE & VM: Performance Tuning Handbook


Predicate Processing

Search conditions contain predicates joined with AND, OR, and NOT. A predicate is a search condition in a WHERE or HAVING clause of an SQL statement. Examples include C1 = 10, C2 BETWEEN 10 AND 20, EXISTS(subquery), and C4 NOT LIKE 'A%'. Only those rows that satisfy a predicate are returned.

Predicates are resolved in one of two categories: residual or sargable. (Sargable is a term derived from the words "search argument".) Sargable predicates are applied at the Database Storage Subsystem (DBSS) level; residual predicates are applied at the Relational Data System (RDS) level.

Figure 14 shows the hierarchy of predicates.

Figure 14. Predicate Hierarchy


ARIT1PRH

Sargable predicates are further divided into two categories: those that use the index, and those that do not. The former are called either key-matching predicates or index page sargs. The latter are called data page sargs.

A key-matching predicate, which is applied directly to the index key, is created when the columns referenced in the predicate form an initial substring of an index on the table.

An index page sarg is resolved using the index page, but is not used to search the index key. It is created when the columns referenced in the predicate are not an initial substring of an index, but are contained in the index.

A data page sarg does not use the index, and always requires the data pages be read. It is created when the columns referenced in the predicate are not contained in the index.

Column Attributes

The next sections deal with predicates in the WHERE clause. For these predicates, it is important that the data types and CCSIDs of any columns and literals match whenever possible. That is, numeric values should use the same representation, including the same precision and scale for DECIMAL values. Character and graphic values should have the same length. Columns and literals should use the same CCSID, refer to Impact of CCSIDs on Sargability. Adhering to this rule will always give the database manager the greatest flexibility in choosing an efficient access path. All the examples assume that this rule has been followed. For more information, refer to note number one on page 1.

Key-matching Predicates

Before a predicate can be considered key-matching it must be in the correct form and a suitable index must be available.

Form of Key-matching Predicates

Some types of predicates can match index entries; other types cannot. For example, if the EMPLOYEE table has an index on the column SEX, it matches the predicate in this query:

   SELECT * FROM EMPLOYEE WHERE SEX = 'M'

On the other hand, the same index does not match the predicate in this query:

   SELECT * FROM EMPLOYEE WHERE SEX < > 'F'

We call a predicate key-matching if it can match the entries in a suitable index. Table 6 shows which predicate types are key-matching.

If a predicate fails to match the index, it may still be applied to the index, but not used to search it.

Only one predicate per column can be chosen as the key, however, other predicates on that column are eligible to be a sargable predicate. For predicates that are joined with AND, one per column is chosen as the key. The one with the best filter factor establishes the path and the other is turned into a sargable predicate. For example, consider a table with three columns C1, C2, and C3. A multi-column index is created (C3, C2, C1) and the following WHERE clause is used in a SELECT statement:

    ...WHERE C1>1 AND
           C1<2 AND
           C2=2 AND
           C3=3

Only the first, third, and fourth predicates are chosen as key-matching predicates (C1>1, C2=2, C3=3). The second predicate (C1<2) is not chosen as key-matching but it is sargable.

For maximum efficiency, use key-matching predicates and create suitable indexes. The database manager may not always use an index to apply a key-matching predicate--other factors may intervene. But the first step in reducing the processing cost of a query is to use key-matching predicates where possible and then create suitable indexes.

In general, when you create an multi-column index, put the column with the most distinct values first, and continue in order to the least distinct values.

One exception to this rule is the case where the index provides a necessary ordering of the data. With this query:

   SELECT * FROM EMPLOYEE
     WHERE EDLEVEL > 14 AND JOB = 'CLERK'
       ORDER BY EDLEVEL

an index on EDLEVEL,JOB enables the database manager to access data in the order required by the ORDER BY clause, thus saving a sort at the end. This may be enough to justify scanning index entries for rows that are rejected.

Suitable Index for Key-matching Predicate

For a simple predicate, an index is fully matched if the column in the predicate is the first column of the index. For example, the predicate C1=10 matches an index on columns C1, C2, C3, as well as an index on column C1 alone. If there are additional predicates on columns C2 and C3, they may also be evaluated through the multicolumn index.

For a search condition where all the predicates are joined by an AND, it is enough if the index includes the set of columns as an initial substring. For example, an index on columns C1, C3, C4, and C6 is fully matched by the search condition C1=10 AND C4='A' AND C3=7 AND C6=9, as long as all but the last column are matched with equality predicates. The last predicate can be either an equality or a range predicate.

The same index is not fully matched by the search condition C1=10 AND C4='A' AND C6=9, because the set of columns in that search condition (C1,C4,C6) is not an initial substring of C1, C3, C4, C6. However, the database manager can use the index for the parts of the search condition that do form an initial substring; in the example, it can apply the predicate C1=10 through the index. In addition, it can still use the index to evaluate the predicates on C4 and C6, so that data pages do not need to be accessed.

Similarly, such an index is not fully matched by the search condition C1=10 AND C3=7 AND C4>'A' AND C6=9, because the predicate C4>'A' is not an equality predicate. An index can only be matched up to and including the first non-equality predicate. Thus, the database manager can apply the predicates C1=10 AND C3=7 AND C4>'A' as key-matching predicates to the index. Again, the predicate on C6 can be evaluated as an index sarg so that data pages do not need to be accessed.

Hence, the order of the index columns is important; it should take into account the kinds of queries used. For example, suppose the Spiffy Computer department intends to query its employee table regularly with predicates such as EDLEVEL > 14 AND JOB = 'CLERK'. With an index on EDLEVEL,JOB, the database manager finds the first index entry with EDLEVEL greater than 14 and scans the remainder of the index from there upward. But with an index on JOB,EDLEVEL, it scans only the entries for clerks having EDLEVEL > 14, giving a shorter access path.
Note:If you created an index in order to improve the performance of an SQL statement, you should probably check that the database manager actually uses the index for that statement. To find out what access and processing methods it has chosen, use the EXPLAIN statement (see Using Explanation Tables to Evaluate Performance).

Sargable and Residual Predicates

Rows that are retrieved go through two stages of processing. Predicates can be applied at the first stage are called sargable predicates; those that cannot be applied until the second stage are called residual predicates. Predicates in the HAVING clause are always residual. Resolution of predicates and the predicate hierarchy are detailed in Predicate Processing. Table 6 shows which predicates are sargable and which are not.

There is a definite performance advantage in using sargable predicates: they require fewer CPU instructions than do residual predicates, because they eliminate rows that would otherwise be passed from first to second stage processing. Thus, whenever possible, avoid a residual predicate by rewriting your SQL statement.

Example

Table T contains 1000 rows, and column C6 contains the integers from 1 to 1000. Consider this query:

   SELECT * FROM T WHERE INTEGER(C6/7) = 2

Because the column in the predicate is involved in an arithmetic expression, the predicate is residual. The first stage must access 1000 rows and pass them all back to the second stage. If you instead write the predicate as WHERE C6 BETWEEN 14 AND 20, then only seven rows are passed back to the second stage. Furthermore, the predicate C6 BETWEEN 14 AND 20 is key-matching. If there is an index on C6, the first stage need only access seven rows.

Sargable predicates are better than residual predicates, but a suitable index is better still. Avoiding the processing cost of a residual predicate won't help you much if you have to access ten million rows without an index. (You can use EXPLAIN to tell whether a sargable predicate exists for a particular column, refer to the REFERENCE EXPLAIN table in the DB2 Server for VSE & VM SQL Reference manual.)

Join Predicates

In general, any predicate involving more than one table is a join predicate. In the database manager, a condition of the form T1.C1=T2.C2 (the equijoin) is handled specially by the optimizer. For information on joins, see Methods of Joining Two or More Tables.

Search Conditions and Their Processing Characteristics

Table 6 shows the different types of search conditions, and their processing characteristics. The following conventions are used:

Note:If the predicate falls in two different categories, choose the more specific category.

Table 6. Search Conditions and Their Processing Characteristics
Search Conditions Key- Matching? Sargable? Default Filter Factor (FF) Notes

COL = value
COL IS NULL
COL rop value
COL BETWEEN value1 AND value2
COL LIKE 'char'
COL IN (value1,...)


Yes
Yes
Yes
Yes
Yes
Yes


Yes
Yes
Yes
Yes
Yes
Yes


1/25
1/25
1/3
1/10
1/10
1/25*size


1,16
2
1,16,19
1,4,16,19
5,19
1,3,6,16,17


COL <> value
COL IS NOT NULL
COL NOT BETWEEN value1 AND value2
COL NOT IN (value1,...)
COL NOT LIKE 'value'
COL LIKE 'pattern'
COL LIKE host variable


No
No
No
No
No
No
No


Yes
Yes
No
Yes
No
No
No


24/25
24/25
9/10
1 - 1/25*size
9/10
1/10
1/10


1,7,16
2
7,8,19
1,7,9,16,17
19
19
16,19


T1.COL = T2.COL (different tables)
T1.COL rop T2.COL (different tables)
T1.COL <> T2.COL (different tables)


Yes
Yes
No


Yes
Yes
Yes


1/25
1/3
24/25


1,15
1,15,19
1,15


T1.COL1 = T1.COL2 (same table)
T1.COL1 rop T1.COL2 (same table)
T1.COL1 <> T1.COL2 (same table)


No
No
No


No
No
No


1/25
1/3
24/25


 
19
 


COL = [Q] (uncorrelated scalar subquery)
COL rop [Q] (uncorrelated scalar subquery)
COL <> [Q] (uncorrelated scalar subquery)


Yes
Yes
No


Yes
Yes
Yes


1/25
1/3
24/25


1,10
1,10,19
1,10


litexpr = [Q] (uncorrelated scalar subquery)
litexpr rop [Q] (uncorrelated scalar subquery)
litexpr <> [Q] (uncorrelated scalar subquery)


No
No
No


No
No
No


1
1/3
1


10
10,19
10


COL = (subquery)
COL <> (subquery)
COL rop (subquery)
litexpr = (subquery)
litexpr <> (subquery)
anyexpr op Q (subquery)


No
No
No
No
No
No


No
No
No
No
No
No


1/25
24/25
1/3
1
1
1


11
11
11,19
11
11
11


anyexpr [NOT] IN (subquery)
[NOT] EXISTS (subquery)


No
No


No
No


1
1


11
11


COL = expression
COL <> expression
COL rop expression
anyexpr NOT BETWEEN anyexpr AND anyexpr
anyexpr BETWEEN anyexpr AND anyexpr
anyexpr <> expression
litexpr rop anyexpr
litexpr = anyexpr
litexpr <> anyexpr
litexpr NOT IN (value,...)
litexpr IN (value,...)


No
No
No
No
No
No
No
No
No
No
No


No
No
No
No
No
No
No
No
No
No
No


1/25
24/25
1/3
9/10
1/10
24/25
1/3
1
&similar.0
&similar.0
1


12
12
12,19
12,18,19
12,18,19
12
12
12
12
12,17
12,17


search condition AND search condition
search condition OR search condition
NOT search condition


Yes
No
No


Yes
Yes
No


FF1*FF2
FF1+FF2-FF1*FF2
1-FF


13
14
 

Notes to Table 6: Search conditions which are listed as key-matching or sargable are only potentially so; they may not be treated as such because of the following factors:

  1. The value must be of the same or compatible type as the column. Adhere to this rule whenever possible. Numeric data types have the following hierarchy:
       SMALLINT < INTEGER < DECIMAL < FLOAT
    

    A value's data type can be converted to any higher data type. For example, INTEGER can be converted to a DECIMAL (given sufficient precision) or FLOAT, but not to SMALLINT. Similar compatibility considerations exist for character and graphic data lengths, as well as for the precision and scale of decimal data.

    If the data type of the column is CHAR(n) or GRAPHIC(n), the predicate that references it is sargable if the length of that predicate value is less than or equal to "n". The column and the predicate should also have the same CCSID, refer to Impact of CCSIDs on Sargability.

    If the data type of the column is DECIMAL(m,n), it must be possible to accommodate the number of decimal digits before and after the decimal point in the target decimal field (P1-S1 <= P2-S2 and S1<=S2 where P is precision and S is scale).

    Even-precision DECIMAL variables are supported by the DB2 Server for VSE & VM product for the assembler preprocessor. You can use even-precision DECIMAL columns in tables that are referenced by Assembler programs. Host variables in even-precision will be left as is by the preprocessor. Therefore, when these programs access even-precision DECIMAL data, predicates become sargable instead of RESIDUAL. Your performance may improve when using SQL statements that use even-precision packed DECIMAL columns.

  2. The NULL predicate must be applied to a column without the NOT NULL attribute in order to be key-matching or sargable. Otherwise, the predicate is residual.
  3. If a multicolumn index exists, at most one IN predicate can be used to match columns of the index. For example, if a table T1(C1, C2, C3) and an index C1, C2, C3 exist, the following query will have only one key-matching predicate, not three:
    SELECT * from T1 where C1 IN (:HV1, :HV2) AND C2 IN (:HV3, :HV4) AND C3=5
    
  4. If value1 and value2 are equal, then for filter factor calculation purposes the predicate is treated as though it were the equality predicate COL = value1.
  5. Although the LIKE predicate is a residual predicate, the database manager takes advantage of the character argument to generate a BETWEEN predicate which is both key-matching and sargable. This BETWEEN predicate is then applied by the first stage either as a key-matching or a sargable predicate. This transformation does not apply if the pattern is a host variable or the ESCAPE clause exists. It also does not apply if the character data is mixed data.
  6. If there are no host variables in the list, then a BETWEEN predicate will be generated using the lowest and highest values. This predicate is sargable, and can be used to reduce the number of rows returned to the second stage.
  7. Whenever possible, avoid negating a predicate using NOT. Instead, use an equivalent form that distributes the negation. In some cases, the database manager will perform this transformation for you. For example, the predicate NOT COL = value is treated like COL ¬= value.
  8. This predicate, although residual as stated, can be rewritten to eliminate the NOT BETWEEN into COL < value1 OR COL > value2, which is sargable. (See note 14.)
  9. Because this predicate is residual when more than one value is used, it might be beneficial to rewrite it as COL ¬= value1 AND COL ¬= value2 AND ... which is sargable.
  10. An uncorrelated scalar subquery can return at most one value, and can be evaluated before the query that contains it. This returned value is then used to replace the subquery. The predicate is scalar only if the subquery statement specifies a COLUMN function and the subquery does not contain a GROUP BY clause, or if the predicate containing the subquery is not quantified.
  11. Predicates that reference correlated subqueries or subqueries that can return more than one row are always residual.
  12. An expression makes any predicate residual. Sometimes a query can be rewritten to avoid the presence of an expression. For example, instead of "SALARY+200 = 20000", write "SALARY = 198000". The second form is executed more efficiently.
  13. For this kind of search condition to be key-matching, all predicates must refer to columns that form an initial substring of the index columns. All but the last column must be matched with equality predicates; the last predicate can be either an equality or a range predicate.

    In search conditions containing multiple predicates on the same column, only one predicate can be chosen as the key. The predicate providing the best filtering establishes the path, and the others are turned into sargable predicates.

  14. All predicates in a search condition that contains an OR remain sargable only if all the individual predicates are sargable; otherwise, they are all treated as residual. In other words, a single residual OR will cause all the predicates in a search condition to be residual.

    If all the predicates refer to the same column and the column is indexed, the search condition can be rewritten using the IN predicate.

    For example, instead of:

       SELECT * FROM EMP_ACT
         WHERE ACTNO=90 OR ACTNO=100
    

    write:

       SELECT * FROM EMP_ACT
         WHERE ACTNO IN (90,100)
    

    If different columns are referenced and the columns are indexed, then a UNION may be a more efficient form of the query.

    In the following example, the database manager will have to examine all rows in the EMPLOYEE table to find those that satisfy the two predicates:

       SELECT * FROM EMPLOYEE
         WHERE JOB = 'CLERK'
         OR LASTNAME = 'JONES'
    

    The same request can be processed more efficiently if it is reformulated as the UNION of two SELECT statements:

       SELECT * FROM EMPLOYEE WHERE JOB = 'CLERK'
       UNION
       SELECT * FROM EMPLOYEE WHERE LASTNAME = 'JONES'
    
  15. A join is accomplished by first accessing the outer table and looking for rows that satisfy all predicates on that table only. For each such row, the inner table is then accessed to find all rows there that match that row's join column value. Because a specific value is being used, a join predicate of "colname = colname" becomes "colname = value". This is why join predicates can give selective access to a table, if the table is the inner table.

    A join predicate is only sargable if the data types of the two columns are identical (disregarding whether the columns support NULLS). If the data types are CHAR(n), VARCHAR(n), GRAPHIC(n), and VARGRAPHIC(n), the lengths must match. If they are DECIMAL(m,n), precision and scale must both match.

    The database manager path selection takes this into account when it decides which table should be accessed first.

  16. Predicates using indicator variables are sargable only if they meet the following criteria:

    Predicates using indicator variables which do not meet the above criteria are always residual.

  17. When the IN predicate contains only one value in the list, it is converted to an EQUAL predicate.
  18. All BETWEEN and NOT BETWEEN predicates that do not have a column as the first argument and values as the second and third arguments, are residual.
  19. For the following cases the default filter factor is determined from the COLCOUNT value in the SYSTEM.SYSCOLUMNS catalog table (refer to SYSTEM.SYSCOLUMNS):

Filter Factors

The objective of a predicate is to return to the user only those rows satisfying a particular search condition. Every predicate is treated like a filter that reduces the number of rows returned. The degree to which the predicate reduces the size of the answer set is the filter factor (FF). The filter factor is an estimate of the proportion of rows that remain after a predicate has "filtered out" the rows that do not satisfy it.

The filter factor is a value between 0 and 1. If it is 1, the whole table is selected, and the predicate has no filtering effect; if it is 0, no rows are returned.

The database manager estimates a filter factor for every predicate. If the predicate is either too complex (contains an expression), uses disjunction (OR), uses host variables, or if there are no statistics available for the columns it references, then a default filter factor is used. The defaults for various predicates are shown in Table 6.

Sometimes the optimizer will not use the index when a predicate uses host variables. This is because the optimizer is forced to make assumptions about the values that are not available when the statement is being preprocessed. In such cases, you can usually improve performance by executing the SQL statement dynamically with fixed values.

Table 7. Filter Factors
COLCOUNT Value Filter Factor for rop Filter Factor for LIKE, BETWEEN
>=100,000,000 1/10000 3/100000
>=10,000,000 1/3000 1/10000
>=1,000,000 1/1000 3/10,000
>=100,000 1/300 1/1000
>=10,000 1/100 3/1000
>=1000 1/30 1/100
>=100 1/10 3/100
<100 1/3 1/10
=-1 1/3 1/10

Examples of Predicate Processing

The following examples of predicates illustrate the general rules shown in Table 6. In each case, assume that there is an index on columns C1, C2, C3, C4 of the table.

WHERE C1 = 5 AND C2 = 7
Both predicates are sargable, and both can be applied as key-matching predicates to the index.

WHERE C1 = 5 AND C2 > 7
Both predicates are sargable, and both can be applied as key-matching predicates to the index.

WHERE C1 > 5 AND C2 = 7
Both predicates are sargable, but only the first can be applied as a key-matching predicate. Because all the predicates reference columns in the index, the second predicate will be applied as an index page predicate.

WHERE C1 > 5 OR C2 = 7
Both predicates are sargable, and the combination is sargable. The OR prevents the use of key-matching predicates. The index can not be used for a selective index scan. However, both predicates will be applied as index page predicates.

WHERE C1 IN ( subquery ) AND C2 = C1
Both predicates are residual. The index is not considered for a selective index scan, and both predicates are evaluated residually.

WHERE C1 = 5 AND C2 = 7 AND C3+5 = 7
Only the first two predicates are sargable and can be applied as key-matching predicates. The third predicate is residual. The index is considered for selective access. All rows satisfying those two predicates are passed to residual processing to evaluate the third predicate.

WHERE C1 = 5 OR C2 = 7 OR C3+5 = 7
The third predicate is residual; hence, the combination is residual. All three predicates are evaluated residually.

WHERE C1 = 5 OR (C2 = 7 AND C3 = C4)
The third predicate is residual, so, the combination of the second and third predicates (in parentheses) is also residual. Hence, the total combination is residual. All predicates are evaluated residually.

WHERE (C1 > 5 OR C2 = 7) AND C3 = C4
The combination of the first two predicates is sargable, but the OR prevents the use of key-matching predicates. The third predicate is residual. The index is not considered for a selective index scan, but the combined predicate (in parentheses) is sargable and will be applied as index page predicates. All rows satisfying those two predicates are passed to residual processing to evaluate the third predicate.

WHERE C1 > 5 AND C2 = 7 AND C5 = 8
All predicates are sargable, but only the first can be applied as a key-matching predicate. Because the remaining predicates reference columns in both the index and data pages, the remaining two predicates are applied as data page sargs.

Impact of CCSIDs on Sargability

The DB2 Server for VSE & VM SQL Reference manual lists the rules used to decide which operand will undergo Coded Character Set Identifier (CCSID) conversion in a comparison operation. These rules will help you to maintain sargability whenever possible. The rules were defined to ensure that a column operand will only undergo CCSID conversion if it is absolutely necessary.

Whenever a column operand is chosen to undergo CCSID conversion, the predicate becomes residual because CCSID conversion is performed from RDS, and not from DBSS. In most cases, this is a necessary consequence of using the CCSID support. In other cases, it can be avoided by understanding how the rules apply and by changing the application or the data.

Consider the search condition: COL = value (where value is a host variable). This search condition is normally sargable (see Table 6). The database manager attempts to keep this search condition sargable by always performing CCSID conversion on the host variable operand. There is a case, however, when the rules state that the column operand is the one that should be converted: when the subtype of the column is SBCS and the subtype of the host variable CCSID is mixed.

If it is possible that the host variable will contain mixed data, then the column operand must undergo CCSID conversion and the predicate must become residual. One way to make this predicate sargable is to set the column subtype to mixed. This may not always be possible or desirable, but you should consider this situation when setting the subtype of new columns.

If the host variable will never contain mixed data, then it is possible to make this predicate sargable by changing the subtype of the host variable from mixed to SBCS. To do this, change the default CCSID values on the application requester. Refer to the DB2 Server for VM System Administration or the DB2 Server for VSE System Administration manuals for information on how to change the CHARNAME setting for an application requester. This situation may not be possible or desirable, especially if graphic or mixed data is used elsewhere in the query.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]