Search conditions contain predicates joined with AND, OR, and NOT. A predicate is a search condition in a WHERE or HAVING clause of an SQL statement. Examples include C1 = 10, C2 BETWEEN 10 AND 20, EXISTS(subquery), and C4 NOT LIKE 'A%'. Only those rows that satisfy a predicate are returned.
Predicates are resolved in one of two categories: residual or sargable. (Sargable is a term derived from the words "search argument".) Sargable predicates are applied at the Database Storage Subsystem (DBSS) level; residual predicates are applied at the Relational Data System (RDS) level.
Figure 14 shows the hierarchy of predicates.
Figure 14. Predicate Hierarchy
![]() |
Sargable predicates are further divided into two categories: those that use the index, and those that do not. The former are called either key-matching predicates or index page sargs. The latter are called data page sargs.
A key-matching predicate, which is applied directly to the index key, is created when the columns referenced in the predicate form an initial substring of an index on the table.
An index page sarg is resolved using the index page, but is not used to search the index key. It is created when the columns referenced in the predicate are not an initial substring of an index, but are contained in the index.
A data page sarg does not use the index, and always requires the data pages be read. It is created when the columns referenced in the predicate are not contained in the index.
The next sections deal with predicates in the WHERE clause. For these predicates, it is important that the data types and CCSIDs of any columns and literals match whenever possible. That is, numeric values should use the same representation, including the same precision and scale for DECIMAL values. Character and graphic values should have the same length. Columns and literals should use the same CCSID, refer to Impact of CCSIDs on Sargability. Adhering to this rule will always give the database manager the greatest flexibility in choosing an efficient access path. All the examples assume that this rule has been followed. For more information, refer to note number one on page 1.
Before a predicate can be considered key-matching it must be in the correct form and a suitable index must be available.
Some types of predicates can match index entries; other types cannot. For example, if the EMPLOYEE table has an index on the column SEX, it matches the predicate in this query:
SELECT * FROM EMPLOYEE WHERE SEX = 'M'
On the other hand, the same index does not match the predicate in this query:
SELECT * FROM EMPLOYEE WHERE SEX < > 'F'
We call a predicate key-matching if it can match the entries in a suitable index. Table 6 shows which predicate types are key-matching.
If a predicate fails to match the index, it may still be applied to the index, but not used to search it.
Only one predicate per column can be chosen as the key, however, other predicates on that column are eligible to be a sargable predicate. For predicates that are joined with AND, one per column is chosen as the key. The one with the best filter factor establishes the path and the other is turned into a sargable predicate. For example, consider a table with three columns C1, C2, and C3. A multi-column index is created (C3, C2, C1) and the following WHERE clause is used in a SELECT statement:
...WHERE C1>1 AND C1<2 AND C2=2 AND C3=3
Only the first, third, and fourth predicates are chosen as key-matching predicates (C1>1, C2=2, C3=3). The second predicate (C1<2) is not chosen as key-matching but it is sargable.
For maximum efficiency, use key-matching predicates and create suitable indexes. The database manager may not always use an index to apply a key-matching predicate--other factors may intervene. But the first step in reducing the processing cost of a query is to use key-matching predicates where possible and then create suitable indexes.
In general, when you create an multi-column index, put the column with the most distinct values first, and continue in order to the least distinct values.
One exception to this rule is the case where the index provides a necessary ordering of the data. With this query:
SELECT * FROM EMPLOYEE WHERE EDLEVEL > 14 AND JOB = 'CLERK' ORDER BY EDLEVEL
an index on EDLEVEL,JOB enables the database manager to access data in the order required by the ORDER BY clause, thus saving a sort at the end. This may be enough to justify scanning index entries for rows that are rejected.
For a simple predicate, an index is fully matched if the column in the predicate is the first column of the index. For example, the predicate C1=10 matches an index on columns C1, C2, C3, as well as an index on column C1 alone. If there are additional predicates on columns C2 and C3, they may also be evaluated through the multicolumn index.
For a search condition where all the predicates are joined by an AND, it is enough if the index includes the set of columns as an initial substring. For example, an index on columns C1, C3, C4, and C6 is fully matched by the search condition C1=10 AND C4='A' AND C3=7 AND C6=9, as long as all but the last column are matched with equality predicates. The last predicate can be either an equality or a range predicate.
The same index is not fully matched by the search condition C1=10 AND C4='A' AND C6=9, because the set of columns in that search condition (C1,C4,C6) is not an initial substring of C1, C3, C4, C6. However, the database manager can use the index for the parts of the search condition that do form an initial substring; in the example, it can apply the predicate C1=10 through the index. In addition, it can still use the index to evaluate the predicates on C4 and C6, so that data pages do not need to be accessed.
Similarly, such an index is not fully matched by the search condition C1=10 AND C3=7 AND C4>'A' AND C6=9, because the predicate C4>'A' is not an equality predicate. An index can only be matched up to and including the first non-equality predicate. Thus, the database manager can apply the predicates C1=10 AND C3=7 AND C4>'A' as key-matching predicates to the index. Again, the predicate on C6 can be evaluated as an index sarg so that data pages do not need to be accessed.
Hence, the order of the index columns is important; it should take into account the kinds of queries used. For example, suppose the Spiffy Computer department intends to query its employee table regularly with predicates such as EDLEVEL > 14 AND JOB = 'CLERK'. With an index on EDLEVEL,JOB, the database manager finds the first index entry with EDLEVEL greater than 14 and scans the remainder of the index from there upward. But with an index on JOB,EDLEVEL, it scans only the entries for clerks having EDLEVEL > 14, giving a shorter access path.
Note: | If you created an index in order to improve the performance of an SQL statement, you should probably check that the database manager actually uses the index for that statement. To find out what access and processing methods it has chosen, use the EXPLAIN statement (see Using Explanation Tables to Evaluate Performance). |
Rows that are retrieved go through two stages of processing. Predicates can be applied at the first stage are called sargable predicates; those that cannot be applied until the second stage are called residual predicates. Predicates in the HAVING clause are always residual. Resolution of predicates and the predicate hierarchy are detailed in Predicate Processing. Table 6 shows which predicates are sargable and which are not.
There is a definite performance advantage in using sargable predicates: they require fewer CPU instructions than do residual predicates, because they eliminate rows that would otherwise be passed from first to second stage processing. Thus, whenever possible, avoid a residual predicate by rewriting your SQL statement.
Table T contains 1000 rows, and column C6 contains the integers from 1 to 1000. Consider this query:
SELECT * FROM T WHERE INTEGER(C6/7) = 2
Because the column in the predicate is involved in an arithmetic expression, the predicate is residual. The first stage must access 1000 rows and pass them all back to the second stage. If you instead write the predicate as WHERE C6 BETWEEN 14 AND 20, then only seven rows are passed back to the second stage. Furthermore, the predicate C6 BETWEEN 14 AND 20 is key-matching. If there is an index on C6, the first stage need only access seven rows.
Sargable predicates are better than residual predicates, but a suitable index is better still. Avoiding the processing cost of a residual predicate won't help you much if you have to access ten million rows without an index. (You can use EXPLAIN to tell whether a sargable predicate exists for a particular column, refer to the REFERENCE EXPLAIN table in the DB2 Server for VSE & VM SQL Reference manual.)
In general, any predicate involving more than one table is a join predicate. In the database manager, a condition of the form T1.C1=T2.C2 (the equijoin) is handled specially by the optimizer. For information on joins, see Methods of Joining Two or More Tables.
Table 6 shows the different types of search conditions, and their processing characteristics. The following conventions are used:
Note: | If the predicate falls in two different categories, choose the more specific category. |
Table 6. Search Conditions and Their Processing Characteristics
Search Conditions | Key- Matching? | Sargable? | Default Filter Factor (FF) | Notes |
---|---|---|---|---|
COL = value COL IS NULL COL rop value COL BETWEEN value1 AND value2 COL LIKE 'char' COL IN (value1,...) |
Yes Yes Yes Yes Yes Yes |
Yes Yes Yes Yes Yes Yes |
1/25 1/25 1/3 1/10 1/10 1/25*size |
1,16 2 1,16,19 1,4,16,19 5,19 1,3,6,16,17 |
COL <> value COL IS NOT NULL COL NOT BETWEEN value1 AND value2 COL NOT IN (value1,...) COL NOT LIKE 'value' COL LIKE 'pattern' COL LIKE host variable |
No No No No No No No |
Yes Yes No Yes No No No |
24/25 24/25 9/10 1 - 1/25*size 9/10 1/10 1/10 |
1,7,16 2 7,8,19 1,7,9,16,17 19 19 16,19 |
T1.COL = T2.COL (different tables) T1.COL rop T2.COL (different tables) T1.COL <> T2.COL (different tables) |
Yes Yes No |
Yes Yes Yes |
1/25 1/3 24/25 |
1,15 1,15,19 1,15 |
T1.COL1 = T1.COL2 (same table) T1.COL1 rop T1.COL2 (same table) T1.COL1 <> T1.COL2 (same table) |
No No No |
No No No |
1/25 1/3 24/25 |
19 |
COL = [Q] (uncorrelated scalar subquery) COL rop [Q] (uncorrelated scalar subquery) COL <> [Q] (uncorrelated scalar subquery) |
Yes Yes No |
Yes Yes Yes |
1/25 1/3 24/25 |
1,10 1,10,19 1,10 |
litexpr = [Q] (uncorrelated scalar subquery) litexpr rop [Q] (uncorrelated scalar subquery) litexpr <> [Q] (uncorrelated scalar subquery) |
No No No |
No No No |
1 1/3 1 |
10 10,19 10 |
COL = (subquery) COL <> (subquery) COL rop (subquery) litexpr = (subquery) litexpr <> (subquery) anyexpr op Q (subquery) |
No No No No No No |
No No No No No No |
1/25 24/25 1/3 1 1 1 |
11 11 11,19 11 11 11 |
anyexpr [NOT] IN (subquery) [NOT] EXISTS (subquery) |
No No |
No No |
1 1 |
11 11 |
COL = expression COL <> expression COL rop expression anyexpr NOT BETWEEN anyexpr AND anyexpr anyexpr BETWEEN anyexpr AND anyexpr anyexpr <> expression litexpr rop anyexpr litexpr = anyexpr litexpr <> anyexpr litexpr NOT IN (value,...) litexpr IN (value,...) |
No No No No No No No No No No No |
No No No No No No No No No No No |
1/25 24/25 1/3 9/10 1/10 24/25 1/3 1 &similar.0 &similar.0 1 |
12 12 12,19 12,18,19 12,18,19 12 12 12 12 12,17 12,17 |
search condition AND search condition search condition OR search condition NOT search condition |
Yes No No |
Yes Yes No |
FF1*FF2 FF1+FF2-FF1*FF2 1-FF |
13 14 |
Notes to Table 6: Search conditions which are listed as key-matching or sargable are only potentially so; they may not be treated as such because of the following factors:
SMALLINT < INTEGER < DECIMAL < FLOAT
A value's data type can be converted to any higher data type. For example, INTEGER can be converted to a DECIMAL (given sufficient precision) or FLOAT, but not to SMALLINT. Similar compatibility considerations exist for character and graphic data lengths, as well as for the precision and scale of decimal data.
If the data type of the column is CHAR(n) or GRAPHIC(n), the predicate that references it is sargable if the length of that predicate value is less than or equal to "n". The column and the predicate should also have the same CCSID, refer to Impact of CCSIDs on Sargability.
If the data type of the column is DECIMAL(m,n), it must be possible to accommodate the number of decimal digits before and after the decimal point in the target decimal field (P1-S1 <= P2-S2 and S1<=S2 where P is precision and S is scale).
Even-precision DECIMAL variables are supported by the DB2 Server for VSE & VM product for the assembler preprocessor. You can use even-precision DECIMAL columns in tables that are referenced by Assembler programs. Host variables in even-precision will be left as is by the preprocessor. Therefore, when these programs access even-precision DECIMAL data, predicates become sargable instead of RESIDUAL. Your performance may improve when using SQL statements that use even-precision packed DECIMAL columns.
SELECT * from T1 where C1 IN (:HV1, :HV2) AND C2 IN (:HV3, :HV4) AND C3=5
In search conditions containing multiple predicates on the same column, only one predicate can be chosen as the key. The predicate providing the best filtering establishes the path, and the others are turned into sargable predicates.
If all the predicates refer to the same column and the column is indexed, the search condition can be rewritten using the IN predicate.
For example, instead of:
SELECT * FROM EMP_ACT WHERE ACTNO=90 OR ACTNO=100
write:
SELECT * FROM EMP_ACT WHERE ACTNO IN (90,100)
If different columns are referenced and the columns are indexed, then a UNION may be a more efficient form of the query.
In the following example, the database manager will have to examine all rows in the EMPLOYEE table to find those that satisfy the two predicates:
SELECT * FROM EMPLOYEE WHERE JOB = 'CLERK' OR LASTNAME = 'JONES'
The same request can be processed more efficiently if it is reformulated as the UNION of two SELECT statements:
SELECT * FROM EMPLOYEE WHERE JOB = 'CLERK' UNION SELECT * FROM EMPLOYEE WHERE LASTNAME = 'JONES'
A join predicate is only sargable if the data types of the two columns are identical (disregarding whether the columns support NULLS). If the data types are CHAR(n), VARCHAR(n), GRAPHIC(n), and VARGRAPHIC(n), the lengths must match. If they are DECIMAL(m,n), precision and scale must both match.
The database manager path selection takes this into account when it decides which table should be accessed first.
Predicates using indicator variables which do not meet the above criteria are always residual.
The objective of a predicate is to return to the user only those rows satisfying a particular search condition. Every predicate is treated like a filter that reduces the number of rows returned. The degree to which the predicate reduces the size of the answer set is the filter factor (FF). The filter factor is an estimate of the proportion of rows that remain after a predicate has "filtered out" the rows that do not satisfy it.
The filter factor is a value between 0 and 1. If it is 1, the whole table is selected, and the predicate has no filtering effect; if it is 0, no rows are returned.
The database manager estimates a filter factor for every predicate. If the predicate is either too complex (contains an expression), uses disjunction (OR), uses host variables, or if there are no statistics available for the columns it references, then a default filter factor is used. The defaults for various predicates are shown in Table 6.
Sometimes the optimizer will not use the index when a predicate uses host
variables. This is because the optimizer is forced to make assumptions
about the values that are not available when the statement is being
preprocessed. In such cases, you can usually improve performance by
executing the SQL statement dynamically with fixed values.
COLCOUNT Value | Filter Factor for rop | Filter Factor for LIKE, BETWEEN |
---|---|---|
>=100,000,000 | 1/10000 | 3/100000 |
>=10,000,000 | 1/3000 | 1/10000 |
>=1,000,000 | 1/1000 | 3/10,000 |
>=100,000 | 1/300 | 1/1000 |
>=10,000 | 1/100 | 3/1000 |
>=1000 | 1/30 | 1/100 |
>=100 | 1/10 | 3/100 |
<100 | 1/3 | 1/10 |
=-1 | 1/3 | 1/10 |
The following examples of predicates illustrate the general rules shown in Table 6. In each case, assume that there is an index on columns C1, C2, C3, C4 of the table.
The DB2 Server for VSE & VM SQL Reference manual lists the rules used to decide which operand will undergo Coded Character Set Identifier (CCSID) conversion in a comparison operation. These rules will help you to maintain sargability whenever possible. The rules were defined to ensure that a column operand will only undergo CCSID conversion if it is absolutely necessary.
Whenever a column operand is chosen to undergo CCSID conversion, the predicate becomes residual because CCSID conversion is performed from RDS, and not from DBSS. In most cases, this is a necessary consequence of using the CCSID support. In other cases, it can be avoided by understanding how the rules apply and by changing the application or the data.
Consider the search condition: COL = value (where value is a host variable). This search condition is normally sargable (see Table 6). The database manager attempts to keep this search condition sargable by always performing CCSID conversion on the host variable operand. There is a case, however, when the rules state that the column operand is the one that should be converted: when the subtype of the column is SBCS and the subtype of the host variable CCSID is mixed.
If it is possible that the host variable will contain mixed data, then the column operand must undergo CCSID conversion and the predicate must become residual. One way to make this predicate sargable is to set the column subtype to mixed. This may not always be possible or desirable, but you should consider this situation when setting the subtype of new columns.
If the host variable will never contain mixed data, then it is possible to make this predicate sargable by changing the subtype of the host variable from mixed to SBCS. To do this, change the default CCSID values on the application requester. Refer to the DB2 Server for VM System Administration or the DB2 Server for VSE System Administration manuals for information on how to change the CHARNAME setting for an application requester. This situation may not be possible or desirable, especially if graphic or mixed data is used elsewhere in the query.