In the subsection called "Example of Impact on Equality Predicates", there is a discussion of a predicate C <= 10. The error is stated as being -86%. This is incorrect. The sentence at the end of the paragraph should read:
Assuming a uniform data distribution and using formula (1), the number of rows that satisfy the predicate is estimated as 1, an error of -87.5%.
In the subsection called "Example of Impact on Equality Predicates", there is a discussion of a predicate C > 8.5 AND C <= 10. The estimate of the r_2 value using linear interpolation must be changed to the following:
10 - 8.5 r_2 *= ---------- x (number of rows with value > 8.5 and <= 100.0) 100 - 8.5 10 - 8.5 r_2 *= ---------- x (10 - 7) 100 - 8.5 1.5 r_2 *= ---- x (3) 91.5 r_2 *= 0
The paragraph following this new example must also be modified to read as follows:
The final estimate is r_1 + r_2 *= 7, and the error is only -12.5%.
|Within the section titled "Rules for Updating Column Statistics", |the last bulleted-list item in the first list item should be replaced by |the following:
|HIGH2KEY must be greater than LOW2KEY whenever there are more than 3 distinct |values in the corresponding column. In the case of 3 or less distinct values |in the column, HIGH2KEY can be equal to LOW2KEY.
In FixPak 1, an option is provided to collect and use sub-element statistics. These are statistics about the content of data in columns when the data has a structure in the form of a series of sub-fields or sub-elements delimited by blanks.
For example, suppose a database contains a table DOCUMENTS in which each row describes a document, and suppose that in DOCUMENTS there is a column called KEYWORDS containing a list of relevant keywords relating to this document for text retrieval purposes. The values in KEYWORDS might be as follows:
'database simulation analytical business intelligence' 'simulation model fruitfly reproduction temperature' 'forestry spruce soil erosion rainfall' 'forest temperature soil precipitation fire'
In this example, each column value consists of 5 sub-elements, each of which is a word (the keyword), separated from the others by one blank.
For queries that specify LIKE predicates on such columns using the % match_all character:
SELECT .... FROM DOCUMENTS WHERE KEYWORDS LIKE '%simulation%'
it is often beneficial for the optimizer to know some basic statistics about the sub-element structure of the column, namely:
In the KEYWORDS column example, SUB_COUNT is 5, and SUB_DELIM_LENGTH is 1, because each delimiter is a single blank character.
In FixPak 1, the system administrator controls the collection and use of these statistics by means of an extension to the DB2_LIKE_VARCHAR registry variable. This registry variable affects how the DB2 UDB optimizer deals with a predicate of the form:
COLUMN LIKE '%xxxxxx'
where xxxxxx is any string of characters; that is, any LIKE predicate whose search value starts with a % character. (It may or may not end with a % character). These are referred to as "wildcard LIKE predicates" below. For all predicates, the optimizer has to estimate how many rows match the predicate. For wildcard LIKE predicates, the optimizer assumes that the COLUMN being matched has a structure of a series of elements concatenated together to form the entire column, and estimates the length of each element based on the length of the string, excluding leading and trailing % characters. The new syntax is:
db2set DB2_LIKE_VARCHAR=[Y|N|S|num1][,Y|N|num2] where - the first term (preceding the comma) means the following, but only for columns that do not have positive sub-element statistics S Use the algorithm as used in DB2 Version 2. N Use a fixed-length sub-element algorithm. Y (default) Use a variable-length sub-element algorithm with a default value for the algorithm parameter. num1 Use a variable-length sub-element algorithm, and use num1 as the algorithm parameter. - the second term (following the comma) means: N (default) Do not collect or use sub-element statistics. Y Collect sub-element statistics. Use a variable-length sub-element algorithm that uses those statistics, together with a default value for the algorithm parameter in the case of columns with positive sub-element statistics. num2 Collect sub-element statistics. Use a variable-length sub-element algorithm that uses those statistics, together with num2 as the algorithm parameter in the case of columns with positive sub-element statistics.
If the value of DB2_LIKE_VARCHAR contains only the first term, no sub-element statistics are collected, and any that have previously been collected are ignored. The value specified affects how the optimizer calculates the selectivity of wildcard LIKE predicates in the same way as before; that is:
If the value of DB2_LIKE_VARCHAR contains two terms, and the second is Y or a floating point constant, sub-element statistics on single-byte character set string columns of type CHAR, VARCHAR, GRAPHIC, or VARGRAPHIC are collected during a RUNSTATS operation and used during compilation of queries involving wildcard LIKE predicates. The optimizer uses an algorithm that presumes the sub-element model and uses the SUB_COUNT and SUB_DELIM_LENGTH statistics, as well as an algorithm parameter, to calculate the selectivity of the predicate. The algorithm parameter is specified in the same way that the inferential algorithm is specified, that is:
If, during compilation, the optimizer finds that sub-element statistics have not been collected on the column involved in the query, it will use the "inferential" sub-element algorithm; that is, the one used when only the first term of DB2_LIKE_VARCHAR is specified. Thus, in order for the sub-element statistics to be used by the optimizer, the second term of DB2_LIKE_VARCHAR must be set both during RUNSTATS and compilation.
The values of the sub-element statistics can be viewed by querying SYSIBM.SYSCOLUMNS. For example:
select substr(NAME,1,16), SUB_COUNT, SUB_DELIM_LENGTH from sysibm.syscolumns where tbname = 'DOCUMENTS'
The SUB_COUNT and SUB_DELIM_LENGTH columns are not present in the SYSSTAT.COLUMNS statistics view, and therefore cannot be updated.