Release Notes


Administration Guide: Performance


4.1 Chapter 5. System Catalog Statistics

The following section requires a change:

4.1.1 Collecting and Using Distribution Statistics

In the subsection called "Example of Impact on Equality Predicates", there is a discussion of a predicate C <= 10. The error is stated as being -86%. This is incorrect. The sentence at the end of the paragraph should read:

   Assuming a uniform data distribution and using formula (1), the number of
   rows that satisfy the predicate is estimated as 1, an error of -87.5%.
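
For reference, the corrected figure follows from comparing the estimate of 1 row with the 8 rows that actually satisfy the predicate in the manual's example (a count implied by the corrected error percentages rather than restated here):

   error = (1 - 8) / 8 = -87.5%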

In the subsection called "Example of Impact on Equality Predicates", there is a discussion of a predicate C > 8.5 AND C <= 10. The estimate of the r_2 value using linear interpolation must be changed to the following:

          10 - 8.5
   r_2 ~= ---------- x (number of rows with value > 8.5 and <= 100.0)
          100 - 8.5
          10 - 8.5
   r_2 ~= ---------- x (10 - 7)
          100 - 8.5
          1.5
   r_2 ~= ---- x (3)
          91.5
   r_2 ~= 0

The paragraph following this new example must also be modified to read as follows:

   The final estimate is r_1 + r_2 ~= 7,
   and the error is only -12.5%.
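
Here ~= denotes "is approximately equal to". Combining r_2 ~= 0 from the interpolation above with the quantile-based estimate r_1 = 7, and again taking 8 as the number of rows that actually satisfy the predicate (the count implied by the corrected error percentages), gives:

   r_1 + r_2 ~= 7 + 0 = 7
   error = (7 - 8) / 8 = -12.5%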

4.2 Chapter 6. Understanding the SQL Compiler

The following sections require changes:

4.2.1 Replicated Summary Tables

The following information replaces or is added to the existing information in this section:

Replicated summary tables can be used to assist in the collocation of joins. For example, in a star schema with a large fact table spread across twenty nodes, the joins between the fact table and the dimension tables are most efficient if these tables are collocated.

By placing all of the tables in the same nodegroup, at most one dimension table would be partitioned correctly for a collocated join. The other dimension tables could not be used in a collocated join because the join column(s) on the fact table would not correspond to the fact table's partitioning key.

For example, you could have a table called FACT (C1, C2, C3, ...) partitioned on C1; and a table called DIM1 (C1, dim1a, dim1b, ...) partitioned on C1; and a table called DIM2 (C2, dim2a, dim2b, ...) partitioned on C2; and so on.

From this example, you could see that the join between FACT and DIM1 is perfect because the predicate DIM1.C1 = FACT.C1 would be collocated. Both of these tables are partitioned on column C1.

The join between FACT and DIM2 with the predicate WHERE DIM2.C2 = FACT.C2 cannot be collocated because FACT is partitioned on column C1 and not on column C2.
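
As an illustrative sketch only (the column types and the tablespace name STAR_TS are assumptions, not part of the original example), the layout described above could be defined as follows, with all three tables in the same nodegroup:

   CREATE TABLE FACT (C1 INTEGER NOT NULL, C2 INTEGER NOT NULL, C3 INTEGER)
      IN STAR_TS
      PARTITIONING KEY (C1) USING HASHING;

   CREATE TABLE DIM1 (C1 INTEGER NOT NULL, DIM1A INTEGER, DIM1B INTEGER)
      IN STAR_TS
      PARTITIONING KEY (C1) USING HASHING;

   CREATE TABLE DIM2 (C2 INTEGER NOT NULL, DIM2A INTEGER, DIM2B INTEGER)
      IN STAR_TS
      PARTITIONING KEY (C2) USING HASHING;

With these definitions, rows joined through DIM1.C1 = FACT.C1 hash to the same partition, while rows joined through DIM2.C2 = FACT.C2 do not.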

In this case, it would be good to replicate DIM2 in the fact table's nodegroup so that the join can be done locally on each partition.
Note:The replicated summary tables discussion here has to do with intra-database replication. Inter-database replication involves subscriptions, control tables, and data located in different databases and on different operating systems. If you are interested in inter-database replication, refer to the Replication Guide and Reference for more information.

When creating a replicated summary table, the source table could be a single-node nodegroup table or a multi-node nodegroup table. In most cases, the table is small and can be placed in a single-node nodegroup. You may place a limit on the data to be replicated by specifying only a subset of the columns from the table, or by limiting the number of rows through the predicates used, or by using both methods when creating the replicated summary table.
Note:The data capture option is not required for replicated summary tables to function.
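
A minimal sketch of such a definition follows; the table name, tablespace name, column subset, and predicate are illustrative assumptions only:

   CREATE TABLE R_DIM2 AS
      (SELECT C2, DIM2A FROM DIM2 WHERE DIM2A IS NOT NULL)
      DATA INITIALLY DEFERRED REFRESH DEFERRED
      IN STAR_TS
      REPLICATED;

Because the table is defined with REFRESH DEFERRED, it is populated later with the REFRESH statement discussed below.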

The replicated summary table could also be created in a multi-node nodegroup, that is, the same nodegroup in which you have placed your large tables. In this case, copies of the source table are created on all of the partitions of the nodegroup. Joins between a large fact table and the dimension tables have a better chance of being done locally in this environment rather than having to broadcast the source table to all partitions.

Indexes on a replicated table are not created automatically; you create them yourself, and they may be different from those defined on the source table.
Note:You cannot create unique indexes (or define any constraints) on replicated tables. This restriction prevents constraint violations that are not present on the source tables; such constraints are disallowed even if the same constraint exists on the source table.
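
For example, a hypothetical non-unique index on the replicated table sketched above:

   CREATE INDEX R_DIM2_IX1 ON R_DIM2 (C2);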

After using the REFRESH statement, you should run RUNSTATS on the replicated table as you would any other table.
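
Continuing the earlier sketch (the schema name DB2INST1 is an assumption), the command line sequence might look like this:

   db2 "REFRESH TABLE R_DIM2"
   db2 "RUNSTATS ON TABLE DB2INST1.R_DIM2 WITH DISTRIBUTION AND DETAILED INDEXES ALL"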

The replicated tables can be referenced directly within a query. However, you cannot use the NODENUMBER() predicate with a replicated table to see the table data on a particular partition.

To see whether a replicated summary table was used (given a query that references the source table), you can use the EXPLAIN facility. First, ensure that the EXPLAIN tables exist. Then, create an explain plan for the SELECT statement you are interested in. Finally, use the db2exfmt utility to format the EXPLAIN output.
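
A minimal sketch of that sequence, using an illustrative query against the FACT and DIM2 tables from above and assuming the EXPLAIN tables are created from the EXPLAIN.DDL file shipped in the sqllib/misc directory:

   db2 CONNECT TO <dbname>
   db2 -tf ~/sqllib/misc/EXPLAIN.DDL
   db2 "EXPLAIN PLAN FOR SELECT FACT.C3, DIM2.DIM2A FROM FACT, DIM2 WHERE FACT.C2 = DIM2.C2"
   db2exfmt -d <dbname> -1 -o expln.out

In the formatted output, the presence of the replicated table's name in the access plan indicates that it was used.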

The access plan chosen by the optimizer may or may not use the replicated summary table depending on the information that needs to be joined. Not using the replicated summary table could occur if the optimizer determined that it would be cheaper to broadcast the original source table to the other partitions in the nodegroup.

4.2.2 Data Access Concepts and Optimization

The section "Multiple Index Access" under "Index Scan Concepts" has changed.

Add the following information before the note at the end of the section:

To realize the performance benefits of dynamic bitmaps when scanning multiple indexes, it may be necessary to change the value of the sort heap size (sortheap) database configuration parameter, and the sort heap threshold (sheapthres) database manager configuration parameter.

Additional sort heap space is required when dynamic bitmaps are used in access plans. When sheapthres is set relatively close to sortheap (that is, less than two or three times the sortheap value for each concurrent query), dynamic bitmaps with multiple index access must work with much less memory than the optimizer anticipated.

The solution is to increase the value of sheapthres relative to sortheap.
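
As a hedged sketch only (the parameter values and database name are illustrative placeholders, not recommendations), the two parameters are adjusted with the standard configuration commands:

   db2 UPDATE DATABASE CONFIGURATION FOR <dbname> USING SORTHEAP 2048
   db2 UPDATE DATABASE MANAGER CONFIGURATION USING SHEAPTHRES 20000

Note that a change to a database manager configuration parameter such as sheapthres generally takes effect only after the instance has been stopped and restarted.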

The section "Search Strategies for Star Join" under "Predicate Terminology" has changed.

Add the following information at the end of the section:

The dynamic bitmaps created and used as part of the Star Join technique use sort heap memory. See Chapter 13, "Configuring DB2" in the Administration Guide: Performance manual for more information on the Sort Heap Size (sortheap) database configuration parameter.


4.3 Chapter 13. Configuring DB2

The following parameters require changes:

4.3.1 Sort Heap Size (sortheap)

The "Recommendation" section has changed. The information here should now read:

When working with the sort heap, you should consider the following:

4.3.2 Sort Heap Threshold (sheapthres)

The second last paragraph in the description of this parameter has changed. The paragraph should now read:

Examples of those operations that use the sort heap include: sorts, dynamic bitmaps (used for index ANDing and Star Joins), and operations where the table is in memory.

The following information is to be added to the description of this parameter:

There is no reason to increase the value of this parameter when moving from a single-node to a multi-node environment. Once you have tuned the database and database manager configuration parameters in a single-node (DB2 EE) environment, the same values will in most cases work well in a multi-node (DB2 EEE) environment.

The Sort Heap Threshold parameter, as a database manager configuration parameter, applies across the entire DB2 instance. The only way to set this parameter to different values on different nodes or partitions is to create more than one DB2 instance. This would require managing different DB2 databases over different nodegroups, an arrangement that defeats many of the advantages of a partitioned database environment.


4.4 Appendix A. DB2 Registry and Environment Variables

The following registry variables are new or require changes:

4.4.1 Table of New and Changed Registry Variables


Table 1. Registry Variables

DB2MAXFSCRSEARCH
   Operating System: All
   Values: Default=5; -1, 1 to 33554
   Description: Specifies the number of free space control records to search when adding a record to a table. The default is to search five free space control records. Modifying this value allows you to balance insert speed with space reuse. Use large values to optimize for space reuse. Use small values to optimize for insert speed. Setting the value to -1 forces the database manager to search all free space control records.

DLFM_TSM_MGMTCLASS
   Operating System: AIX, Windows NT
   Values: Default: the default TSM management class; any valid TSM management class
   Description: Specifies which TSM management class to use to archive and retrieve linked files. If there is no value set for this variable, the default TSM management class is used.

DB2_CORRELATED_PREDICATES
   Operating System: All
   Values: Default=ON; ON or OFF
   Description: The default for this variable is ON. When there are unique indexes on correlated columns in a join, and this registry variable is ON, the optimizer attempts to detect and compensate for correlation of join predicates. When this registry variable is ON, the optimizer uses the KEYCARD information of unique index statistics to detect cases of correlation, and dynamically adjusts the combined selectivities of the correlated predicates, thus obtaining a more accurate estimate of the join size and cost.

DB2_VI_DEVICE
   Operating System: Windows NT
   Values: Default=null; nic0 or VINIC
   Description: Specifies the symbolic name of the device or Virtual Interface Provider Instance associated with the Network Interface Card (NIC). Independent hardware vendors (IHVs) each produce their own NIC. Only one (1) NIC is allowed per Windows NT machine; multiple logical nodes on the same physical machine will share the same NIC. The symbolic device name "VINIC" must be in upper case and can only be used with Synfinity Interconnect. All other currently supported implementations use "nic0" as the symbolic device name.
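
Registry variables such as these are set with the db2set command. A brief sketch follows; the values shown are illustrative only, not recommendations:

   db2set DB2MAXFSCRSEARCH=10
   db2set DB2_CORRELATED_PREDICATES=ON
   db2set -all

The last command simply lists the registry variables that are currently set.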


4.5 Appendix C. SQL Explain Tools

The section titled "Running db2expln and dynexpln" should have the last paragraph replaced with the following:

To run db2expln, you must have SELECT privilege on the system catalog views as well as EXECUTE authority for the db2expln package. To run dynexpln, you must have BINDADD authority for the database, the schema you are using to connect to the database must exist or you must have IMPLICIT_SCHEMA authority for the database, and you must have any privileges needed for the SQL statements being explained. (Note that if you have SYSADM or DBADM authority, you will automatically have all these authorization levels.)
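
As a partial sketch of the database-level authorizations only (the user name USER1 is hypothetical, and EXECUTE on the db2expln package must also be granted under whatever package name it carries on your system):

   GRANT BINDADD ON DATABASE TO USER USER1;
   GRANT IMPLICIT_SCHEMA ON DATABASE TO USER USER1;
   GRANT SELECT ON SYSCAT.TABLES TO USER USER1;

SYSCAT.TABLES is shown as one representative catalog view; SELECT privilege is needed on each of the catalog views that the tool reads.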

