版次注意事項


8.4 Chapter 6. Understanding the SQL Compiler

The following sections require changes:

8.4.1 Replicated Summary Tables

The following information will replace or be added to the existing information already in this section:

Replicated summary tables can be used to assist in the collocation of joins. For example, if you had a star schema where there is a large fact table spread across twenty nodes, then the joins between the fact table and the dimension tables are most efficient if these tables are collocated.

By placing all of the tables in the same nodegroup, at most there would one dimension table partitioned correctly for a collocated join. All other dimension tables would not be able to be used in a collocated join because the join column(s) on the fact table would not correspond to the fact table's partitioning key.

For example, you could have a table called FACT (C1, C2, C3, ...) partitioned on C1; and a table called DIM1 (C1, dim1a, dim1b, ...) partitioned on C1; and a table called DIM2 (C2, dim2a, dim2b, ...) partitioned on C2; and so on.

From this example, you could see that the join between FACT and DIM1 is perfect because the predicate DIM1.C1 = FACT.C1 would be collocated. Both of these tables are partitioned on column C1.

The join between DIM2 with the predicate WHERE DIM2.C2 = FACT.C2 cannot be collocated because FACT is partitioned on column C1 and not on column C2.

In this case, it would be good to replicate DIM2 in the fact table's nodegroup. In this way we can do the join locally on each partition.

Note:
The replicated summary tables discussion here has to do with intra-database replication. Inter-database replication has to do with subscriptions, control tables, and data located in different databases and on different operating systems. If you are interested in inter-database replication refer to the Replication 指南與參考手冊 for more information.

When creating a replicated summary table, the source table could be a single-node nodegroup table or a multi-node nodegroup table. In most cases, the table is small and can be placed in a single-node nodegroup. You may place a limit on the data to be replicated by specifying only a subset of the columns from the table, or by limiting the number of rows through the predicates used, or by using both methods when creating the replicated summary table.

Note:
The data capture option is not required for replicated summary tables to function.

The replicated summary table could also be created in a multi-node nodegroup. The nodegroup is the same as the nodegroup in which you have placed your large tables. In this case, copies of the source table are created on all of the partitions of the nodegroup. Joins between a large fact table and the dimension tables have a better chance of being done locally in this environment rather than having to broadcast the source table to all partitions.

Indexes on replicated tables are not created automatically. Indexes are created and may be different from those identified in the source table.

Note:
You cannot create unique indexes (or put on any constraints) on the replicated tables. This will prevent constraint violations that are not present on the source tables. These constraints are disallowed even if there is the same constraint on the source table.

After using the REFRESH statement, you should run RUNSTATS on the replicated table as you would any other table.

The replicated tables can be referenced directly within a query. However, you cannot use the NODENUMBER() predicate with a replicated table to see the table data on a particular partition.

To see if a created replicated summary table was used (given a query that referenced the source table), you can use the EXPLAIN facility. First, you would ensure the EXPLAIN tables existed. Then, you would create an explain plan for the SELECT statement you are interested in. Finally, you would use db2exfmt utility to format the EXPLAIN output.

The access plan chosen by the optimizer may or may not use the replicated summary table depending on the information that needs to be joined. Not using the replicated summary table could occur if the optimizer determined that it would be cheaper to broadcast the original source table to the other partitions in the nodegroup.

8.4.2 Data Access Concepts and Optimization

The section "Multiple Index Access" under "Index Scan Concepts" has changed.

Add the following information before the note at the end of the section:

To realize the performance benefits of dynamic bitmaps when scanning multiple indexes, it may be necessary to change the value of the sort heap size (sortheap) database configuration parameter, and the sort heap threshold (sheapthres) database manager configuration parameter.

Additional sort heap space is required when dynamic bitmaps are used in access plans. When sheapthres is set to be relatively close to sortheap (that is, less than a factor of two or three times per concurrent query), dynamic bitmaps with multiple index access must work with much less memory than the optimizer anticipated.

The solution is to increase the value of sheapthres relative to sortheap.

The section "Search Strategies for Star Join" under "Predicate Terminology" has changed.

Add the following information at the end of the section:

The dynamic bitmaps created and used as part of the Star Join technique uses sort heap memory. See Chapter 13, "Configuring DB2" in the Administration Guide: Performance manual for more information on the Sort Heap Size (sortheap) database configuration parameter.


[ 頁面頂端 | 前一頁 | 下一頁 | 目錄 | 索引 ]