DB2 Server for VSE & VM: Performance Tuning Handbook


Modifying Table Designs to Enhance Performance

The primary consideration for the performance of access to data in the database is the number of DASD input/output requests required to access the table rows. The indexing and clustering techniques discussed in the previous sections enhance data access performance by reducing DASD input/output requests without impacting your logical data (table) design. However, other techniques can be used, if you are willing to reconsider your logical table design.

Keeping Together Frequently Updated Columns: Keep frequently updated columns close together in the same row. This helps to reduce the amount of data that has to be logged because only the portion of the row from the first column updated to the last column updated is recorded in the log.

Storing Joins of Tables (Redundant Data): The evaluation of a join of two tables involves combining information from corresponding rows of the tables. Ideally, the corresponding rows are on the same page. However, such clustering of rows from separate tables is difficult to establish and maintain. Then, such clustering may not be in the best interest of query accesses to the individual tables. Assuming tables are clustered on different pages, multiple input/output requests are required to evaluate the join of corresponding rows of the two tables.

For example, suppose the PROJECT table is clustered on DEPTNO. Retrieving information about a department (a DEPARTMENT row) and its corresponding project name (PROJECT rows), involves an access to one page to get the DEPARTMENT row and another access to a different page to get the corresponding PROJECT rows.

If most of such joins are done just to pick up the DEPTNAME information out of the DEPARTMENT row, it may be worthwhile to store DEPTNAME in both tables. This eliminates the need to join the two tables to obtain department names (DEPTNAMEs) in retrieval of project names. This, in turn, eliminates accesses to the DEPARTMENT table pages. This, in effect, reduces database input/output requests by "storing the join" of the two tables.

You can use the ISQL INSERT with Subselect (Format 2) to combine tables.

The extreme case of "storing joins" is to replace both tables with the complete join (SELECT *) of the two tables. This is rarely cost effective.

The cost of storing joins is the DASD space consumed and the extra activity required to maintain the redundant data. Table 10 shows the cost of storing the DEPTNAME column in the PROJECT table.

Table 10. Cost of Storing DEPTNAME in the PROJECT Table
Cost Factor Approximate Cost
DASD Storage The average column length of the DEPTNAME values times the number of rows in PROJECT (approximately 20-bytes per PROJECT row) or approximately a 35% increase in the average row length for PROJECT
INSERT into PROJECT This requires the application to first access the PROJECT table to obtain the DEPTNAME for inclusion in the PROJECT row.
UPDATE of DEPTNAME This requires an update to the PROJECT table (but this is not a frequent operation).
DELETE of a PROJECT No extra cost.
DELETE of a DEPTNAME No extra cost.

The cost of storing the DEPTNAME information redundantly is not excessive when compared to the input/output cost for frequent selecting of department names with queries on project information. The cost of DASD storage looks high (about a 35% increase in the size of the PROJECT table). However, a dbspace page will still hold about 140 PROJECT rows. Because most departments have less than 140 project names, it is still reasonable to expect all project names for a department to be on the same page for most departments.

Storing a Logical Table as Two Tables: Another way to reduce the number of pages occupied by a table is to reduce the size of the table. There is not much you can do to make a table smaller than it really is; however, you can achieve a similar effect by separating frequently used columns from the infrequently used columns.

You can do this by splitting the table into two (or more) tables. One table would contain the frequently used columns and the other(s) would contain the infrequently used columns. You could then cluster the rows of frequently used columns on a fewer number of pages.

The cost of splitting a table is the overhead added to queries that need all columns (which, by definition, is infrequent). Splitting a table also produces redundant data. That is, both (all) tables would have to contain the necessary column(s) to support the join.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]