Administration Guide

How Data Is Redistributed in Tables

When doing data redistribution on a table, the utility does the following:

Locks the row for the table in the SYSTABLES catalog table.
Invalidates all packages that involve this table. The partitioning map ID associated with the table will change because the table is being redistributed. Because the packages are invalidated, the compiler must obtain the new partitioning information for the table and generate packages accordingly.
Locks the table in exclusive mode.
Redistributes the data in the table via DELETEs and INSERTs.
If the redistribution operation succeeds, it:
1. Issues a COMMIT for the table.
2. Continues with the next table in the nodegroup.
If the operation fails before the table is fully redistributed, the utility:
1. Issues a ROLLBACK on updates to the table.
2. Ends the entire redistribution operation and returns an error.

Estimating the log space requirements when distributing data is important. The log must be large enough to accommodate the INSERT and DELETE operations at each database partition where data is being redistributed. The heaviest logging requirements will be either on the database partition that will lose the most data, or on the database partition that will gain the most data. If you are moving to a larger number of database partitions, then the ratio of current database partitions to the new number of database partitions will assist in determining the number of INSERT and DELETE operations.

For example, if you are moving from four to five database partitions, approximately twenty percent of the four original database partitions will have data moved to the new database partition. This means that the four original database partitions will each experience twenty percent DELETE operations based on the total amount of the data at each database partition. The new database partition will experience all of the INSERT operations (that is, the equivalent of an equal number of the DELETE operations from all of the four original database partitions).

The above example assumes a uniform distribution of the data. There may also be a case where there is a non-uniform distribution of the data as in the case where there is a large number of NULL values in the partitioning key. In this case, all of these rows would end up on one database partition under the old partitioning scheme and on a different database partition under the new partitioning scheme. As a result, this can increase the amount of log space required on those two database partitions perhaps well beyond the amount calculated by assuming uniform distribution.

When doing the actual calculations, you must multiply the percentage of change (like twenty percent) by the size of the largest table. You do this because the redistribution of each table is accomplished as a single transaction.
Note: However, the largest table may be uniformly distributed but the second largest table (for example) may have one or more inflated database partitions. In such a case, you should consider using the second table and not the largest one.

Once you have calculated the maximum amount of data to be inserted and deleted at a database partition, double that figure to determine the peak size of the active log. If this exceeds the active log limit of 32 GB, then the data redistribution must be done in steps. There is a utility called "makepmap" that can be used to generate a series of target partition maps, one for each step.

[ Top of Page | Previous Page | Next Page ]