The load utility is capable of efficiently moving large quantities of data into newly created tables, or into tables that already contain data. The utility can handle all data types, including large objects (LOBs) and user-defined types (UDTs). The load utility is faster than the import utility, because it writes formatted pages directly into the database, while the import utility performs SQL INSERTs. The load utility does not fire triggers, and does not perform referential or table constraint checking (other than validating the uniqueness of the indexes). For a detailed comparison of the load and the import utilities, see Appendix B, Differences Between the Import and the Load Utility.
The load process consists of three distinct phases (see Figure 1):
During the load phase, data is loaded into the table, and index keys and table statistics are collected, if necessary. Save points, or points of consistency, are established at intervals specified through the SAVECOUNT parameter in the LOAD command. Messages are generated, indicating how many input rows were successfully loaded at the time of the save point. For DATALINK columns defined with FILE LINK CONTROL, link operations are performed for non-NULL column values. If a failure occurs, you can restart the load operation; the RESTART option automatically restarts the load operation from the last successful consistency point. The TERMINATE option rolls back the failed load operation.
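For illustration, a load operation that establishes a consistency point every 10 000 rows, and a restart of that same operation after a failure, might be invoked from the command line processor as follows. The file, message file, schema, and table names here are assumptions made for the example, not values from this guide:

   db2 load from /u/mydir/staff.del of del savecount 10000
      messages staff.msgs insert into userid.staff

   db2 load from /u/mydir/staff.del of del savecount 10000
      messages staff.msgs restart into userid.staff

To abandon the failed operation instead, the same command would be issued with TERMINATE in place of RESTART.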
Figure 1. The Three Phases of the Load Process: Load, Build, and Delete. Associated table spaces are in load pending state from the beginning of the load phase until the end of the build phase, and in delete pending state from the end of the build phase until the end of the delete phase.
During the build phase, indexes are created based on the index keys collected during the load phase. The index keys are sorted during the load phase, and index statistics are collected (if the STATISTICS YES with INDEXES option was specified). The statistics are similar to those collected through the RUNSTATS command (see the Command Reference). If a failure occurs during the build phase, the RESTART option automatically restarts the load operation at the appropriate point.
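As a sketch of how statistics collection is requested, a load replace operation that also gathers distribution and index statistics might look like the following; the file, schema, and table names are illustrative assumptions:

   db2 load from /u/mydir/staff.del of del
      replace into userid.staff
      statistics yes with distribution and indexes all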
During the delete phase, the rows that caused a unique key violation are removed from the table. Do not attempt to delete or to modify any temporary files created by the load utility; some temporary files are critical to the delete phase. If a failure occurs during the delete phase, the RESTART option automatically restarts the load operation at the appropriate point.
Note: Each deletion event is logged. If you have a large number of records that violate the uniqueness condition, the log could fill up during the delete phase.
The following information is required when loading data:
You can also specify:
If you specify the pagefreespace modifier, and you have an index on the table, you might consider specifying indexfreespace. When deciding on the amount of free space to leave for each, consider that the size of each row being inserted into the table will likely be larger than the size of the associated key to be inserted into the index. In addition, the page size of the table spaces for the table and the index may be different.
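For example, both modifiers might be specified together to reserve roughly 20 percent free space on data pages and 15 percent on index pages; the percentages, file name, and table name below are purely illustrative:

   db2 load from /u/mydir/staff.del of del
      modified by pagefreespace=20 indexfreespace=15
      replace into userid.staff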
If data is appended to a table, statistics are not collected. To collect current statistics on an appended table, invoke the runstats utility following completion of the load process. If you are gathering statistics on a table with a unique index, and duplicate keys are deleted during the delete phase, the statistics are not updated to account for the deleted records. If you expect to have a significant number of duplicate records, do not collect statistics during the load operation. Instead, invoke the runstats utility following completion of the load process.
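In that case, current statistics can be gathered after the load completes with a command along these lines (the schema and table name are assumptions for the example):

   db2 runstats on table userid.staff
      with distribution and detailed indexes all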
Logging is required for fully recoverable databases. The load utility almost completely eliminates the logging associated with the loading of data. In place of logging, you have the option of making a copy of the loaded portion of the table. For information about how DB2 keeps track of the load copies, see Using the Load Copy Location File. If you have a database environment that allows for database recovery following a failure, you can do one of the following:
If you are loading a table that already contains data, and the database is non-recoverable, ensure that you have a backed-up copy of the database, or the table spaces for the table being loaded, before invoking the load utility, so that you can recover from errors.
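Such a backup might be taken, for instance, at the database level before the load is invoked; the database name and target path in this sketch are assumptions:

   db2 backup database sample to /u/backups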
If you want to perform a sequence of multiple load operations on a recoverable database, the sequence will complete faster if you specify each load operation as non-recoverable and take a backup at the end of the load sequence, than if you invoke each of the load operations with the COPY YES option. You can use the NONRECOVERABLE option to specify that a load transaction is to be marked as non-recoverable, and that it will not be possible to recover it by a subsequent roll forward action. The rollforward utility will skip the transaction, and will mark the table into which data was being loaded as "invalid". The utility will also ignore any subsequent transactions against that table. After the roll forward operation is completed, such a table can only be dropped (see Figure 2). With this option, table spaces are not put in backup pending state following the load operation, and a copy of the loaded data does not have to be made during the load operation.
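As a sketch of the two approaches, the first command below makes a copy of the loaded data during the load operation, while the second marks the load transaction as non-recoverable; the copy directory, file, schema, and table names are assumptions made for the example:

   db2 load from /u/mydir/staff.del of del
      insert into userid.staff copy yes to /u/backups/loadcopy

   db2 load from /u/mydir/staff.del of del
      insert into userid.staff nonrecoverable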
Figure 2. Non-recoverable Processing During a Roll Forward Action
For more information, see the database recovery chapter in the Administration Guide.
The Version 6 and Version 7 load utilities have full back-level compatibility with previous releases; that is, they accept syntax from previous releases and operate normally. Following is a summary of syntax changes and changes to load behavior introduced in Version 6:
Sort spills that occur during load index creation are now performed inside a temporary table space. Sort operations do not spill directly to disk; rather, they spill to the buffer pool associated with the temporary table space. Having a large buffer pool associated with the temporary table space can improve index creation time. To achieve the same type of I/O parallelism that was available in pre-Version 6 load sort operations (by specifying multiple temporary sort directories), it is recommended that temporary table spaces be declared with multiple containers, each residing on a different disk device. It is also recommended that temporary table spaces be declared as SMS (System Managed Space), so that they can grow to accommodate large volumes of data without holding disk resources when those resources are not being used.
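For example, an SMS temporary table space spread across three containers on separate disks, with its own buffer pool, might be defined as follows; the buffer pool size, object names, and container paths are illustrative assumptions only:

   db2 create bufferpool tempbp size 5000

   db2 create temporary tablespace loadtemp
      managed by system using ('/disk1/tmp', '/disk2/tmp', '/disk3/tmp')
      bufferpool tempbp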
The Version 7 load utility can load data residing on a remotely connected client, in fully qualified files or named pipes. (Separate files containing LOB values, when the lobsinfile file type modifier is specified, should be located on the server.)
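When the data file resides on the client, the CLIENT keyword directs the utility to read it from the client machine; a sketch of such an invocation, with assumed path, message file, schema, and table names, follows:

   db2 load client from /u/mydir/staff.del of del
      messages staff.msgs insert into userid.staff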