The load utility is capable of efficiently moving large quantities of data into newly created tables, or into tables that already contain data. The utility can handle all data types, including large objects (LOBs) and user-defined types (UDTs). The load utility is faster than the import utility, because it writes formatted pages directly into the database, while the import utility performs SQL INSERTs. The load utility does not fire triggers, and does not perform referential or table constraints checking (other than validating the uniqueness of the indexes). The data being loaded must be local to the server (unlike import and export, which support the passing of data from the client). For a detailed comparison of the load and the import utilities, see Appendix B. Differences Between the Import and the Load Utility.
The load process consists of three distinct phases (see Figure 1):
During the load phase, data is loaded into the table, and index keys and table statistics are collected, if necessary. Save points, or points of consistency, are established at intervals specified through the SAVECOUNT parameter in the LOAD command. Messages are generated, indicating how many input rows were successfully loaded at the time of the save point. For DATALINK columns defined with FILE LINK CONTROL, link operations are performed for non-NULL column values. If a failure occurs, you can restart the load operation; the RESTART option automatically restarts the load operation from the last successful consistency point. The TERMINATE option rolls back the failed load operation.
Figure 1. The Three Phases of the Load Process: Load, Build, and Delete.. Associated table spaces are in load pending state from the beginning of the load phase until the end of the build phase, and in delete pending state from the end of the build phase until the end of the delete phase.
During the build phase, indexes are created based on the index keys collected during the load phase. The index keys are sorted during the load phase, and index statistics are collected (if the STATISTICS YES with INDEXES option was specified). The statistics are similar to those collected through the RUNSTATS command (see the Command Reference). If a failure occurs during the build phase, the RESTART option automatically restarts the load operation at the appropriate point.
Unique key violations are placed into the exception table, if one was specified (see Exception Table), and messages about rejected rows are written to the message file. Following the completion of the load process, review these messages, resolve any problems, and insert corrected rows into the table.
Do not attempt to delete or to modify any temporary files created by the load utility. Some temporary files are critical to the delete phase. If a failure occurs during the delete phase, the RESTART option automatically restarts the load operation at the appropriate point.
Note: | Each deletion event is logged. If you have a large number of records that violate the uniqueness condition, the log could fill up during the delete phase. |
The following information is required when loading data:
You can also specify:
If you specify the pagefreespace modifier, and you have an index on the table, you might consider specifying indexfreespace. When deciding on the amount of free space to leave for each, consider that the size of each row being inserted into the table will likely be larger than the size of the associated key to be inserted into the index. In addition, the page size of the table spaces for the table and the index may be different.
If data is appended to a table, statistics are not collected. To collect current statistics on an appended table, invoke the runstats utility following completion of the load process. If gathering statistics on a table with a unique index, and duplicate keys are deleted during the delete phase, statistics are not updated to account for the deleted records. If you expect to have a significant number of duplicate records, do not collect statistics during the load operation. Instead, invoke the runstats utility following completion of the load process.
Logging is required for fully recoverable databases. The load utility almost completely eliminates the logging associated with the loading of data. In place of logging, you have the option of making a copy of the loaded portion of the table. For information about how DB2 keeps tracks of the load copies, see Using the Load Copy Location File. If you have a database environment that allows for database recovery following a failure, you can do one of the following:
If you are loading a table that already contains data, and the database is non-recoverable, ensure that you have a backed-up copy of the database, or the table spaces for the table being loaded, before invoking the load utility, so that you can recover from errors.
If you want to perform a sequence of multiple load operations on a recoverable database, the sequence of operations will be faster if you specify each load operation to be non-recoverable, and take a backup at the end of the load sequence, than if you invoke each of the load operations with the COPY YES option. You can use the NONRECOVERABLE option to specify that a load transaction is to be marked as non-recoverable, and that it will not be possible to recover it by a subsequent roll forward action. The rollforward utility will skip the transaction, and will mark the table into which data was being loaded as "invalid". The utility will also ignore any subsequent transactions against that table. After the roll forward is completed, such a table can only be dropped (see Figure 2). With this option, table spaces are not put in backup pending state following the load operation, and a copy of the loaded data does not have to be made during the load operation.
Figure 2. Non-recoverable Processing During a Roll Forward Action
For more information, see the database recovery chapter in the Administration Guide.