Data Movement Utilities Guide and Reference

Load Overview

The load utility efficiently moves large quantities of data into newly created tables, or into tables that already contain data. The utility can handle all data types, including large objects (LOBs) and user-defined types (UDTs). The load utility is faster than the import utility because it writes formatted pages directly into the database, while the import utility performs SQL INSERTs. The load utility does not fire triggers, and does not perform referential or table constraint checking (other than validating the uniqueness of the indexes). The data being loaded must be local to the server (unlike import and export, which support the passing of data from the client). For a detailed comparison of the load and the import utilities, see Appendix B. Differences Between the Import and the Load Utility.

The load process consists of three distinct phases (see Figure 1):

Load, during which the data is written to the table.
During the load phase, data is loaded into the table, and index keys and table statistics are collected, if necessary. Save points, or points of consistency, are established at intervals specified through the SAVECOUNT parameter in the LOAD command. Messages are generated, indicating how many input rows were successfully loaded at the time of the save point. For DATALINK columns defined with FILE LINK CONTROL, link operations are performed for non-NULL column values. If a failure occurs, you can restart the load operation; the RESTART option automatically restarts the load operation from the last successful consistency point. The TERMINATE option rolls back the failed load operation.
Figure 1. The Three Phases of the Load Process: Load, Build, and Delete. Associated table spaces are in load pending state from the beginning of the load phase until the end of the build phase, and in delete pending state from the end of the build phase until the end of the delete phase.
Build, during which indexes are created.
During the build phase, indexes are created based on the index keys collected during the load phase. The index keys are sorted during the load phase, and index statistics are collected (if the STATISTICS YES with INDEXES option was specified). The statistics are similar to those collected through the RUNSTATS command (see the Command Reference). If a failure occurs during the build phase, the RESTART option automatically restarts the load operation at the appropriate point.
Unique key violations are placed into the exception table, if one was specified (see Exception Table), and messages about rejected rows are written to the message file. Following the completion of the load process, review these messages, resolve any problems, and insert corrected rows into the table.
Delete, during which the rows that caused a unique key violation or a DATALINK violation are removed from the table.
Do not attempt to delete or to modify any temporary files created by the load utility. Some temporary files are critical to the delete phase. If a failure occurs during the delete phase, the RESTART option automatically restarts the load operation at the appropriate point.
Note: Each deletion event is logged. If you have a large number of records that violate the uniqueness condition, the log could fill up during the delete phase.
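To make the division of work between the three phases concrete, the following Python sketch is a toy model. It is not the DB2 implementation and calls no DB2 interface. The names simulate_load and last_consistency_point are hypothetical. The model assumes that the table has a single unique index on one column, and it keeps the first occurrence of each key value. The text above does not say which duplicate DB2 retains, so that choice is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    table: list                      # rows left in the table after the delete phase
    exceptions: list                 # rows removed because of a unique key violation
    consistency_points: list = field(default_factory=list)  # rows loaded at each save point


def simulate_load(rows, key, savecount=0):
    """Toy model of the load, build, and delete phases (hypothetical, not a DB2 API).

    rows      -- list of dicts to load
    key       -- name of the column covered by a unique index
    savecount -- take a consistency point every `savecount` rows (0 = never)
    """
    table, index_keys, points = [], [], []

    # Load phase: write every row and collect its index key.
    # Uniqueness is not enforced yet.
    for n, row in enumerate(rows, start=1):
        table.append(row)
        index_keys.append((row[key], n - 1))
        if savecount and n % savecount == 0:
            points.append(n)

    # Build phase: sort the collected keys and build the index.
    # In this model the first occurrence of a key wins and later ones are flagged.
    index_keys.sort()
    index, violations = {}, set()
    for value, pos in index_keys:
        if value in index:
            violations.add(pos)
        else:
            index[value] = pos

    # Delete phase: remove the flagged rows from the table and copy them
    # to the exception table.
    exceptions = [table[pos] for pos in sorted(violations)]
    kept = [row for pos, row in enumerate(table) if pos not in violations]
    return LoadResult(kept, exceptions, points)


def last_consistency_point(rows_loaded_before_failure, savecount):
    """Row count from which a RESTART would resume in this model (0 = start over)."""
    if not savecount:
        return 0
    return rows_loaded_before_failure // savecount * savecount


if __name__ == "__main__":
    result = simulate_load(
        [{"id": 1}, {"id": 2}, {"id": 1}, {"id": 3}, {"id": 2}], key="id", savecount=2
    )
    print(result.table)                      # [{'id': 1}, {'id': 2}, {'id': 3}]
    print(result.exceptions)                 # [{'id': 1}, {'id': 2}]
    print(result.consistency_points)         # [2, 4]
    print(last_consistency_point(2500, 1000))  # 2000
```

The model leaves uniqueness unchecked until the build phase and removes rows only in the delete phase. This mirrors why the delete phase can generate a large number of logged deletions when many rows share a key.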

The following information is required when loading data:

The path and the name of the input file, named pipe, or device.
The name or alias of the target table.
The format of the data in the input file. This format can be DEL, ASC, or PC/IXF. See Appendix C. Export/Import/Load Utility File Formats.
Whether the input data is to be appended to the table, or is to replace the existing data in the table.
A message file name, if the utility is invoked through the application programming interface (API), sqluload.

You can also specify:

The method to use for loading the data: column location, column name, or relative column position.
How often the utility is to establish consistency points. Use the SAVECOUNT parameter to specify this value. If this parameter is specified, a load restart operation will start at the last consistency point, instead of at the beginning.
The names of the table columns into which the data is to be inserted.
The paths and the names of the input files in which LOBs are stored. The lobsinfile modifier tells the load utility that all LOB data is being loaded from files (see File Type Modifiers (Load)).
Whether column values being loaded have implied decimal points. The implieddecimal modifier tells the load utility that decimal points are to be applied to the data as it enters the table (see File Type Modifiers (Load)). For example, the value 12345 is loaded into a DECIMAL(8,2) column as 123.45, not 12345.00.
Whether the utility should modify the amount of free space available after a table is loaded. Additional free space permits INSERT and UPDATE growth to the table following the completion of a load operation. Reduced free space keeps related rows more closely together and can enhance table performance.
- The totalfreespace modifier enables you to append empty data pages to the end of the loaded table. The number specified with this modifier is the percentage of the total pages in the table that is to be appended to the end of the table as free space. For example, if you specify 20 with this modifier and the table has 100 data pages, 20 additional empty pages are appended, so the table then has 120 data pages.
- The pagefreespace modifier enables you to control the amount of free space that will be allowed on each loaded data page. The number specified with this modifier is the percentage of each data page that is to be left as free space. The first row in a page is added without restriction. Therefore, with very large rows and a large value for this modifier, each page may have less free space left than the specified value indicates.
- The indexfreespace modifier enables you to control the amount of free space that will be allowed on each loaded index page. The number specified with this parameter is the percentage of each index page that is to be left as free space. The first index entry in a page is added without restriction. Additional index entries are placed in the index page, provided the percent free space threshold can be maintained. The default value is the one used at CREATE INDEX time. The indexfreespace value takes precedence over the PCTFREE value specified in the CREATE INDEX statement.
If you specify the pagefreespace modifier, and you have an index on the table, you might consider specifying indexfreespace. When deciding on the amount of free space to leave for each, consider that the size of each row being inserted into the table will likely be larger than the size of the associated key to be inserted into the index. In addition, the page size of the table spaces for the table and the index may be different.
Whether statistics are to be gathered during the load process. This option is only supported if the load operation is running in REPLACE mode.
If data is appended to a table, statistics are not collected. To collect current statistics on an appended table, invoke the runstats utility following completion of the load process. If gathering statistics on a table with a unique index, and duplicate keys are deleted during the delete phase, statistics are not updated to account for the deleted records. If you expect to have a significant number of duplicate records, do not collect statistics during the load operation. Instead, invoke the runstats utility following completion of the load process.
Whether to keep a copy of the changes made. This is done to enable rollforward recovery of the database. This option is not supported if forward log recovery is disabled for the database; that is, if the database configuration parameters logretain and userexit are disabled. If no copy is made, and forward log recovery is enabled, the table space is left in backup pending state at the completion of the load operation (see Pending States After a Load Operation).
Logging is required for fully recoverable databases. The load utility almost completely eliminates the logging associated with the loading of data. In place of logging, you have the option of making a copy of the loaded portion of the table. For information about how DB2 keeps track of the load copies, see Using the Load Copy Location File. If you have a database environment that allows for database recovery following a failure, you can do one of the following:
- Explicitly request that a copy of the loaded portion of the table be made.
- Take a backup of the table spaces in which the table resides immediately after the completion of the load operation.
If you are loading a table that already contains data, and the database is non-recoverable, ensure that you have a backed-up copy of the database, or the table spaces for the table being loaded, before invoking the load utility, so that you can recover from errors.
If you want to perform a sequence of multiple load operations on a recoverable database, the sequence completes faster if you make each load operation non-recoverable and take a backup at the end of the sequence than if you invoke each load operation with the COPY YES option. You can use the NONRECOVERABLE option to mark a load transaction as non-recoverable. Such a transaction cannot be recovered by a subsequent roll forward action. The rollforward utility skips the transaction and marks the table into which data was being loaded as "invalid". The utility also ignores any subsequent transactions against that table. After the roll forward is completed, such a table can only be dropped (see Figure 2). With this option, table spaces are not put in backup pending state following the load operation, and a copy of the loaded data does not have to be made during the load operation.
Figure 2. Non-recoverable Processing During a Roll Forward Action

For more information, see the database recovery chapter in the Administration Guide.
The fully qualified path to be used when creating temporary files during a load operation. The name is specified by the TEMPFILES PATH parameter of the LOAD command. The default value is the database path. The path resides on the server machine, and is accessed by the DB2 instance exclusively. Therefore, any path name qualification given to this parameter must reflect the directory structure of the server, not the client, and the DB2 instance owner must have read and write permission on the path. This is true even if you are the instance owner. If you are not the instance owner, you must specify a location that is writable by the instance owner. For more information about temporary files, see Load Temporary Files.
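The arithmetic in the implieddecimal, totalfreespace, and pagefreespace descriptions above can be checked with a small Python model that works on plain numbers. It is not DB2 code and calls no DB2 interface. The following are all hypothetical and illustrative: the function names, the 4096-byte page size, and the row sizes. Rounding the totalfreespace page count down is an assumption of the sketch. The model also ignores column precision limits and per-page overhead, and it assumes every row fits on one page.

```python
from decimal import Decimal


def apply_implied_decimal(raw, scale):
    """implieddecimal: treat the last `scale` digits of the input as the fraction."""
    return Decimal(raw).scaleb(-scale)


def without_implied_decimal(raw, scale):
    """No modifier: the value is taken as-is and padded to the column scale."""
    return Decimal(raw).quantize(Decimal(1).scaleb(-scale))


def pages_after_totalfreespace(data_pages, percent):
    """totalfreespace: append `percent` of the data pages as empty pages.
    Rounding down is an assumption of this sketch."""
    return data_pages + data_pages * percent // 100


def pack_rows(row_sizes, page_size, pagefreespace):
    """pagefreespace: fill each page only up to (100 - pagefreespace)% of its size,
    except that the first row on a page is always placed."""
    limit = page_size * (100 - pagefreespace) / 100
    pages = []
    for size in row_sizes:
        if pages and sum(pages[-1]) + size <= limit:
            pages[-1].append(size)   # fits within the free-space threshold
        else:
            pages.append([size])     # new page: first row added without restriction
    return pages


if __name__ == "__main__":
    print(apply_implied_decimal("12345", 2))    # 123.45
    print(without_implied_decimal("12345", 2))  # 12345.00
    print(pages_after_totalfreespace(100, 20))  # 120
    # Large rows: 3000-byte rows on 4096-byte pages with pagefreespace=50
    # leave only about 27% of each page free, less than the 50% requested.
    for page in pack_rows([3000, 3000], 4096, 50):
        print(page, f"{(4096 - sum(page)) / 4096:.0%} free")
```

The last loop illustrates the caveat in the pagefreespace description. Because the first row on each page is always placed, large rows combined with a large pagefreespace value leave less free space than requested. Small rows stay close to the requested percentage.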
