Data Movement Utilities Guide and Reference

AutoLoader Options

There are many options that you can specify in the AutoLoader configuration file.

RELEASE Level

The release level of this configuration file. Do not delete or modify this line in the configuration file.

LOAD Command

The most important part of the configuration file is the LOAD command. The AutoLoader needs the LOAD command to direct the handling of the data, even if the selected mode of operation does not suggest that any loading is required. For example, the AutoLoader extracts useful information from the LOAD command even when performing a SPLIT_ONLY mode operation. Specifications on the LOAD command indicate where the data is coming from, what type of data it is (delimited ASCII, for example), how the data is to be loaded, and the target table name.

Be sure to specify a complete LOAD command that includes the schema name, file name, file type, and table name. The AutoLoader utility also requires that the LOAD command conform to the format of the "db2 -f" file, except for the extra leading "db2" keyword. For detailed information about this and other command line processor (CLP) options, see the Command Reference. There is no need to use the special escape shell characters in the LOAD command. Finally, if the last character on a line is a backslash (\) character, the next line is a continuation of the current line. In this case, the backslash and the end-of-line characters are ignored.

The AutoLoader returns an error if the CLIENT keyword is specified in the LOAD command.

For detailed information about all of the parameters available on the LOAD command, see LOAD Command.

DATABASE Parameter

This parameter is used to identify the database into which the data is to be loaded. If no name is specified, SAMPLE is used as the default value.

HOSTNAME Parameter

This parameter specifies the name of the remote machine on which the data file resides. This machine can be an MVS host or another workstation. If not specified, and the FILE_TRANSFER_CMD parameter is set, the host name nohost is passed to the FILE_TRANSFER_CMD parameter in the <hostname> argument. There is no default value associated with this parameter.

FILE_TRANSFER_CMD Parameter

The previous version of AutoLoader supported the concept of host file transfer, whereby the AutoLoader utility could be configured to transfer data files from a remote host. That option has been replaced by the FILE_TRANSFER_CMD option. This parameter specifies the fully qualified name of an executable file, batch file, or script that is used to transfer data from a remote host. The path must be accessible to the AutoLoader. The full path, including the execution file name, must not exceed 254 characters.

Before invoking the specified file, the AutoLoader establishes named pipes in anticipation of the data being sent from the host. The number of named pipes to be created is equivalent to the number of files or devices listed in the FROM clause on the LOAD command. This information from the LOAD command is also used to specify the parameters that are to be passed to the executable file, batch file, or script.

Based on this information, AutoLoader creates the following command:

   <COMMAND> <logpath> <hostname> <basepipename>
      <nummedia> <source media list>

where

<COMMAND> is the fully qualified path to an executable file, batch file, or script used to move data from the host.

and the remaining items are parameters that can be used by the command:

<logpath> is the AutoLoader log path. The COMMAND program can use this path to write diagnostic or temporary data.
<hostname> is the host name specified by the HOSTNAME parameter.
<basepipename> is the base name for named pipes that the AutoLoader will create. The AutoLoader utility generates the base name and guarantees it to be unique on the system. The base name is appended to by the utility to create the necessary named pipes.
<nummedia> is the number of files or devices providing data (listed in the FROM clause on the LOAD command).
<source media list> includes the names of each of the files or devices providing data (listed in the FROM clause on the LOAD command). The names are delimited by double quotation marks to avoid potential problems caused by special characters that may be present in the names.

An AIX sample file called atldftp.drv can be found in the sqllib/samples/autoloader directory. The sample shows how FTP can be used to move data from a remote host.

SPLIT_FILE_LOCATION Parameter

This parameter is used in two ways:

To provide the path name to the location of the split files if the utility is in LOAD_ONLY mode.
To provide the path name to the location in which the files that have been partitioned (if the utility is in SPLIT_ONLY mode) are to be placed.

If a value for this parameter is not specified, and the utility is operating in SPLIT_ONLY mode, the split files are placed in the current working directory; if the utility is operating in LOAD_ONLY mode, it looks for the split files in the current working directory.

If SPLIT_FILE_LOCATION points to an NFS directory that is mounted across all partitions after a SPLIT_ONLY operation, a LOAD_ONLY operation will be able to access the split data files without any user intervention. However, if SPLIT_FILE_LOCATION does not point to an NFS directory after a SPLIT_ONLY operation, users are required to manually move the split data files from each partition to the directory in which the LOAD_ONLY operation will be executed.

OUTPUT_NODES Parameter

The database partitions on which the load operation is to be performed are identified by this parameter. The specified partition numbers must be a subset of the database partitions on which the table is defined. The default value is all; that is, all database partitions on which the table is defined will have data loaded into them.

SPLIT_NODES Parameter

The database partitions participating in the splitting process are specified through this parameter. These database partitions may be the same or different from the database partitions being loaded. If a value for this parameter is not specified, the AutoLoader determines how many partitions are needed for splitting, and which partitions will be used to achieve optimal performance. The following rules are used to determine how many partitions are needed for splitting:

If the ANYORDER modifier in the LOAD command is not specified, only one splitter is used in the AutoLoader session, and
- If only one partition is specified through the OUTPUT_NODES parameter, or the working partition of the AutoLoader is not an element of the value specified for the OUTPUT_NODES parameter, the working partition of AutoLoader is used as the splitting partition.
- Otherwise, the first partition other than the AutoLoader working partition (found in OUTPUT_NODES) is used as the splitting partition.
If the anyorder modifier in the LOAD command is specified,
1. The number of splitting partitions is determined by
```
   (number of partitions in OUTPUT_NODES)/4 + 1
```
2. This number of partitions is chosen from those specified for the OUTPUT_NODES parameter, excluding the AutoLoader working partition.

RUN_STAT_NODE Parameter

In conjunction with the STATISTICS YES specification on the LOAD command, you can specify the database partition on which you want to collect statistics. If left blank or a value of -1 is specified, the default value is the first database partition in the output partition list.

MODE Parameter

This parameter specifies the mode in which the AutoLoader utility is to run. Valid values are: SPLIT_AND_LOAD (the default), SPLIT_ONLY, LOAD_ONLY, or ANALYZE.

SPLIT_AND_LOAD: In this mode, data is partitioned and then loaded on the correct database partitions. Data is transferred through direct TCP/IP communication using sockets. Multiple input files are allowed.
SPLIT_ONLY: In this mode, the data is only split. A set of split data files is generated for the specified database partitions. You must have sufficient storage for each of the split data files. The split function writes the files in the location specified by the SPLIT_FILE_LOCATION parameter, or in the current working directory. The directory location must be write-accessible. Data is partitioned into separate files that are named using the convention filename.xxx, where xxx represents the number of the partition to which the split file belongs. If there are multiple input data files in the LOAD command, they will all be split. However, only one split file is generated for each database partition. The name of the split file is the same as the name of the first input data file.
LOAD_ONLY: In this mode, previously split data is loaded. The data is contained in separate files that are named using the convention filename.xxx, or filename.00xxx, where xxx represents the number of the partition to which the split file belongs. AutoLoader expects to find these files in the SPLIT_FILE_LOCATION or in the current working directory. The directory location must be read-accessible. The split files are loaded concurrently on their corresponding partitions. If there are multiple input data files in the LOAD command (such as infile1, infile2, and so on), AutoLoader loads infile1.xxx if it exists. Otherwise, it loads infile1.00xxx if it exists. If neither exists, AutoLoader returns an error. If both exist, AutoLoader loads infile1.xxx. Once the first infile1 of either file type (xxx or 00xxx) is loaded, checking begins for infile2, and this process is repeated until all of the input files are loaded.
ANALYZE: In this mode, a customized optimal partitioning map for a nodegroup is generated. It is recommended that a data file with a large number of records be specified as input (multiple input files are allowed); if this is done, the map will produce a more even distribution of data across each of the database partitions in the nodegroup. The output is written to the file specified by the MAP_FILE_OUTPUT parameter. The REDISTRIBUTE NODEGROUP command (see the Command Reference) must be invoked before the new partitioning map can take effect. Subsequent AutoLoader invocations in SPLIT_AND_LOAD mode will automatically use the new partitioning map. The MAP_FILE_INPUT parameter can be used when partitioning the data according to the new partitioning map without changing the default partitioning map of the nodegroup.

LOGFILE Parameter

This parameter is used to provide the base name of the temporary and permanent files used by the AutoLoader utility:

   <logfile>.split.cfg ...
      Configuration file for all splitters.
   <logfile>.split.<3-digit-node-number>.log ...
      Log file for each splitter.
   <logfile>.pmap.<pid> ...
      Internal temporary file, where <pid> is the process ID
         of this AutoLoader job.
   <logfile>.load.<3-digit-node-number> ...
      Message file for each loading process if there is no
         message file specified in the LOAD command.

Although you can specify a path for the LOGFILE parameter, you must verify the existence and accessibility of that path. The default value is ./autoloader.log.
Note: If there are multiple concurrent AutoLoader sessions, you must ensure that the specified base name or the path name is unique.

AUTHENTICATION and PASSWORD Parameters

These parameters are necessary if a password is required for remote invocation of the splitter program, or client/server database connections when loading. The default value for AUTHENTICATION is NO (no password checking), and any value specified for the PASSWORD parameter is ignored.

The concept of a local database connection has been extended for MPP environments to include connections from any node of a given MPP instance. That means that even though the instance is configured using AUTHENTICATION server, a password is not required if a connection is being attempted from one of the nodes defined in the db2nodes.cfg file. The AutoLoader makes use of this new connection behavior when the AUTHENTICATION flag in the AutoLoader configuration file is not set, or is set to NO, and a value for the PASSWORD parameter is not specified. A password for AutoLoader is only mandatory if a password is required for remote execution of programs on your system. For example, a password is required if the .rhosts file on a UNIX system has not been set up properly to enable rsh execution.

Alternatively, if a password is needed, the DB2 registry variable DB2ATLD_PWFILE, which defines the fully qualified path to a password file created by the user, can be set. Both the password file and the fully qualified path must be accessible to the AutoLoader utility. If this variable is defined, the first word in the file pointed to by its value will be the password.

MAX_NUM_SPLITTERS Parameter

This parameter specifies the maximum number of splitter processes that can be used in an AutoLoader job. The default value is 25.

FORCE Parameter

This parameter forces the AutoLoader job to continue even if the utility determines (at startup time) that some target partitions or table spaces are offline. If the value is NO, and some partitions are unavailable, no data will be processed. If the value is YES, database partitions that are available will be loaded, and all others will be ignored. The default value for this parameter is NO.

STATUS_INTERVAL Parameter

This parameter specifies the number of megabytes (MB) of data to load before generating a progress message. Valid values are whole numbers in the range of 1 to 4000. The default value is 100.

PORTS Parameter

This parameter specifies the range of TCP ports used to create sockets for internal AutoLoader communications. The default range is 6000 to 6063. If defined at the time of AutoLoader invocation, the value of the DB2ATLD_PORTS DB2 registry variable replaces any value specified for this parameter.

CHECK_LEVEL Parameter

This parameter specifies whether checking for record truncation during input or output should be performed. Valid values are CHECK and NOCHECK. The default value is NOCHECK.

MAP_FILE_INPUT Parameter

This parameter specifies the name of the input file that points to a file containing the customized partitioning map. If the partitioning map is a customized (not a default) map, this parameter must be specified. You can get a customized partitioning map by invoking the AutoLoader in ANALYZE mode to generate an optimal map. This map must be moved to each database partition in your database before actual loading can proceed.

MAP_FILE_OUTPUT Parameter

This parameter specifies a name for the partitioning map when the AutoLoader is invoked in ANALYZE mode. An optimal partitioning map distributes data evenly across all database partitions. If a value for this parameter is not specified, and the utility is running in ANALYZE mode, an error is returned.

TRACE Parameter

This parameter specifies the number of records to trace when you need to review a dump of all of the data conversion process and the output of hashing values. The default value is zero (no tracing).

NEWLINE Parameter

This parameter specifies the character that is used to delimit each record in the data file. This parameter is meaningful only if the input data file is a fixed-length ASC file with each record delimited by a new line character, and the reclen modifier in the LOAD command has been specified. If a value of YES is specified, the AutoLoader always checks whether the record is terminated by a new line character. It also checks whether the record length matches that specified through the reclen modifier. The default value is NO.

[ Top of Page | Previous Page | Next Page ]