There are many options that you can specify in the AutoLoader configuration
file.
- RELEASE Level
- The release level of this configuration file. Do not delete or
modify this line in the configuration file.
- LOAD Command
- The most important part of the configuration file is the LOAD
command. The AutoLoader needs the LOAD command to direct the handling
of the data, even if the selected mode of operation does not suggest that any
loading is required. For example, the AutoLoader extracts useful
information from the LOAD command even when performing a SPLIT_ONLY mode
operation. Specifications on the LOAD command indicate where the data
is coming from, what type of data it is (delimited ASCII, for example), how
the data is to be loaded, and the target table name.
Be sure to specify a complete LOAD command that includes the schema name,
file name, file type, and table name. The AutoLoader utility also
requires that the LOAD command conform to the format of the "db2 -f"
file, except for the extra leading "db2" keyword. For detailed
information about this and other command line processor (CLP) options, see the
Command Reference. There is no need to use the special escape shell characters in
the LOAD command. Finally, if the last character on a line is a
backslash (\) character, the next line is a continuation of the current
line. In this case, the backslash and the end-of-line characters are
ignored.
The AutoLoader returns an error if the CLIENT keyword is specified in the
LOAD command.
For detailed information about all of the parameters available on the LOAD
command, see LOAD Command.
- DATABASE Parameter
- This parameter is used to identify the database into which the data is to
be loaded. If no name is specified, SAMPLE is used as the
default value.
- HOSTNAME Parameter
- This parameter specifies the name of the remote machine on which the data
file resides. This machine can be an MVS host or another
workstation. If not specified, and the FILE_TRANSFER_CMD
parameter is set, the host name nohost is passed to the
FILE_TRANSFER_CMD parameter in the <hostname> argument.
There is no default value associated with this parameter.
- FILE_TRANSFER_CMD Parameter
- The previous version of AutoLoader supported the concept of host file
transfer, whereby the AutoLoader utility could be configured to transfer data
files from a remote host. That option has been replaced by the
FILE_TRANSFER_CMD option. This parameter specifies the fully
qualified name of an executable file, batch file, or script that is used to
transfer data from a remote host. The path must be accessible to the
AutoLoader. The full path, including the execution file name, must not
exceed 254 characters.
Before invoking the specified file, the AutoLoader establishes named pipes
in anticipation of the data being sent from the host. The number of
named pipes to be created is equivalent to the number of files or devices
listed in the FROM clause on the LOAD command. This information from
the LOAD command is also used to specify the parameters that are to be passed
to the executable file, batch file, or script.
Based on this information, AutoLoader creates the following command:
<COMMAND> <logpath> <hostname> <basepipename>
<nummedia> <source media list>
where
- <COMMAND> is the fully qualified path to an executable file, batch
file, or script used to move data from the host.
and the remaining items are parameters that can be used by the
command:
- <logpath> is the AutoLoader log path. The COMMAND program can
use this path to write diagnostic or temporary data.
- <hostname> is the host name specified by the HOSTNAME
parameter.
- <basepipename> is the base name for named pipes that the AutoLoader
will create. The AutoLoader utility generates the base name and
guarantees it to be unique on the system. The base name is appended to
by the utility to create the necessary named pipes.
- <nummedia> is the number of files or devices providing data (listed
in the FROM clause on the LOAD command).
- <source media list> includes the names of each of the files or
devices providing data (listed in the FROM clause on the LOAD command).
The names are delimited by double quotation marks to avoid potential problems
caused by special characters that may be present in the names.
An AIX sample file called atldftp.drv can be found in the
sqllib/samples/autoloader directory. The sample shows how
FTP can be used to move data from a remote host.
- SPLIT_FILE_LOCATION Parameter
- This parameter is used in two ways:
- To provide the path name to the location of the split files if the utility
is in LOAD_ONLY mode.
- To provide the path name to the location in which the files that have been
partitioned (if the utility is in SPLIT_ONLY mode) are to be placed.
If a value for this parameter is not specified, and the utility is
operating in SPLIT_ONLY mode, the split files are placed in the current
working directory; if the utility is operating in LOAD_ONLY mode, it
looks for the split files in the current working directory.
If SPLIT_FILE_LOCATION points to an NFS directory that is mounted
across all partitions after a SPLIT_ONLY operation, a LOAD_ONLY
operation will be able to access the split data files without any user
intervention. However, if SPLIT_FILE_LOCATION does not point to
an NFS directory after a SPLIT_ONLY operation, users are required to
manually move the split data files from each partition to the directory in
which the LOAD_ONLY operation will be executed.
- OUTPUT_NODES Parameter
- The database partitions on which the load operation is to be performed are
identified by this parameter. The specified partition numbers must be a
subset of the database partitions on which the table is defined. The
default value is all; that is, all database partitions on which the table
is defined will have data loaded into them.
- SPLIT_NODES Parameter
- The database partitions participating in the splitting process are
specified through this parameter. These database partitions may be the
same or different from the database partitions being loaded. If a value
for this parameter is not specified, the AutoLoader determines how many
partitions are needed for splitting, and which partitions will be used to
achieve optimal performance. The following rules are used to determine
how many partitions are needed for splitting:
- If the ANYORDER modifier in the LOAD command is not specified, only one
splitter is used in the AutoLoader session, and
- If only one partition is specified through the OUTPUT_NODES parameter,
or the working partition of the AutoLoader is not an element of the value
specified for the OUTPUT_NODES parameter, the working partition of
AutoLoader is used as the splitting partition.
- Otherwise, the first partition other than the AutoLoader working partition
(found in OUTPUT_NODES) is used as the splitting partition.
- If the anyorder modifier in the LOAD command is
specified,
- The number of splitting partitions is determined by
(number of partitions in OUTPUT_NODES)/4 + 1
- This number of partitions is chosen from those specified for the
OUTPUT_NODES parameter, excluding the AutoLoader working partition.
- RUN_STAT_NODE Parameter
- In conjunction with the STATISTICS YES specification on the LOAD command,
you can specify the database partition on which you want to collect
statistics. If left blank or a value of -1 is specified, the
default value is the first database partition in the output partition
list.
- MODE Parameter
- This parameter specifies the mode in which the AutoLoader utility is to
run. Valid values are: SPLIT_AND_LOAD (the default),
SPLIT_ONLY, LOAD_ONLY, or ANALYZE.
- SPLIT_AND_LOAD
- In this mode, data is partitioned and then loaded on the correct database
partitions. Data is transferred through direct TCP/IP communication
using sockets. Multiple input files are allowed.
- SPLIT_ONLY
- In this mode, the data is only split. A set of split data files is
generated for the specified database partitions. You must have
sufficient storage for each of the split data files. The split function
writes the files in the location specified by the
SPLIT_FILE_LOCATION parameter, or in the current working
directory. The directory location must be write-accessible. Data
is partitioned into separate files that are named using the convention
filename.xxx, where xxx represents the number of the
partition to which the split file belongs. If there are multiple input
data files in the LOAD command, they will all be split. However, only
one split file is generated for each database partition. The name of
the split file is the same as the name of the first input data file.
- LOAD_ONLY
- In this mode, previously split data is loaded. The data is
contained in separate files that are named using the convention
filename.xxx, or
filename.00xxx, where xxx
represents the number of the partition to which the split file belongs.
AutoLoader expects to find these files in the
SPLIT_FILE_LOCATION or in the current working directory.
The directory location must be read-accessible. The split files are
loaded concurrently on their corresponding partitions. If there are
multiple input data files in the LOAD command (such as infile1,
infile2, and so on), AutoLoader loads infile1.xxx
if it exists. Otherwise, it loads infile1.00xxx if it
exists. If neither exists, AutoLoader returns an error. If both
exist, AutoLoader loads infile1.xxx. Once the first
infile1 of either file type (xxx or
00xxx) is loaded, checking begins for infile2,
and this process is repeated until all of the input files are loaded.
- ANALYZE
- In this mode, a customized optimal partitioning map for a nodegroup is
generated. It is recommended that a data file with a large number of
records be specified as input (multiple input files are allowed); if this
is done, the map will produce a more even distribution of data across each of
the database partitions in the nodegroup. The output is written to the
file specified by the MAP_FILE_OUTPUT parameter. The
REDISTRIBUTE NODEGROUP command (see the Command
Reference) must be invoked before the new partitioning map can take
effect. Subsequent AutoLoader invocations in SPLIT_AND_LOAD mode
will automatically use the new partitioning map. The
MAP_FILE_INPUT parameter can be used when partitioning the data
according to the new partitioning map without changing the default
partitioning map of the nodegroup.
- LOGFILE Parameter
- This parameter is used to provide the base name of the temporary and
permanent files used by the AutoLoader utility:
<logfile>.split.cfg ...
Configuration file for all splitters.
<logfile>.split.<3-digit-node-number>.log ...
Log file for each splitter.
<logfile>.pmap.<pid> ...
Internal temporary file, where <pid> is the process ID
of this AutoLoader job.
<logfile>.load.<3-digit-node-number> ...
Message file for each loading process if there is no
message file specified in the LOAD command.
Although you can specify a path for the LOGFILE parameter, you must verify
the existence and accessibility of that path. The default value is
./autoloader.log.
Note: | If there are multiple concurrent AutoLoader sessions, you must ensure that
the specified base name or the path name is unique.
|
- AUTHENTICATION and PASSWORD Parameters
- These parameters are necessary if a password is required for remote
invocation of the splitter program, or client/server database connections when
loading. The default value for AUTHENTICATION is NO (no
password checking), and any value specified for the PASSWORD parameter is
ignored.
The concept of a local database connection has been extended for MPP
environments to include connections from any node of a given MPP
instance. That means that even though the instance is configured using
AUTHENTICATION server, a password is not required if a connection is being
attempted from one of the nodes defined in the db2nodes.cfg
file. The AutoLoader makes use of this new connection behavior when the
AUTHENTICATION flag in the AutoLoader configuration file is not set, or is set
to NO, and a value for the PASSWORD parameter is not
specified. A password for AutoLoader is only mandatory if a password is
required for remote execution of programs on your system. For example,
a password is required if the .rhosts file on a UNIX system
has not been set up properly to enable rsh execution.
Alternatively, if a password is needed, the DB2 registry variable
DB2ATLD_PWFILE, which defines the fully qualified path to a
password file created by the user, can be set. Both the password file
and the fully qualified path must be accessible to the AutoLoader
utility. If this variable is defined, the first word in the file
pointed to by its value will be the password.
- MAX_NUM_SPLITTERS Parameter
- This parameter specifies the maximum number of splitter processes that can
be used in an AutoLoader job. The default value is 25.
- FORCE Parameter
- This parameter forces the AutoLoader job to continue even if the utility
determines (at startup time) that some target partitions or table spaces are
offline. If the value is NO, and some partitions are
unavailable, no data will be processed. If the value is YES,
database partitions that are available will be loaded, and all others will be
ignored. The default value for this parameter is NO.
- STATUS_INTERVAL Parameter
- This parameter specifies the number of megabytes (MB) of data to load
before generating a progress message. Valid values are whole numbers in
the range of 1 to 4000. The default value is 100.
- PORTS Parameter
- This parameter specifies the range of TCP ports used to create sockets for
internal AutoLoader communications. The default range is 6000 to
6063. If defined at the time of AutoLoader invocation, the value of the
DB2ATLD_PORTS DB2 registry variable replaces any value specified
for this parameter.
- CHECK_LEVEL Parameter
- This parameter specifies whether checking for record truncation during
input or output should be performed. Valid values are CHECK
and NOCHECK. The default value is NOCHECK.
- MAP_FILE_INPUT Parameter
- This parameter specifies the name of the input file that points to a file
containing the customized partitioning map. If the partitioning map is
a customized (not a default) map, this parameter must be specified. You
can get a customized partitioning map by invoking the AutoLoader in ANALYZE
mode to generate an optimal map. This map must be moved to each
database partition in your database before actual loading can proceed.
- MAP_FILE_OUTPUT Parameter
- This parameter specifies a name for the partitioning map when the
AutoLoader is invoked in ANALYZE mode. An optimal partitioning map
distributes data evenly across all database partitions. If a value for
this parameter is not specified, and the utility is running in ANALYZE mode,
an error is returned.
- TRACE Parameter
- This parameter specifies the number of records to trace when you need to
review a dump of all of the data conversion process and the output of hashing
values. The default value is zero (no tracing).
- NEWLINE Parameter
- This parameter specifies the character that is used to delimit each record
in the data file. This parameter is meaningful only if the input data
file is a fixed-length ASC file with each record delimited by a new line
character, and the reclen modifier in the LOAD command has been
specified. If a value of YES is specified, the AutoLoader
always checks whether the record is terminated by a new line character.
It also checks whether the record length matches that specified through the
reclen modifier. The default value is NO.