Exporting data in parallel reduces data transfer, and distributes the writing of the result set, as well as the generation of the formatted output, across nodes in a more effective manner than would otherwise be the case. When data is exported in parallel (by invoking multiple export operations, one for each partition of a table), it is extracted, converted on the local nodes, and then written to the local file system. In contrast, when exporting data serially (exporting through a single export operation), it is extracted in parallel and then shipped to the client, where a single process performs conversion and writes the result set to a local file system.
The db2batch command is used to monitor the performance characteristics and execution duration of SQL statements. This utility also has a parallel export function in partitioned database environments that:
There are two types of parallel export, depending on what tables are queried, and where those tables exist in the partitioned database environment:
There are two ways in which tables can be considered collocated:
In each case, the query can be run on each partition to generate that partition's export data file using the NODENUMBER function as described below. (Note that if a table exists in a single partition only, export parallelism is negated, because the data is retrieved from only one partition. To enable export parallelism in this case, refer to the next bullet.)
The export utility uses a staging table that is populated through the export query. This staging table is used to locate the rows of the "export" result set by running an INSERT of the fullselect query into the staging table. Once the staging table is created, the export utility generates an export data file at each partition by running:
"select * WHERE NODENUMBER(colname) = CURRENT NODE"
on the staging table.
A staging table can also be used to export a single partition table in parallel. In most cases, transferring the data from a single partition into a multi-partition staging table, and then exporting the staging table in parallel on all partitions, is faster than exporting the single partition table serially.
The export utility runs a query in parallel on each partition to retrieve the data on that partition. In the case of db2batch -p s, the original SELECT query is run in parallel. In the case of db2batch -p t and db2batch -p d, a staging table is loaded with the export data, using the specified query, and a SELECT * query is run on the staging table in parallel on each partition to export the data. To export only the data that resides on a given partition, db2batch adds the predicate NODENUMBER(colname) = CURRENT NODE to the WHERE clause of the query that is run on that partition. The colname parameter must be set to the qualified or the unqualified name of a table column. The export utility uses the first column name in the original query to set this parameter.
Following are the limitations on queries used by the export utility:
If, when specifying -p s on the db2batch command, you also use the -r option to create result output files, the files on each partition will be in sorted order if you have an ORDER BY clause. If a single sorted file is your objective, merge the sorted file on each partition into one sorted file. For example, on UNIX based systems, use the command sort -m to merge the files into a single sorted file. If you are sending your output to an NFS mounted file system, the output will not be sorted, even if you specify the ORDER BY clause.