To take advantage of the performance benefits that partitioned environments offer, you should consider using special programming techniques. For example, if your application accesses DB2 data from more than one database partition, you need to consider the information contained in this section. For an overview of partitioned environments, refer to the Administration Guide and the SQL Reference.
If you declare a cursor from which you intend only to read, include FOR READ ONLY or FOR FETCH ONLY in the DECLARE CURSOR statement. (FOR READ ONLY and FOR FETCH ONLY are equivalent clauses.) FOR READ ONLY cursors allow the coordinator partition to retrieve multiple rows at a time, dramatically improving the performance of subsequent FETCH statements. When you do not explicitly declare cursors FOR READ ONLY, the coordinator partition treats them as updatable cursors. Updatable cursors incur considerable expense because they require the coordinator partition to retrieve only a single row per FETCH.
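For example, a read-only cursor could be declared and used as in the following fragment. This is a minimal sketch only; the table t1, its columns, and the host variables are illustrative and not part of any sample database:

   EXEC SQL DECLARE c1 CURSOR FOR
      SELECT col1, col2 FROM t1
      FOR READ ONLY;
   EXEC SQL OPEN c1;
   EXEC SQL FETCH c1 INTO :hv1, :hv2;
   ...
   EXEC SQL CLOSE c1;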
To optimize Online Transaction Processing (OLTP) applications, you may want to avoid simple SQL statements that require processing on all data partitions. You should design the application so that SQL statements can retrieve data from single partitions. These techniques avoid the expense the coordinator partition incurs communicating with one or all of the associated partitions.
A distributed subsection (DSS) is the action of sending the subsections of a parallel query to the database partitions that need to do work for it. It also describes the initiation of those subsections with invocation-specific values, such as values of variables in an OLTP environment. A directed DSS uses the table partitioning key to route a query to a single partition. Use this type of query in your application to avoid the coordinator partition overhead required for a query broadcast to all nodes.
An example SELECT statement fragment that can take advantage of directed DSS follows:
SELECT ... FROM t1 WHERE PARTKEY=:hostvar
When the coordinator partition receives the query, it determines which partition holds the subset of data for :hostvar and directs the query specifically to that partition.
To optimize your application using directed DSS, divide complex queries into multiple simple queries. For example, in the following query the coordinator partition matches the partition key with multiple values. Because the data that satisfies the query lies on multiple partitions, the coordinator partition broadcasts the query to all partitions:
SELECT ... FROM t1 WHERE PARTKEY IN (:hostvar1, :hostvar2)
Instead, break the query into multiple SELECT statements (each with a single host variable) or use a single SELECT statement with a UNION to achieve the same result. The coordinator partition can take advantage of simpler SELECT statements to use directed DSS to communicate only to the necessary partitions. The optimized query looks like:
SELECT ... AS res1 FROM t1 WHERE PARTKEY=:hostvar1
UNION
SELECT ... AS res2 FROM t1 WHERE PARTKEY=:hostvar2
Note that this technique improves performance only if the number of SELECT statements in the UNION is significantly smaller than the number of partitions.
A specialized form of the directed DSS query accesses data stored only on the coordinator partition. This is called a local bypass because the coordinator partition completes the query without having to communicate with another partition.
Local bypass is enabled automatically whenever possible, but you can increase its use by routing transactions to the partition containing the data for that transaction. One technique for doing this is to have a remote client maintain connections to each partition. A transaction can then use the correct connection based on the input partition key. Another technique is to group transactions by partition and have a separate application server for each partition.
To determine the number of the partition on which the data for a transaction resides, you can use the sqlugrpn API (Get Row Partitioning Number). This API allows an application to efficiently calculate the partition number of a row, given the values of its partitioning key. For more information on the sqlugrpn API, refer to the Administrative API Reference.
Another alternative is to use the db2atld utility to divide input data by partition number and run a copy of the application against each partition. For more information on the db2atld utility, refer to the Command Reference.
A buffered insert is an INSERT statement that takes advantage of table queues to buffer the rows being inserted, thereby gaining a significant performance improvement. To use a buffered insert, an application must be prepared or bound with the INSERT BUF option.
Buffered inserts can result in substantial performance improvement in applications that perform inserts. Typically, you can use a buffered insert in applications where a single INSERT statement (and no other database modification statement) is used within a loop to insert many rows, and where the source of the data is a VALUES clause in the INSERT statement. Typically, the INSERT statement references one or more host variables whose values change during successive executions of the loop. The VALUES clause can specify a single row or multiple rows.
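For example, such a loop might look like the following fragment in an embedded SQL (C) application. This is a minimal sketch only: the table t2, the host variable hv1, the read_next_value routine, the count variable, and the commit interval are illustrative assumptions, and declarations are omitted:

   /* Repeatedly execute the same INSERT statement; with INSERT BUF,  */
   /* the rows accumulate in buffers that are sent to the target      */
   /* partitions as they fill.                                        */
   count = 0;
   while ( read_next_value( &hv1 ) == 0 && SQLCODE == 0 )
   {
      EXEC SQL INSERT INTO t2 VALUES (:hv1);
      if ( ++count % 1000 == 0 )
      {
         /* A periodic COMMIT closes the buffered insert, flushes the */
         /* buffers, and keeps the transaction log from filling.      */
         EXEC SQL COMMIT;
      }
   }
   EXEC SQL COMMIT;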
Typical decision support applications require the loading and periodic insertion of new data. This data could be hundreds of thousands of rows. You can prepare and bind applications to use buffered inserts when loading tables.
To cause an application to use buffered inserts, use the PREP command to process the application program source file, or use the BIND command on the resulting bind file. In both situations, you must specify the INSERT BUF option. For more information about binding an application, see Binding. For more information about preparing an application, see Creating and Preparing the Source Files.
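For example, assuming an embedded SQL source file named app.sqc (the file names here are illustrative only), the following commands produce a bind file and then create a package that uses buffered inserts:

   db2 prep app.sqc bindfile
   db2 bind app.bnd insert buf

Alternatively, specify the INSERT BUF option directly on the PREP command if you are not producing a separate bind file.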
Note: Buffered inserts cause the following steps to occur:
The standard interface in a partitioned environment (without a buffered insert) loads one row at a time, performing the following steps (assuming that the application is running locally on one of the partitions):
The insertion is not committed until the application issues a COMMIT.
For a given INSERT statement with the VALUES clause, the DB2 SQL compiler may not buffer the insert based on semantic, performance, or implementation considerations. If you prepare or bind your application with the INSERT BUF option, ensure that it is not dependent on a buffered insert. This means:
Buffered inserts result in the following performance advantages:
An application that is bound with INSERT BUF should be written so that the same INSERT statement with the VALUES clause is iterated repeatedly before any statement or API that closes a buffered insert is issued.
Note: You should do periodic commits to prevent the buffered inserts from filling the transaction log.
Buffered inserts exhibit behavior that can affect an application program. This behavior is caused by the asynchronous nature of buffered inserts. Based on the values of the row's partitioning key, each inserted row is placed in a buffer destined for the correct partition. These buffers are sent to their destination partitions as they become full, or when an event causes them to be flushed. You must be aware of the following behaviors, and account for them when designing and coding the application:
A buffered INSERT statement is either open or closed. The first invocation of the statement opens the buffered INSERT, the row is added to the appropriate buffer, and control is returned to the application. Subsequent invocations add rows to the buffer, leaving the statement open. While the statement is open, buffers may be sent to their destination partitions, where the rows are inserted into the target table's partition. If any statement or API that closes a buffered insert is invoked while a buffered INSERT statement is open (including invocation of a different buffered INSERT statement), or if a PREPARE statement is issued against an open buffered INSERT statement, the open statement is closed before the new request is processed. If the buffered INSERT statement is closed, the remaining buffers are flushed. The rows are then sent to the target partitions and inserted. Only after all the buffers are sent and all the rows are inserted does the new request begin processing.
If errors are detected during the closing of the INSERT statement, the SQLCA for the new request will be filled in describing the error, and the new request is not done. Also, the entire group of rows that were inserted through the buffered INSERT statement since it was opened are removed from the database. The state of the application will be as defined for the particular error detected. For example:
For example, consider the following application that is bound with the buffered insert option:
EXEC SQL UPDATE t1 SET COMMENT='about to start inserts';
DO UNTIL EOF OR SQLCODE < 0;
   READ VALUE OF hv1 FROM A FILE;
   EXEC SQL INSERT INTO t2 VALUES (:hv1);
   IF 1000 INSERTS DONE, THEN DO
      EXEC SQL INSERT INTO t3 VALUES ('another 1000 done');
      RESET COUNTER;
   END;
END;
EXEC SQL COMMIT;
Suppose the file contains 8 000 values, but value 3 258 is not legal (for example, a unique key violation). Each 1 000 inserts results in the execution of another SQL statement, which then closes the INSERT INTO t2 statement. During the fourth group of 1 000 inserts, the error for value 3 258 will be detected. It may be detected after the insertion of more values (not necessarily the next one). In this situation, an error code is returned for the INSERT INTO t2 statement.
The error may also be detected when an insertion is attempted on table t3, which closes the INSERT INTO t2 statement. In this situation, the error code is returned for the INSERT INTO t3 statement, even though the error applies to table t2.
Suppose, instead, that you have 3 900 rows to insert. Before being told of the error on row number 3 258, the application may exit the loop and attempt to issue a COMMIT. The unique-key-violation return code will be issued for the COMMIT statement, and the COMMIT will not be performed. If the application wants to COMMIT the 3 000 rows which are in the database thus far (the last execution of EXEC SQL INSERT INTO t3 ... ends the savepoint for those 3 000 rows), then the COMMIT has to be REISSUED! Similar considerations apply to ROLLBACK as well.
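The following fragment is a minimal sketch of that handling; the error reporting is illustrative only, and it assumes the application is prepared to commit the rows that were inserted successfully:

   EXEC SQL COMMIT;
   if ( SQLCODE != 0 )
   {
      /* The error detected for the buffered INSERT (for example, a   */
      /* unique key violation) is returned here; this COMMIT was not  */
      /* performed.                                                   */
      printf( "Buffered insert failed: SQLCODE = %ld\n", SQLCODE );

      /* Reissue the COMMIT to commit the rows that were successfully */
      /* inserted before the error was detected.                      */
      EXEC SQL COMMIT;
   }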
Note: When using buffered inserts, you should carefully monitor the SQLCODEs returned to avoid leaving the table in an indeterminate state. For example, if you remove the SQLCODE < 0 condition from the DO UNTIL clause in the above example, the table could end up containing an indeterminate number of rows.
The following restrictions apply:
The application can then be run from any supported client platform.
Although DB2 Universal Database provides excellent features for parallel query processing, the single point of connection of an application or an EXPORT command can become a bottleneck if you are extracting large volumes of data. This occurs because the passing of data from the database manager to the application is a CPU-intensive process that executes on a single node (typically a single processor as well).
DB2 Universal Database provides several methods to overcome the bottleneck, so that the volume of extracted data scales linearly per unit of time with an increasing number of processors. The following example describes the basic idea behind these methods.
Assume that you have a table called EMPLOYEE which is stored on 20 nodes, and you generate a mailing list (FIRSTNME, LASTNAME, JOB) of all employees who are in a legitimate department (that is, WORKDEPT is not NULL).
The following query is run on each node in parallel, and then generates the entire answer set at a single node (the coordinator node):
SELECT FIRSTNME, LASTNAME, JOB FROM EMPLOYEE WHERE WORKDEPT IS NOT NULL
But, the following query could be run on each partition in the database (that is, if there are five partitions, five separate queries are required, one at each partition). Each query generates the set of all the employee names whose record is on the particular partition where the query runs. Each local result set can be redirected to a file. The result sets then need to be merged into a single result set.
On AIX, you can use a property of Network File System (NFS) files to automate the merge. If all the partitions direct their answer sets to the same file on an NFS mount, the results are merged. Note that using NFS without blocking the answer into large buffers results in very poor performance.
SELECT FIRSTNME, LASTNAME, JOB FROM EMPLOYEE WHERE WORKDEPT IS NOT NULL AND NODENUMBER(LASTNAME) = CURRENT NODE
The result can either be stored in a local file (meaning that the final result would be 20 files, each containing a portion of the complete answer set), or in a single NFS-mounted file.
The following example uses the second method, so that the result is in a single file that is NFS mounted across the 20 nodes. The NFS locking mechanism ensures serialization of writes into the result file from the different partitions. Note that this example, as presented, runs on the AIX platform with an NFS file system installed.
#define _POSIX_SOURCE
#define INCL_32

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sqlenv.h>
#include <errno.h>
#include <sys/access.h>
#include <sys/flock.h>
#include <unistd.h>

#define BUF_SIZE 1500000     /* Local buffer to store the fetched records */
#define MAX_RECORD_SIZE 80   /* >= size of one written record */

int main(int argc, char *argv[])
{
   EXEC SQL INCLUDE SQLCA;

   EXEC SQL BEGIN DECLARE SECTION;
     char dbname[10];     /* Database name (argument of the program) */
     char userid[9];
     char passwd[19];
     char first_name[21];
     char last_name[21];
     char job_code[11];
   EXEC SQL END DECLARE SECTION;

   struct flock unlock;    /* structures and variables for handling */
   struct flock lock;      /* the NFS locking mechanism */
   int lock_command;
   int lock_rc;
   int iFileHandle;        /* output file */
   int iOpenOptions = 0;
   int iPermissions;
   char *file_buf;         /* pointer to the buffer where the fetched
                              records are accumulated */
   char *write_ptr;        /* position where the next record is written */
   int buffer_len = 0;     /* length of used portion of the buffer */

   /* Initialization */
   lock.l_type = F_WRLCK;      /* An exclusive write lock request */
   lock.l_start = 0;           /* To lock the entire file */
   lock.l_whence = SEEK_SET;
   lock.l_len = 0;
   unlock.l_type = F_UNLCK;    /* A release lock request */
   unlock.l_start = 0;         /* To unlock the entire file */
   unlock.l_whence = SEEK_SET;
   unlock.l_len = 0;
   lock_command = F_SETLKW;    /* Set the lock */
   iOpenOptions = O_CREAT;     /* Create the file if it does not exist */
   iOpenOptions |= O_WRONLY;   /* Open for writing only */

   /* Connect to the database */
   if (argc == 3)
   {
      strcpy( dbname, argv[2] );   /* get database name from the argument */
      EXEC SQL CONNECT TO :dbname IN SHARE MODE;
      if ( SQLCODE != 0 )
      {
         printf( "Error: CONNECT TO the database failed. SQLCODE = %ld\n",
                 SQLCODE );
         exit(1);
      }
   }
   else if ( argc == 5 )
   {
      strcpy( dbname, argv[2] );   /* get database name from the argument */
      strcpy( userid, argv[3] );
      strcpy( passwd, argv[4] );
      EXEC SQL CONNECT TO :dbname IN SHARE MODE USER :userid USING :passwd;
      if ( SQLCODE != 0 )
      {
         printf( "Error: CONNECT TO the database failed. SQLCODE = %ld\n",
                 SQLCODE );
         exit( 1 );
      }
   }
   else
   {
      printf ("\nUSAGE: largevol txt_file database [userid passwd]\n\n");
      exit( 1 );
   } /* endif */

   /* Open the output file with the specified access permissions */
   if ( ( iFileHandle = open( argv[1], iOpenOptions, 0666 ) ) == -1 )
   {
      printf( "Error: Could not open %s.\n", argv[1] );
      exit( 2 );
   }

   /* Set up error and end of table escapes */
   EXEC SQL WHENEVER SQLERROR GO TO ext;
   EXEC SQL WHENEVER NOT FOUND GO TO cls;

   /* Declare and open the cursor */
   EXEC SQL DECLARE c1 CURSOR FOR
            SELECT firstnme, lastname, job FROM employee
            WHERE workdept IS NOT NULL
            AND NODENUMBER(lastname) = CURRENT NODE;
   EXEC SQL OPEN c1;

   /* Set up the temporary buffer for storing the fetched result */
   if ( ( file_buf = ( char * ) malloc( BUF_SIZE ) ) == NULL )
   {
      printf( "Error: Allocation of buffer failed.\n" );
      exit( 3 );
   }
   memset( file_buf, 0, BUF_SIZE );   /* reset the buffer */
   buffer_len = 0;                    /* reset the buffer length */
   write_ptr = file_buf;              /* reset the write pointer */

   /* For each fetched record perform the following    */
   /* - insert it into the buffer following the        */
   /*   previously stored record                       */
   /* - check if there is still enough space in the    */
   /*   buffer for the next record and lock/write/     */
   /*   unlock the file and initialize the buffer      */
   /*   if not                                         */
   do
   {
      EXEC SQL FETCH c1 INTO :first_name, :last_name, :job_code;
      buffer_len += sprintf( write_ptr, "%s %s %s\n",
                             first_name, last_name, job_code );
      buffer_len = strlen( file_buf );

      /* Write the content of the buffer to the file if */
      /* the buffer reaches the limit                   */
      if ( buffer_len >= ( BUF_SIZE - MAX_RECORD_SIZE ) )
      {
         /* get excl. write lock */
         lock_rc = fcntl( iFileHandle, lock_command, &lock );
         if ( lock_rc != 0 ) goto file_lock_err;
         /* position at the end of file */
         lock_rc = lseek( iFileHandle, 0, SEEK_END );
         if ( lock_rc < 0 ) goto file_seek_err;
         /* write the buffer */
         lock_rc = write( iFileHandle, ( void * ) file_buf, buffer_len );
         if ( lock_rc < 0 ) goto file_write_err;
         /* release the lock */
         lock_rc = fcntl( iFileHandle, lock_command, &unlock );
         if ( lock_rc != 0 ) goto file_unlock_err;

         file_buf[0] = '\0';     /* reset the buffer */
         buffer_len = 0;         /* reset the buffer length */
         write_ptr = file_buf;   /* reset the write pointer */
      }
      else
      {
         write_ptr = file_buf + buffer_len;   /* next write position */
      }
   } while (1);

cls:
   /* Write the last piece of data out to the file */
   if ( buffer_len > 0 )
   {
      lock_rc = fcntl( iFileHandle, lock_command, &lock );
      if ( lock_rc != 0 ) goto file_lock_err;
      lock_rc = lseek( iFileHandle, 0, SEEK_END );
      if ( lock_rc < 0 ) goto file_seek_err;
      lock_rc = write( iFileHandle, ( void * ) file_buf, buffer_len );
      if ( lock_rc < 0 ) goto file_write_err;
      lock_rc = fcntl( iFileHandle, lock_command, &unlock );
      if ( lock_rc != 0 ) goto file_unlock_err;
   }
   free( file_buf );
   close( iFileHandle );
   EXEC SQL CLOSE c1;
   exit( 0 );

ext:
   if ( SQLCODE != 0 )
      printf( "Error: SQLCODE = %ld.\n", SQLCODE );
   EXEC SQL WHENEVER SQLERROR CONTINUE;
   EXEC SQL CONNECT RESET;
   if ( SQLCODE != 0 )
   {
      printf( "CONNECT RESET Error: SQLCODE = %ld\n", SQLCODE );
      exit( 4 );
   }
   exit( 5 );

file_lock_err:
   printf( "Error: file lock error = %d.\n", lock_rc );
   /* unconditional unlock of the file */
   fcntl( iFileHandle, lock_command, &unlock );
   exit( 6 );

file_seek_err:
   printf( "Error: file seek error = %d.\n", lock_rc );
   /* unconditional unlock of the file */
   fcntl( iFileHandle, lock_command, &unlock );
   exit( 7 );

file_write_err:
   printf( "Error: file write error = %d.\n", lock_rc );
   /* unconditional unlock of the file */
   fcntl( iFileHandle, lock_command, &unlock );
   exit( 8 );

file_unlock_err:
   printf( "Error: file unlock error = %d.\n", lock_rc );
   /* unconditional unlock of the file */
   fcntl( iFileHandle, lock_command, &unlock );
   exit( 9 );
}
This method is applicable not only to a select from a single table, but also to more complex queries. If, however, the query requires noncollocated operations (that is, the Explain output shows more than one subsection besides the coordinator subsection), running the query in parallel on all partitions can result in too many processes on some partitions. In this situation, you can store the query result in a temporary table TEMP on as many partitions as required, then do the final extract in parallel from TEMP.
If you want to extract all employees, but only for selected job classifications, you can define the TEMP table with the columns FIRSTNME, LASTNAME, and JOB, and populate it as follows:
INSERT INTO TEMP
   SELECT FIRSTNME, LASTNAME, JOB
   FROM EMPLOYEE
   WHERE WORKDEPT IS NOT NULL
   AND EMPNO NOT IN (SELECT EMPNO FROM EMP_ACT WHERE EMPNO<200)
Then you would perform the parallel extract on TEMP.
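For example, the query run at each partition against TEMP follows the same pattern as before. This is a sketch only; it assumes LASTNAME is an appropriate column to pass to the NODENUMBER function for the TEMP table:

SELECT FIRSTNME, LASTNAME, JOB FROM TEMP
   WHERE NODENUMBER(LASTNAME) = CURRENT NODE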
When defining the TEMP table, consider the following:
If you require the final answer set (which is the merged partial answer set from all nodes) to be sorted, you can: