Administration Guide

Types of Parallelism

Parts of a database-related task (such as a database query) can be executed in parallel in order to speed up the task, often dramatically so. There are different ways a task is performed in parallel. The nature of the task, the database configuration, and the hardware environment determine how DB2 will perform a task in parallel. These considerations are interrelated. You should consider them together when first deciding on the physical and logical design of a database. This section describes the types of parallelism.

DB2 supports the following types of parallelism:

I/O
Query
Utility

I/O Parallelism

For situations in which multiple containers exist for a table space, the database manager can initiate parallel I/O. Parallel I/O refers to the process of reading from or writing to two or more I/O devices at the same time to reduce elapsed time. Performing I/O in parallel can result in significant improvements to I/O throughput.

I/O parallelism is a component of each hardware environment described in Hardware Environments. Table 1 lists the hardware environments best suited for I/O parallelism.

Query Parallelism

There are two types of query parallelism: inter-query parallelism and intra-query parallelism.

Inter-query parallelism refers to the ability of multiple applications to query a database at the same time. Each query will execute independently of the others, but DB2 will execute all of them at the same time. DB2 has always supported this type of parallelism.

Intra-query parallelism refers to the processing of parts of a single query at the same time using either intra-partition parallelism or inter-partition parallelism or both.

The term query parallelism is used throughout this book.

Intra-Partition Parallelism

Intra-partition parallelism refers to the ability to break up a query into multiple parts. (Some of the utilities also perform this type of parallelism. See Utility Parallelism.)

Intra-partition parallelism subdivides what is usually considered a single database operation such as index creation, database load, or SQL queries into multiple parts, many or all of which can be executed in parallel within a single database partition.

Figure 3. Intra-Partition Parallelism

Figure 3 shows a query that is broken into four pieces that can be executed in parallel, with the results returned more quickly than if the query was run in a serial fashion. The pieces are copies of each other. To utilize intra-partition parallelism, you need to configure the database appropriately. You can choose the degree of parallelism or let the system do it for you. The degree of parallelism is the number of pieces of a query that execute in parallel.

Table 1 lists the hardware environments best suited for intra-partition parallelism.

Inter-Partition Parallelism

Inter-partition parallelism refers to the ability to break up a query into multiple parts across multiple partitions of a partitioned database, on one machine or multiple machines. The query is performed in parallel. (Some of the utilities also perform this type of parallelism. See Utility Parallelism.)

Inter-partition parallelism subdivides what is usually considered a single database operation such as index creation, database load, or SQL queries into multiple parts, many or all of which can be executed in parallel across multiple partitions of a partitioned database in one machine or multiple machines.

Figure 4. Inter-Partition Parallelism

Figure 4 shows a query that is broken into four pieces that can be executed in parallel, with the results returned more quickly than if the query was run in a serial fashion in a single partition.

The degree of parallelism is largely determined by the number of partitions you create and how you define your nodegroups.

Table 1 lists the hardware environments best suited for inter-partition parallelism.

Using Both Intra-Partition and Inter-Partition Parallelism

You can use intra-partition parallelism and inter-partition parallelism at the same time. This combination provides, in effect, two dimensions of parallelism. This results in an even more dramatic increase in the speed at which queries are processed. Figure 5 illustrates this.

Figure 5. Both Inter-Partition and Intra-Partition Parallelism

Utility Parallelism

DB2's utilities can take advantage of intra-partition parallelism. They can also take advantage of inter-partition parallelism; where multiple database partitions exist, the utilities execute in each of the partitions in parallel. The following paragraphs describe how some utilities take advantage of parallelism.

The LOAD utility can take advantage of intra-partition parallelism and I/O parallelism. Loading data is a heavily CPU-intensive task. The LOAD utility takes advantage of multiple processors for tasks such as parsing and formatting data. Also, the LOAD utility can use parallel I/O servers to write the data to the containers in parallel. Refer to the Data Movement Utilities Guide and Reference or the LOAD command in the Command Reference for information on how to enable parallelism for the LOAD utility.

In a partitioned database environment, the AutoLoader utility takes advantage of intra-partition, inter-partition, and I/O parallelism by parallel invocations of load at each database partition where the table resides. Refer to Data Movement Utilities Guide and Reference for more information about the AutoLoader utility.

During index creation, the scanning and subsequent sorting of the data occurs in parallel. DB2 exploits both I/O parallelism and intra-partition parallelism when creating an index. This helps to speed up index creation when a CREATE INDEX command is issued, during restart (if an index is marked invalid), and during the reorganization of data.

Backing up and restoring data are heavily I/O bound tasks. DB2 exploits both I/O parallelism and intra-partition parallelism when performing backups and restores. Backup exploits I/O parallelism by reading from multiple table space containers in parallel, and asynchronously writing to multiple backup media in parallel. Refer to the BACKUP DATABASE command and the RESTORE DATABASE command in the Command Reference for information on how to enable parallelism for these two commands.

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]