Visual Explain concepts



A

" " Access plan

Certain data is necessary to resolve an explainable SQL statement. An access plan specifies an order of operations for accessing this data. It lets you view statistics for selected tables, indexes, or columns; properties for operators; global information such as table space and function statistics; and configuration parameters relevant to optimization. With Visual Explain, you can view the access plan for an SQL statement in graphical form.

The optimizer produces an access plan whenever an explainable SQL statement is compiled. This happens at prep/bind time for static statements, and at run time for dynamic statements.

It is important to understand that an access plan is an estimate based on the information that is available. The optimizer bases its estimations on information such as the following:

Cost information associated with an access plan is the optimizer's best estimate of the resource usage for a query. The actual elapsed time for a query may vary depending on factors outside the scope of DB2 (for example, the number of other applications running at the same time). Actual elapsed time can be measured while running the query, by using performance monitoring.

" " Access plan graph node

The access plan graph consists of a tree displaying nodes. These nodes represent:



C

" " Clustering

Over time, updates may cause rows on data pages to change location lowering the degree of clustering that exists between an index and the data pages. Reorganizing a table with respect to a chosen index reclusters the data. A clustered index is most useful for columns that have range predicates because it allows better sequential access of data in the base table. This results in fewer page fetches, since like values are on the same data page.

In general, only one of the indexes in a table can have a high degree of clustering.

To check the degree of clustering for an index, double-click on its node to display the Index Statistics window. The cluster ratio or cluster factor values are shown in this window. If the value is low, consider reorganizing the table's data.

For more information, see the section on reorganizing table data in the Administration Guide.

" " Container

A container is a physical storage location of the data. It is associated with a table space, and can be a file or a directory or a device.

" " Cost

Cost, in the context of Visual Explain, is the estimated total resource usage necessary to execute the access plan for a statement (or the elements of a statement). Cost is derived from a combination of CPU cost (in number of instructions) and I/O (in numbers of seeks and page transfers).

The unit of cost is the timeron. A timeron does not directly equate to any actual elapsed time, but gives a rough relative estimate of the resources (cost) required by the database manager to execute two plans for the same query.

The cost shown in each operator node of an access plan graph is the cumulative cost, from the start of access plan execution up to and including the execution of that particular operator. It does not reflect factors such as the workload on the system or the cost of returning rows of data to the user.

" " Cursor blocking

Cursor blocking is a technique that reduces overhead by having the database manager retrieve a block of rows in a single operation. These rows are stored in a cache while they are processed. The cache is allocated when an application issues an OPEN CURSOR request, and is deallocated when the cursor is closed. When all the rows have been processed, another block of rows is retrieved.

Use the BLOCKING option on the PREP or BIND commands along with the following parameters to specify the type of cursor blocking:

UNAMBIG
Only unambiguous cursors are blocked (the default).

ALL
Both ambiguous and unambiguous cursors are blocked.

NO
Cursors are not blocked.

For more information, see the section on cursor blocking in the Administration Guide.



D

" " Database-managed space (DMS) table space

There are two types of table spaces that can exist in a database: Database-managed space (DMS), and system-managed space (SMS).

DMS table spaces are managed by the database manager. and are designed and tuned to meet its requirements.

The DMS table space definition includes a list of files (or devices) into which the database data is stored in its DMS table space format.

You can add pre-allocated files (or devices) to an existing DMS table space in order to increase its storage capacity. The database manager automatically rebalances existing data in all the containers belonging to that table space.

DMS and SMS table spaces can coexist in the same database.

" " Dynamic SQL

Dynamic SQL statements are SQL statements that are prepared and executed within an application program while the program is running. In dynamic SQL, either:

When DB2 runs a dynamic SQL statement, it creates an access plan that is based on current catalog statistics and configuration parameters. This access plan might change from one execution of the statements application program to the next.

The alternative to dynamic SQL is static SQL.



E

" " Explain snapshot

With Visual Explain, you can examine the contents of an explain snapshot.

An explain snapshot is compressed information that is collected when an SQL statement is explained. It is stored as a binary large object (BLOB) in the EXPLAIN_STATEMENT table, and contains the following information:

An explain snapshot is required if you want to display the graphical representation of an SQL statement's access plan. To ensure that an explain snapshot is created:

  1. Explain tables must exist in the database manager to store the explain snapshots. For information on how to create these tables, see Creating explain tables.

  2. For a package containing static SQL statements, set the EXPLSNAP option to ALL or YES when you bind or prep the package. You will get an explain snapshot for each explainable SQL statement in the package. For more information on the BIND and PREP commands, see the Command Reference.

  3. For dynamic SQL statements, set the EXPLSNAP option to ALL when you bind the application that issues them, or set the CURRENT EXPLAIN SNAPSHOT special register to YES or EXPLAIN before you issue them interactively. For more information, see the section on current explain snapshots in the SQL Reference.

" " Explainable statement

An explainable statement is an SQL statement for which an explain operation can be performed.

Explainable SQL statements are:

" " Explained statement

An explained statement is an SQL statement for which an explain operation has been performed. Explained statements are shown in the Explained Statements History window.



O

" " Operand

An operand is an entity on which an operation is performed. For example, a table or an index is an operand of various operators such as TBSCAN and IXSCAN.

" " Operator

An operator is either an action that must be performed on data, or the output from a table or an index, when the access plan for an SQL statement is executed.

The following operators can appear in the access plan graph:

DELETE
Deletes rows from a table.

FETCH
Fetches columns from a table using a specific record identifier.

FILTER
Filters data by applying one or more predicates to it.

GRPBY
Groups rows by common values of designated columns or functions, and evaluates set functions.

HSJOIN
Represents a hash join, where two or more tables are hashed on the join columns.

INSERT
Inserts rows into a table.

IXAND
ANDs together the row identifiers (RIDs) from two or more index scans.

IXSCAN
Scans an index of a table with optional start/stop conditions, producing an ordered stream of rows.

MSJOIN
Represents a merge join, where both outer and inner tables must be in join-predicate order.

NLJOIN
Represents a nested loop join that accesses an inner table once for each row of the outer table.

RETURN
Represents the return of data from the query to the user.

RIDSCN
Scans a list of row identifiers (RIDs) obtained from one or more indexes.

SORT
Sorts rows in the order of specified columns, and optionally eliminates duplicate entries.

TBSCAN
Retrieves rows by reading all required data directly from the data pages.

TEMP
Stores data in a temporary table to be read back out (possibly multiple times).

TQUEUE
Transfers table data between database agents.

UNION
Concatenates streams of rows from multiple tables.

UNIQUE
Eliminates rows with duplicate values, for specified columns.

UPDATE
Updates rows in a table.

" " CMPEXP

Operator name: CMPEXP

Represents: The computation of expressions required for intermediate or final results.

(This operator is for debug mode only.)

" " DELETE

Operator name: DELETE

Represents: The deletion of rows from a table.

This operator represents a necessary operation. To improve access plan costs, concentrate on other operators (such as scans and joins) that define the set of rows to be deleted.

Performance Suggestion:

" " FETCH

Operator name: FETCH

Represents: The fetching of columns from a table using a specific row identifier (RID).

Performance suggestions:

" " FILTER

Operator name: FILTER

Represents: The application of residual predicates so that data is filtered based on the criteria supplied by the predicates.

Performance suggestions:

" " GENROW

Operator name: GENROW

Represents: A built-in function that generates a table of rows, using no input from tables, indexes, or operators.

GENROW may be used by the optimizer to generate rows of data (for example, for an INSERT statement or for some IN-lists that are transformed into joins).

To view the estimated statistics for the tables generated by the GENROW function, double-click on its node.

" " GRPBY

Operator name: GRPBY

Represents: The grouping of rows according to common values of designated columns or functions. This operation is required to produce a group of values, or to evaluate set functions.

If no GROUP BY columns are specified, the GRPBY operator may still be used if there are aggregation functions in the SELECT list, indicating that the entire table is treated as a single group when doing that aggregation.

Performance suggestions:

" " HSJOIN

Operator name: HSJOIN

Represents: A hash join for which the qualified rows from tables are hashed to allow direct joining, without pre-ordering the content of the tables.

A join is necessary whenever there is more than one table referenced in a FROM clause. A hash join is possible whenever there is a join predicate that equates columns from two different tables. The join predicates need to be exactly the same data type. Hash joins may also arise from a rewritten subquery, as is the case with NLJOIN.

A hash join does not require the input tables be ordered. The join is performed by scanning the inner table of the hash join and generating a lookup table by hashing the join column values. It then reads the outer table, hashing the join column values, and checking in the lookup table generated for the inner table.

For more information, see the section on join concepts in the Administration Guide.

Performance suggestions:

" " INSERT

Operator name: INSERT

Represents: The insertion of rows into a table.

This operator represents a necessary operation. To improve access plan costs, concentrate on other operators (such as scans and joins) that define the set of rows to be inserted.

" " IXAND

Operator name: IXAND

Represents: The ANDing of the results of multiple index scans using Dynamic Bitmap techniques. The operator allows ANDed predicates to be applied to multiple indexes, in order to reduce underlying table accesses to a minimum.

This operator is performed to:

Performance suggestions:

" " IXSCAN

Operator name: IXSCAN

Represents: The scanning of an index to produce a reduced stream of rows. The scanning can use optional start/stop conditions, or may apply to indexable predicates that reference columns of the index.

This operation is performed to narrow down the set of qualifying rows before accessing the base table (based on predicates).

For more information, see the section on index scans in the Administration Guide.

Performance suggestions:

" " MSJOIN

Operator name: MSJOIN

Represents: A merge join for which the qualified rows from both outer and inner tables must be in join-predicate order. A merge join is also called a merge scan join or a sorted merge join.

A join is necessary whenever there is more than one table referenced in a FROM clause. A merge join is possible whenever there is a join predicate that equates columns from two different tables. It may also arise from a rewritten subquery.

A merge join requires ordered input on joining columns, since the tables are typically scanned only once. This ordered input is obtained by accessing an index or a sorted table.

For more information, see the section on join concepts in the Administration Guide.

Performance suggestions:

" " NLJOIN

Operator name: NLJOIN

Represents: A nested loop join that scans (usually with an index scan) the inner table once for each row of the outer table.

A join is necessary whenever there is more than one table referenced in a FROM clause. A nested loop join does not require a join predicate, but generally performs better with one.

A nested loop join is performed either:

For more information, see the section on join concepts in the Administration Guide.

Performance suggestions:

Related information:

" " PIPE

Operator name: PIPE

Represents: The transfer of rows to other operators without any change to the rows.

(This operator is for debug mode only.)

" " RETURN

Operator name: RETURN

Represents: The return of data from a query to the user. This is the final operator in the access plan graph and shows the total accumulated values and costs for the access plan.

This operator represents a necessary operation.

Performance Suggestion:

" " RIDSCN

Operator name: RIDSCN

Represents: The scan of a list of row identifiers (RIDs) obtained from one or more indexes.

This operator is considered by the optimizer when:

" " SORT

Operator name: SORT

Represents: The sorting of the rows in a table into the order of one or more of its columns, optionally eliminating duplicate entries.

Sorting is required when no index exists that satisfies the requested ordering, or when sorting would be less expensive than an index scan. Sorting is usually performed as a final operation once the required rows are fetched, or to sort data prior to a join or a group by.

If the number of rows is high or if the sorted data cannot be piped, the operation requires the costly generation of temporary tables.

For more information on sorts, see the Administration Guide.

Performance suggestions:

" " TBSCAN

Operator name: TBSCAN

Represents: A table scan (relation scan) that retrieves rows by reading all the required data directly from the data pages.

This type of scan is chosen by the optimizer over an index scan when:

For more information on table and index scans, see the Administration Guide.

Performance suggestions:

" " TEMP

Operator name: TEMP

Represents: The action of storing data in a temporary table, to be read back out by another operator (possibly multiple times). The table is removed after the SQL statement is processed, if not before.

This operator is required to evaluate subqueries or to store intermediate results. In some situations (such as when the statement can be updated), it may be mandatory.

" " TQUEUE

Operator name: TQUEUE

Represents: A table queue that is used to pass table data from one database agent to another when there are multiple database agents processing a query. Multiple database agents are used to process a query when parallelism is involved.

Table queue types are:

" " UNION

Operator name: UNION

Represents: The concatenation of streams of rows from multiple tables.

This operator represents a necessary operation. To improve access plan costs, concentrate on other operators (such as scans and joins) that define the set of rows to be concatenated.

" " UNIQUE

Operator name: UNIQUE

Represents: The elimination of rows having duplicate values for specified columns.

Performance Suggestion:

" " UPDATE

Operator name: UPDATE

Represents: The updating of data in the rows of a table.

This operator represents a necessary operation. To improve access plan costs, concentrate on other operators (such as scans and joins) that define the set of rows to be updated.

" " Optimizer

The optimizer is the component of the SQL compiler that chooses an access plan for a data manipulation language (DML) SQL statement. It does this by modeling the execution cost of many alternative access plans, and choosing the one with the minimal estimated cost.



P

" " Package

A package is an object stored in the database that includes the information needed to process the SQL statements associated with one source file of an application program. It is generated by either:

" " Predicate

A predicate is an element of a search condition that expresses or implies a comparison operation. Predicates are included in clauses beginning with WHERE or HAVING.

For example, in the following SQL statement:

SELECT * FROM SAMPLE
  WHERE NAME = 'SMITH' AND
  DEPT = 895 AND YEARS > 5

The following are predicates: NAME = 'SMITH'; DEPT = 895; and YEARS > 5.

Predicates fall into one of the following categories, ordered from most efficient to least efficient:

  1. Starting and stopping conditions bracket (narrow down) an index scan. (These conditions are also called range-delimiting predicates.)

  2. Index-page (also known as index sargable) predicates can be evaluated from an index because the columns involved in the predicate are part of the index key.

  3. Data-page (also known as data sargable) predicates cannot be evaluated from an index, but can be evaluated while rows remain in the buffer.

  4. Residual predicates typically require I/O beyond the simple accessing of a base table, and must be applied after data is copied out of the buffer page. They include predicates that contain subqueries, or those that read LONG VARCHAR or LOB data stored in files separate from the table.

When designing predicates, you should aim for the highest selectivity possible so that the fewest rows are returned.

The following types of predicates are the most effective and the most commonly used:

For more information, see the sections on data access concepts and optimization in the Administration Guide.



Q

" " Query optimization class

A query optimization class is a set of query rewrite rules and optimization techniques for compiling queries.

The primary query optimization classes are:

1
Restricted optimization. Useful when memory and processing resources are severely restrained. Roughly equivalent to the optimization provided by Version 1.

2
Slight optimization. Specifies a level of optimization higher than that of Version 1, but at significantly less optimization cost than levels 3 and above, especially for very complex queries.

3
Moderate optimization. Comes closest to matching the query optimization characteristics of DB2 for MVS/ESA.

5
Normal optimization. Recommended for a mixed environment using both simple transactions and complex queries.

7
Normal optimization. The same as query optimization 5 except that it does not reduce the amount of query optimization for complex dynamic SQL queries.

Other query optimization classes, to be used only under special circumstances, are:

0
Minimal optimization. Use only when little or no optimization is required (that is, for very simple queries on well-indexed tables).

9
Maximum optimization. Uses substantial memory and processing resources. Use only if class 5 is insufficient (that is, for very complex and long-running queries that do not perform well at class 5).

In general, use a higher optimization class for static queries and for queries that you anticipate will take a long time to execute, and a lower optimization class for simple queries that are submitted dynamically or that are run only a few times.

To set the query optimization for dynamic SQL statements, enter the following command in the Command Line Processor:

SET CURRENT QUERY OPTIMIZATION = n;

where 'n' is the desired query optimization class.

To set the query optimization for static SQL statements, use the QUERYOPT option on the BIND or PREP commands.

For more information, see the section on adjusting the optimization class in the Administration Guide.



S

" " Selectivity of predicates

Selectivity refers to the probability that any row will satisfy a predicate (that is, be true).

For example, a selectivity of 0.01 (1%) for a predicate operating on a table with 1,000,000 rows means that the predicate returns an estimated 10,000 rows (1% of 1,000,000), and discards an estimated 990,000 rows.

A highly selective predicate (one with a selectivity of 0.10 or less) is desirable. Such predicates return fewer rows for future operators to work on, thereby requiring less CPU and I/O to satisfy the query.

Example

Suppose that you have a table of 1,000,000 rows, and that the original query contains an 'ORDER BY' clause requiring an additional sorting step. With a predicate that has a selectivity of 0.01, the sort would have to be done on an estimated 10,000 rows. However, with a less selective predicate of 0.50, the sort would have to be done on an estimated 500,000 rows, thus requiring more CPU and I/O time.

" " Star joins

A set of joins are considered to be a star join when a fact table (large central table) is joined to two or more dimension tables (smaller tables containing descriptions of the column values in the fact table).

A Star join is comprised of 3 main parts:

It shows up as two or more joins feeding an IXAND operator.

A Semijoin is a special form of join in which the result of the join is only the Row Identifier (RID) of the inner table, instead of the joining of the inner and outer table columns.

Star joins use Semijoins to supply Row Identifiers to an Index ANDing operator. The Index ANDing operator accumulates the filtering affect of the various joins. The output from the Index ANDing operator is fed into an Index ORing operator, which orders the Row Identifiers, and eliminates any duplicate rows that may have resulted from the joins feeding the Index ANDing operator. The rows from the fact table are then fetched, using a Fetch operator. Finally, the reduced fact table is joined to all of the dimension tables, to complete the joins.

Performance suggestions:

" " Static SQL

A static SQL statement is embedded within an application program. All these embedded statements must be precompiled and bound into a package before the application can be executed.

When DB2 compiles these statements, it creates an access plan for each one that is based on the catalog statistics and configuration parameters at the time that the statements were precompiled and bound.

These access plans are always used when the application is run; they do not change until the package is bound again.

The alternative to static SQL is dynamic SQL.

" " System-managed space (SMS) table spaces

There are two types of table spaces that can exist in a database: system-managed space (SMS) and database-managed space (DMS).

An SMS table space is managed by the operating system, which stores the database data into a space that is assigned when a table space is created. The table space definition includes a list of one or more of the directory paths where this data is stored.

The file system manages the allocation and management of media storage.

SMS and DMS table spaces can coexist in the same database.

" " Table space

It is easier to manage very large databases if you partition them into separately managed parts called table spaces.

A table space lets you assign the location of data to particular logical devices or portions thereof. For example, when creating a table you can specify that its indexes or its long columns with long or large object (LOB) data be kept away from the rest of the table data.

A table space can be spread over one or more physical storage devices (containers) for increased performance. However, it is recommended that all the devices or containers within a table space have similar performance characteristics.

A table space can be managed in two different ways: as a system-managed space (SMS) or as a database-managed space (DMS).

" " Visual Explain
Note:As of Version 6, Visual Explain can no longer be invoked from the command line. It can still, however, be invoked from various database objects in the Control Center. For this version, the documentation continues to use the name Visual Explain.

Visual Explain lets you view the access plan for explained SQL statements as a graph. You can use the information available from the graph to tune your SQL queries for better performance.

An access plan graph shows details of:

You can also use Visual Explain to: