Administration Guide

Operation Merging

The SQL compiler will rewrite queries to merge query operations, in an attempt to construct the query so that it has the fewest number of operations, especially SELECT operations. The following examples are provided to illustrate some of the operations that can be merged by the SQL compiler:

"Example - View Merges"
Using views in a SELECT statement can restrict the join order of the table and can also introduce redundant joining of tables. By merging the views during query rewrite, these restrictions can be lifted.
"Example - Subquery to Join Transformations"
The use of subqueries in a SELECT statement can force a join method and the selection of inner and outer tables for the join. During query rewrite, the subquery can sometimes be merged into the main query as a join, which gives the optimizer more choices to choose the most efficient access plan.
"Example - Redundant Join Elimination"
During query rewrite redundant joins can be removed to further simplify the SELECT statement that will be optimized.
"Example - Shared Aggregation"
When using different functions, rewriting the query can reduce the number of calculations that need to be done.
"Example - Summary Tables"
If they exist, summary tables can be used instead of regular tables. They are of smaller size and therefore require less processing.

Example - View Merges

Suppose you have access to the following two views of the EMPLOYEE table, one showing employees with a high level of education and the other view showing employees earning more than $35,000:

   CREATE VIEW EMP_EDUCATION (EMPNO, FIRSTNME, LASTNAME, EDLEVEL) AS
   SELECT EMPNO, FIRSTNME, LASTNAME, EDLEVEL
     FROM EMPLOYEE
    WHERE EDLEVEL > 17
   CREATE VIEW EMP_SALARIES (EMPNO, FIRSTNAME, LASTNAME, SALARY) AS
   SELECT EMPNO, FIRSTNME, LASTNAME, SALARY
     FROM EMPLOYEE
    WHERE SALARY > 35000

Now suppose you perform the following query to list the employees who have a high education level and who are earning more than $35,000:

   SELECT E1.EMPNO, E1.FIRSTNME, E1.LASTNAME, E1.EDLEVEL, E2.SALARY
     FROM EMP_EDUCATION E1,
          EMP_SALARIES  E2
    WHERE E1.EMPNO = E2.EMPNO

During query rewrite, these two views could be merged to create the following query:

   SELECT E1.EMPNO, E1.FIRSTNME, E1.LASTNAME, E1.EDLEVEL, E2.SALARY
     FROM EMPLOYEE E1,
          EMPLOYEE E2
    WHERE E1.EMPNO = E2.EMPNO
      AND E1.EDLEVEL > 17
      AND E2.SALARY  > 35000

By merging the SELECT statements from the two views with the user-written SELECT statement, the optimizer can consider more choices when selecting an access plan. In addition, if the two views that have been merged use the same base table, additional rewriting may be performed as described in Example - Redundant Join Elimination.

Example - Subquery to Join Transformations

The SQL compiler will take a query containing a subquery, such as:

   SELECT EMPNO, FIRSTNME, LASTNAME, PHONENO
   FROM EMPLOYEE
   WHERE WORKDEPT IN
         (SELECT DEPTNO
            FROM DEPARTMENT
           WHERE DEPTNAME = 'OPERATIONS')

and convert it to a join query of the form:

   SELECT DISTINCT EMPNO, FIRSTNME, LASTNAME, PHONENO
     FROM EMPLOYEE EMP,
          DEPARTMENT DEPT
    WHERE EMP.WORKDEPT = DEPT.DEPTNO
      AND DEPT.DEPTNAME = 'OPERATIONS'

A join is generally much more efficient to execute than a subquery.

Example - Redundant Join Elimination

Queries can sometimes be written or generated which have unnecessary joins. Queries such as the following could also be produced during the query rewrite stage as described in Example - View Merges.

   SELECT E1.EMPNO, E1.FIRSTNME, E1.LASTNAME, E1.EDLEVEL, E2.SALARY
     FROM EMPLOYEE E1,
          EMPLOYEE E2
    WHERE E1.EMPNO = E2.EMPNO
      AND E1.EDLEVEL > 17
      AND E2.SALARY  > 35000

In this query, the SQL compiler can eliminate the join and simplify the query to:

   SELECT EMPNO, FIRSTNME, LASTNAME, EDLEVEL, SALARY
     FROM EMPLOYEE
    WHERE EDLEVEL > 17
      AND SALARY  > 35000

Another example assumes that a referential constraint exists between the EMPLOYEE and DEPARTMENT sample tables on the department number. First, a view is created.

   CREATE VIEW PEPLVIEW
      AS SELECT FIRSTNME, LASTNAME, SALARY, DEPTNO, DEPTNAME, MGRNO
            FROM EMPLOYEE E DEPARTMENT D
            WHERE E.WORKDEPT = D.DEPTNO

Then a query such as the following:

   SELECT LASTNAME, SALARY
      FROM PEPLVIEW

becomes

   SELECT LASTNAME, SALARY
      FROM EMPLOYEE
      WHERE WORKDEPT NOT NULL

Note that in this situation, even if the user knows that the query can be re-written, they may not be able to do so because they do not have access to the underlying tables. They may only have access to the view (shown above). Therefore, this type of optimization has to be performed within the database manager.

Redundancy in referential integrity joins is likely where:

Views are defined with joins
Queries are automatically generated.
For example, there are automated tools in query managers which prevent users from writing optimized queries.

Example - Shared Aggregation

Using multiple functions within a query can generate several calculations which take time. Reducing the number of calculations to be done within the query results in an improved plan. The SQL compiler takes a query using multiple functions such as:

   SELECT SUM(SALARY+BONUS+COMM) AS OSUM,
     AVG(SALARY+BONUS+COMM) AS OAVG,
     COUNT(*) AS OCOUNT
   FROM EMPLOYEE;

and transforms the query in the following way:

   SELECT OSUM,
     OSUM/OCOUNT
     OCOUNT
   FROM (SELECT SUM(SALARY+BONUS+COMM) AS OSUM,
     COUNT(*) AS OCOUNT
   FROM EMPLOYEE) AS SHARED_AGG;

This rewrite reduces the query from 2 sums and 2 counts to 1 sum and 1 count.

Example - Summary Tables

Following is an example of a multidimensional analysis that could take advantage of summary tables. A summary table is created with the sum and count of sales for each level of:

Product hierarchy
Location hierarchy
Time hierarchy, composed of year, month, day.

A wide range of queries can pick up their answers from this stored aggregate data. The following example calculates the sum of product group sales, by state, by month. Queries that can take advantage of such pre-computed sums would include:

Sales by month and product group
Total sales for years after 1990
Sales for 1995 or 1996
Sum of sales for a product group or product line
Sum of sales for a specific product group or product line AND for 1995, 1996
Sum of sales for a specific country.

While the precise answer is not included in the summary table for any of these queries, the cost of computing the answer using the summary table could be significantly less than using a large base table, because a portion of the answer is already computed. For example:

  CREATE TABLE PG_SALESSUM
    AS (
        SELECT l.id AS prodline, pg.id AS pgroup,
               loc.country, loc.state
               YEAR(pdate) AS year, MONTH(pdate) AS month,
               SUM(ti.amount) AS amount,
               COUNT(*) AS count
        FROM   cube.transitem AS ti, cube.trans AS t,
               cube.loc AS loc, cube.pgroup AS pg,
               cube.prodline AS l
        WHERE  ti.transid = t.id
           AND ti.pgid = pg.id
           AND pg.lineid = l.id
           AND t.locid = loc.id
           AND YEAR(pdate) > 1990
        GROUP BY l.id, pg.id, loc.country, loc.state,
                 year(pdate), month(pdate)
       )
  DATA INITIALLY DEFERRED REFRESH DEFERRED;
 
  REFRESH TABLE SALESCUBE;

The following are sample queries that would obtain significant performance improvements because they are able to use the results in the summary table that are already computed. The first example returns the total sales for 1995 and 1996:

  SET CURRENT REFRESH AGE=ANY
 
  SELECT YEAR(pdate) AS year, SUM(ti.amount) AS amount
  FROM   cube.transitem AS ti, cube.trans AS t,
         cube.loc AS loc, cube.pgroup AS pg,
         cube.prodline AS l
  WHERE  ti.transid = t.id
     AND ti.pgid = pg.id
     AND pg.lineid = l.id
     AND t.locid = loc.id
     AND YEAR(pdate) IN (1995, 1996)
  GROUP BY year(pdate);

The second example returns the total sales by product group for 1995 and 1996:

  SET CURRENT REFRESH AGE=ANY
 
  SELECT pg.id AS "PRODUCT GROUP",
         SUM(ti.amount) AS amount
  FROM   cube.transitem AS ti, cube.trans AS t,
         cube.loc AS loc, cube.pgroup AS pg,
         cube.prodline AS l
  WHERE  ti.transid = t.id
     AND ti.pgid = pg.id
     AND pg.lineid = l.id
     AND t.locid = loc.id
     AND YEAR(pdate) IN (1995, 1996)
  GROUP BY pg.id;

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]