SQL Getting Started

Correlated Subqueries

A subquery that is allowed to refer to any of the previously mentioned tables is known as a correlated subquery. We also say that the subquery has a correlated reference to a table in the main query.

The following example is an uncorrelated subquery that lists the employee number and name of employees in department 'A00' with a salary greater than the average salary of the department:

 
     SELECT EMPNO, LASTNAME
        FROM EMPLOYEE
        WHERE WORKDEPT = 'A00'
          AND SALARY > (SELECT AVG(SALARY)
                           FROM EMPLOYEE
                           WHERE WORKDEPT = 'A00')

If you want to know the average salary for every department, the subquery needs to be evaluated once for every department. You can do this by using the correlation capability of SQL, which permits you to write a subquery that is executed repeatedly, once for each row of the table identified in the outer-level query. This type of correlated subquery is used to compute some property of each row of the outer-level table that is needed to evaluate a predicate in the subquery.

This example shows all the employees whose salary is higher than the average salary of their department:

 
     SELECT E1.EMPNO, E1.LASTNAME, E1.WORKDEPT
        FROM EMPLOYEE E1
        WHERE SALARY > (SELECT AVG(SALARY) 
                           FROM EMPLOYEE E2
                           WHERE E2.WORKDEPT = E1.WORKDEPT)
        ORDER BY E1.WORKDEPT

In this query, the subquery is evaluated once for every department. The result is:

     EMPNO  LASTNAME        WORKDEPT
     ------ --------------- --------
     000010 HAAS            A00     
     000110 LUCCHESSI       A00     
     000030 KWAN            C01     
     000060 STERN           D11     
     000220 LUTZ            D11     
     000200 BROWN           D11     
     000170 YOSHIMURA       D11     
     000150 ADAMSON         D11     
     000070 PULASKI         D21     
     000270 PEREZ           D21     
     000240 MARINO          D21     
     000090 HENDERSON       E11     
     000280 SCHNEIDER       E11     
     000100 SPENSER         E21     
     000340 GOUNOT          E21     
     000330 LEE             E21

To write a query with a correlated subquery, use the same basic format of an ordinary outer query with a subquery. However, in the FROM clause of the outer query, just after the table name, place a correlation name. The subquery may then contain column references qualified by the correlation name. For example, if E1 is a correlation name, then E1.WORKDEPT means the WORKDEPT value of the current row of the table in the outer query. The subquery is (conceptually) reevaluated for each row of the table in the outer query.

By using a correlated subquery, you let the system do the work for you and reduce the amount of code you need to write within your application.

Unqualified correlated references are allowed in DB2. For example, the table EMPLOYEE has a column named LASTNAME and table SALES has a column named SALES_PERSON, but no column named LASTNAME.

 
     SELECT LASTNAME, FIRSTNME, COMM 
        FROM EMPLOYEE 
        WHERE 3 > (SELECT AVG(SALES)
                      FROM SALES
                      WHERE LASTNAME = SALES_PERSON)

In this example, the system checks the innermost FROM clause for a LASTNAME column. Not finding one, it then checks the next innermost FROM clause (which in this case is the outer FROM clause). While not always necessary, qualifying correlated references is recommended to improve the readability of the query and to ensure that you are getting the result that you intend.

Implementing a Correlated Subquery

When would you want to use a correlated subquery? The use of a column function is sometimes a clue.

Let's say you want to list the employees whose level of education is higher than the average for their department.

First, you must determine the select-list items. The problem says "List the employees". This implies that the EMPNO from the EMPLOYEE table should be sufficient to uniquely identify employees. The problem also states the level of education (EDLEVEL) and the employees' departments (WORKDEPT) as conditions. While the problem does not explicitly ask for columns to be displayed, including them in the select-list will help illustrate the solution. A part of the query can now be constructed:

     SELECT LASTNAME, WORKDEPT, EDLEVEL
        FROM EMPLOYEE

Next, a search condition (WHERE clause) is needed. The problem statement says, "...whose level of education is higher than the average for that employee's department". This means that for every employee in the table, the average education level for that employee's department must be computed. This statement fits the description of a correlated subquery. Some property (average level of education of the current employee's department) is being computed for each row. A correlation name is needed for the EMPLOYEE table:

     SELECT LASTNAME, WORKDEPT, EDLEVEL
        FROM EMPLOYEE E1

The subquery needed is simple. It computes the average level of education for each department. The complete SQL statement is:

     SELECT LASTNAME, WORKDEPT, EDLEVEL
        FROM EMPLOYEE E1
        WHERE EDLEVEL > (SELECT AVG(EDLEVEL)
                            FROM EMPLOYEE  E2
                            WHERE E2.WORKDEPT = E1.WORKDEPT)

The result is:

     LASTNAME        WORKDEPT EDLEVEL
     --------------- -------- -------
     HAAS            A00           18
     KWAN            C01           20
     PULASKI         D21           16
     HENDERSON       E11           16
     LUCCHESSI       A00           19
     PIANKA          D11           17
     SCOUTTEN        D11           17
     JONES           D11           17
     LUTZ            D11           18
     MARINO          D21           17
     JOHNSON         D21           16
     SCHNEIDER       E11           17
     MEHTA           E21           16
     GOUNOT          E21           16

Suppose that instead of listing the employee's department number, you list the department name. The information you need (DEPTNAME) is in a separate table (DEPARTMENT). The outer-level query that defines a correlation variable can also be a join query (see Selecting Data from More Than One Table for details).

When you use joins in an outer-level query, list the tables to be joined in the FROM clause, and place the correlation name next to any of these table names.

To modify the query to list the department's name instead of its number, replace WORKDEPT by DEPTNAME in the select-list. The FROM clause must now also include the DEPARTMENT table, and the WHERE clause must express the appropriate join condition.

This is the modified query:

     SELECT LASTNAME, DEPTNAME, EDLEVEL
        FROM EMPLOYEE E1, DEPARTMENT
        WHERE E1.WORKDEPT = DEPARTMENT.DEPTNO
        AND EDLEVEL > (SELECT AVG(EDLEVEL)
                          FROM EMPLOYEE E2
                          WHERE E2.WORKDEPT = E1.WORKDEPT)

The above examples show that the correlation name used in a subquery must be defined in the FROM clause of some query that contains the correlated subquery. However, this containment may involve several levels of nesting.

Suppose that some departments have only a few employees and therefore their average education level may be misleading. You might decide that in order for the average level of education to be a meaningful number to compare an employee against, there must be at least five employees in a department. So now we have to list the employees whose level of education is higher than the average for that employee's department, and only consider departments with at least five employees.

The problem implies another subquery because, for each employee in the outer-level query, the total number of employees in that person's department must be counted:

     SELECT COUNT(*)
        FROM EMPLOYEE E3
        WHERE E3.WORKDEPT = E1.WORKDEPT

Only if the count is greater than or equal to 5 is an average to be computed:

     SELECT AVG(EDLEVEL)
        FROM EMPLOYEE E2
        WHERE E2.WORKDEPT = E1.WORKDEPT
        AND 5 <= (SELECT COUNT(*)
                     FROM EMPLOYEE  E3
                     WHERE E3.WORKDEPT = E1.WORKDEPT)

Finally, only those employees whose level of education is greater than the average for that department are included:

     SELECT LASTNAME, DEPTNAME, EDLEVEL
        FROM EMPLOYEE E1, DEPARTMENT
        WHERE E1.WORKDEPT = DEPARTMENT.DEPTNO
        AND EDLEVEL > 
        (SELECT AVG(EDLEVEL)
            FROM EMPLOYEE E2
            WHERE E2.WORKDEPT = E1.WORKDEPT
            AND 5 <=
            (SELECT COUNT(*)
                FROM EMPLOYEE E3
                WHERE E3.WORKDEPT = E1.WORKDEPT))

This statement produces the following result:

     LASTNAME        DEPTNAME                      EDLEVEL
     --------------- ----------------------------- -------
     PIANKA          MANUFACTURING SYSTEMS              17
     LUTZ            MANUFACTURING SYSTEMS              18
     JONES           MANUFACTURING SYSTEMS              17
     SCOUTTEN        MANUFACTURING SYSTEMS              17
     PULASKI         ADMINISTRATION SYSTEMS             16
     JOHNSON         ADMINISTRATION SYSTEMS             16
     MARINO          ADMINISTRATION SYSTEMS             17
     HENDERSON       OPERATIONS                         16
     SCHNEIDER       OPERATIONS                         17

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]