A subquery that is allowed to refer to any of the previously mentioned tables is known as a correlated subquery. We also say that the subquery has a correlated reference to a table in the main query.
The following example uses an uncorrelated subquery to list the employee number and name of employees in department 'A00' with a salary greater than the average salary of the department:
SELECT EMPNO, LASTNAME FROM EMPLOYEE WHERE WORKDEPT = 'A00' AND SALARY > (SELECT AVG(SALARY) FROM EMPLOYEE WHERE WORKDEPT = 'A00')
This statement produces the following result:
EMPNO LASTNAME ------ --------------- 000010 HAAS 000110 LUCCHESSI
If you want to know the average salary for every department, the subquery needs to be evaluated once for every department. You can do this through the correlation capability of SQL, which permits you to write a subquery that is executed repeatedly, once for each row of the table identified in the outer-level query.
The following example uses a correlated subquery to list all the employees whose salary is higher than the average salary of their department:
SELECT E1.EMPNO, E1.LASTNAME, E1.WORKDEPT FROM EMPLOYEE E1 WHERE SALARY > (SELECT AVG(SALARY) FROM EMPLOYEE E2 WHERE E2.WORKDEPT = E1.WORKDEPT) ORDER BY E1.WORKDEPT
In this query, the subquery is evaluated once for every department. The result is:
EMPNO LASTNAME WORKDEPT ------ --------------- -------- 000010 HAAS A00 000110 LUCCHESSI A00 000030 KWAN C01 000060 STERN D11 000150 ADAMSON D11 000170 YOSHIMURA D11 000200 BROWN D11 000220 LUTZ D11 000070 PULASKI D21 000240 MARINO D21 000270 PEREZ D21 000090 HENDERSON E11 000280 SCHNEIDER E11 000100 SPENSER E21 000330 LEE E21 000340 GOUNOT E21
To write a query with a correlated subquery, use the same basic format of an ordinary outer query with a subquery. However, in the FROM clause of the outer query, just after the table name, place a correlation name. The subquery may then contain column references qualified by the correlation name. For example, if E1 is a correlation name, then E1.WORKDEPT means the WORKDEPT value of the current row of the table in the outer query. The subquery is (conceptually) reevaluated for each row of the table in the outer query.
By using a correlated subquery, you let the system do the work for you and reduce the amount of code you need to write within your application.
Unqualified correlated references are allowed in DB2. For example, the table EMPLOYEE has a column named LASTNAME, but the table SALES has a column named SALES_PERSON, and no column named LASTNAME.
SELECT LASTNAME, FIRSTNME, COMM FROM EMPLOYEE WHERE 3 > (SELECT AVG(SALES) FROM SALES WHERE LASTNAME = SALES_PERSON)
In this example, the system checks the innermost FROM clause for a LASTNAME column. Not finding one, it then checks the next innermost FROM clause (which in this case is the outer FROM clause). While not always necessary, qualifying correlated references is recommended to improve the readability of the query and to ensure that you are getting the result that you intend.
When would you want to use a correlated subquery? The use of a column function is sometimes a clue.
Let's say you want to list the employees whose level of education is higher than the average for their department.
First, you must determine the select-list items. The problem says "List the employees". This implies that LASTNAME from the EMPLOYEE table should be sufficient to uniquely identify employees. The problem also states the level of education (EDLEVEL) and the employees' departments (WORKDEPT) as conditions. While the problem does not explicitly ask for columns to be displayed, including them in the select-list will help illustrate the solution. A part of the query can now be constructed:
SELECT LASTNAME, WORKDEPT, EDLEVEL FROM EMPLOYEE
Next, a search condition (WHERE clause) is needed. The problem statement says, "...whose level of education is higher than the average for that employee's department". This means that for every employee in the table, the average education level for that employee's department must be computed. This statement fits the description of a correlated subquery. Some unknown property (the average level of education of the current employee's department) is being computed for each row. A correlation name is needed for the EMPLOYEE table:
SELECT LASTNAME, WORKDEPT, EDLEVEL FROM EMPLOYEE E1
The subquery needed is simple. It computes the average level of education for each department. The complete SQL statement is:
SELECT LASTNAME, WORKDEPT, EDLEVEL FROM EMPLOYEE E1 WHERE EDLEVEL > (SELECT AVG(EDLEVEL) FROM EMPLOYEE E2 WHERE E2.WORKDEPT = E1.WORKDEPT)
The result is:
LASTNAME WORKDEPT EDLEVEL --------------- -------- ------- HAAS A00 18 KWAN C01 20 PULASKI D21 16 HENDERSON E11 16 LUCCHESSI A00 19 PIANKA D11 17 SCOUTTEN D11 17 JONES D11 17 LUTZ D11 18 MARINO D21 17 JOHNSON D21 16 SCHNEIDER E11 17 MEHTA E21 16 GOUNOT E21 16
Suppose that instead of listing the employee's department number, you list the department name. The information you need (DEPTNAME) is in a separate table (DEPARTMENT). The outer-level query that defines a correlation variable can also be a join query (see Selecting Data from More Than One Table for details).
When you use joins in an outer-level query, list the tables to be joined in the FROM clause, and place the correlation name next to the appropriate table name.
To modify the query to list the department's name instead of its number, replace WORKDEPT by DEPTNAME in the select-list. The FROM clause must now also include the DEPARTMENT table, and the WHERE clause must express the appropriate join condition.
This is the modified query:
SELECT LASTNAME, DEPTNAME, EDLEVEL FROM EMPLOYEE E1, DEPARTMENT WHERE E1.WORKDEPT = DEPARTMENT.DEPTNO AND EDLEVEL > (SELECT AVG(EDLEVEL) FROM EMPLOYEE E2 WHERE E2.WORKDEPT = E1.WORKDEPT)
This statement produces the following result:
LASTNAME DEPTNAME EDLEVEL --------------- ----------------------------- ------- HAAS SPIFFY COMPUTER SERVICE DIV. 18 LUCCHESSI SPIFFY COMPUTER SERVICE DIV. 19 KWAN INFORMATION CENTER 20 PIANKA MANUFACTURING SYSTEMS 17 SCOUTTEN MANUFACTURING SYSTEMS 17 JONES MANUFACTURING SYSTEMS 17 LUTZ MANUFACTURING SYSTEMS 18 PULASKI ADMINISTRATION SYSTEMS 16 MARINO ADMINISTRATION SYSTEMS 17 JOHNSON ADMINISTRATION SYSTEMS 16 HENDERSON OPERATIONS 16 SCHNEIDER OPERATIONS 17 MEHTA SOFTWARE SUPPORT 16 GOUNOT SOFTWARE SUPPORT 16
The above examples show that the correlation name used in a subquery must be defined in the FROM clause of some query that contains the correlated subquery. However, this containment may involve several levels of nesting.
Suppose that some departments have only a few employees and therefore their average education level may be misleading. You might decide that in order for the average level of education to be a meaningful number to compare an employee against, there must be at least five employees in a department. So now we have to list the employees whose level of education is higher than the average for that employee's department, and only consider departments with at least five employees.
The problem implies another subquery because, for each employee in the outer-level query, the total number of employees in that person's department must be counted:
SELECT COUNT(*) FROM EMPLOYEE E3 WHERE E3.WORKDEPT = E1.WORKDEPT
Only if the count is greater than or equal to 5 is an average to be computed:
SELECT AVG(EDLEVEL) FROM EMPLOYEE E2 WHERE E2.WORKDEPT = E1.WORKDEPT AND 5 <= (SELECT COUNT(*) FROM EMPLOYEE E3 WHERE E3.WORKDEPT = E1.WORKDEPT)
Finally, only those employees whose level of education is greater than the average for that department are included:
SELECT LASTNAME, DEPTNAME, EDLEVEL FROM EMPLOYEE E1, DEPARTMENT WHERE E1.WORKDEPT = DEPARTMENT.DEPTNO AND EDLEVEL > (SELECT AVG(EDLEVEL) FROM EMPLOYEE E2 WHERE E2.WORKDEPT = E1.WORKDEPT AND 5 <= (SELECT COUNT(*) FROM EMPLOYEE E3 WHERE E3.WORKDEPT = E1.WORKDEPT))
This statement produces the following result:
LASTNAME DEPTNAME EDLEVEL --------------- ----------------------------- ------- PIANKA MANUFACTURING SYSTEMS 17 SCOUTTEN MANUFACTURING SYSTEMS 17 JONES MANUFACTURING SYSTEMS 17 LUTZ MANUFACTURING SYSTEMS 18 PULASKI ADMINISTRATION SYSTEMS 16 MARINO ADMINISTRATION SYSTEMS 17 JOHNSON ADMINISTRATION SYSTEMS 16 HENDERSON OPERATIONS 16 SCHNEIDER OPERATIONS 17