DB2 Server for VSE & VM: Application Programming

Grouping the Rows of a Table

The DB2 Server for VSE & VM SQL Reference manual shows how to apply the column functions (SUM, AVG, MIN, MAX, and COUNT) to a table. However, you can apply these functions only to particular columns in rows that satisfy a search condition. For example, the following statement finds the average number of employees for all occurrences of project number AD3111 in the PROJ_ACT table:

   SELECT AVG(ACSTAFF)
   FROM PROJ_ACT
   WHERE PROJNO = 'AD3111'

In contrast, the grouping feature of the database manager permits you to conceptually divide a table into groups of rows with matching values in one or more columns. You can then apply a function to each group. For example, to find the average number of employees for each project in the PROJ_ACT table:

   SELECT PROJNO,AVG(ACSTAFF)
   FROM PROJ_ACT
   GROUP BY PROJNO
   ORDER BY PROJNO

The query yields this result based on the sample table PROJ_ACT:

PROJNO AVG(ACSTAFF) ------ ----------------- AD3100 0.5000000000000000000000000 AD3110 1.0000000000000000000000000 AD3111 0.9357142857142857142857142 AD3112 0.6227272727272727272727272 AD3113 0.8461538461538461538461538 IF1000 0.6000000000000000000000000 IF2000 0.5500000000000000000000000 MA2100 0.7500000000000000000000000 MA2110 1.0000000000000000000000000 MA2111 1.0000000000000000000000000 MA2112 1.2142857142857142857142857 MA2113 1.0714285714285714285714285 OP1000 0.2500000000000000000000000 OP1010 2.5000000000000000000000000 OP2000 0.7500000000000000000000000 OP2010 1.0000000000000000000000000 OP2011 0.5000000000000000000000000 OP2012 0.5000000000000000000000000 OP2013 0.5000000000000000000000000 PL2100 1.0000000000000000000000000

One or more column functions can be applied to the groups. The following query finds the maximum, minimum, and average salary for each department, along with the count of the number of rows in each group (the column function COUNT(*) evaluates to the number of rows in the group):

   SELECT WORKDEPT, MAX(SALARY), MIN(SALARY), AVG(SALARY), COUNT(*)
   FROM EMPLOYEE
   GROUP BY WORKDEPT

Using VARCHAR and VARGRAPHIC within Groups

If you are grouping a VARCHAR or VARGRAPHIC column, trailing blanks are ignored. For example, if a select-statement was grouped by DESCRIPTION, "BOLT" and "BOLT " would match. They would be placed in the same group.

Using Nulls within Groups

If you are grouping columns that return null values, the null values are grouped in those columns. The null values may be returned because of undefined column values or arithmetic exception errors.

If you have defined a VIEW that contains a GROUP BY clause, the view columns named in the GROUP BY have the same nullability as the corresponding base table columns.

Using Select-Lists in Grouped Queries

When you use the GROUP BY clause in a query, the database manager returns only one result row for each group. The select-list of such a query can contain only:

GROUP BY columns
Column functions.

For example, this statement is incorrect:

You cannot include LASTNAME in the select-list because LASTNAME does not occur in the GROUP BY clause, and is not the operand of a column function. Aside from breaking language rules, the above statement is incorrect because a department may have many employees. It is as though you were asking the system to return multiple values to the same variable at the same time.

Using a WHERE Clause with a GROUP BY Clause

A grouping query can have a standard WHERE clause that eliminates non-qualifying rows before the groups are formed and the column functions are computed. Write the WHERE clause before the GROUP BY clause. For example:

   SELECT WORKDEPT, AVG(SALARY)
   FROM EMPLOYEE
   WHERE HIREDATE > '1970-01-01'
   GROUP BY WORKDEPT

Using the HAVING Clause

You can apply a qualifying condition to groups so that the system returns a result only for the groups that satisfy the condition, by including a HAVING clause after the GROUP BY clause. A HAVING clause can contain one or more group-qualifying predicates connected by ANDs and ORs. Each group-qualifying predicate compares a property of the group such as AVG(ACSTAFF) with one of the following:

Another property of the group (for example,HAVING AVG(ACSTAFF) > 2 * MIN(ACSTAFF))
A constant (for example, HAVING AVG(ACSTAFF) > 1.00)
A host variable (for example, HAVING AVG(ACSTAFF) > :LIMIT).

For example, the following query finds the average mean number of employees for projects having more than three activities:

   SELECT PROJNO,AVG(ACSTAFF)
      FROM PROJ_ACT
      GROUP BY PROJNO
      HAVING COUNT(*) > 3
      ORDER BY PROJNO

You can specify DISTINCT as part of the argument of a column function in the HAVING clause, because DISTINCT eliminates duplicate values before a function is applied. Thus, COUNT(DISTINCT PROJNO) computes the number of different project numbers. You cannot use DISTINCT in both the select-list and HAVING clause; you can use it only once in a query.

It is possible (though unusual) for a query to have a HAVING clause but no GROUP BY clause. In this case, the system treats the entire table as one group. Because the table is treated as a single group, you can have at most one result row. If the HAVING condition is true for the table as a whole, the selected result (which must consist entirely of column functions) is returned; otherwise the "not found" code (SQLCODE = 100 and SQLSTATE='02000') is returned.

Combining Joins

This section discusses the WHERE, GROUP BY, HAVING, and ORDER BY clauses of the select-statement.

You can use the various query techniques together in any combination. A query can join two or more tables and can also have a WHERE clause, a GROUP BY clause, a HAVING clause, and, if defined in a cursor, an ORDER BY clause. The sequence of application for these clauses is listed below:

Conceptually, all possible combinations of rows from the listed tables are formed.
The WHERE clause, which may contain join conditions, is applied to filter the rows of the conceptual table.
The GROUP BY clause is applied to form groups from the surviving rows.
The HAVING clause is applied to filter the groups. Only the surviving groups will return a result.
The select-list expressions are evaluated.
The ORDER BY clause determines the order in which the query result is returned.

Illustrating Grouping with an Exercise

By now you may be wondering when you need to use which feature. Consider this problem:

Write a query that returns:

The department number
The manager's employee number
The total number of activities for all the projects in the department
The sum of the estimated mean number of employees needed to staff the activities for all the projects in the department.

|Consider only projects that are estimated to end after January, |1 2000, and only include departments with more than two activities. |Finally, order the result by department name.

The first thing that you must do is to find in the example tables the names of the columns that contain the requested information, so that you can create a select-list:

"department number" is the DEPTNO column of the DEPARTMENT table.
"manager's employee number" is the MGRNO column of the DEPARTMENT table.
"activities" is the ACTNO column of the PROJ_ACT table, but the problem requests the total number of activities for all the projects in a department, so you must include the column function COUNT(*) in the select-list.
Note: You need the total number of activities for a particular department; this means that the query will have to group by department.
"estimated mean number of employees needed to staff the activities" implies the ACSTAFF column of the PROJ_ACT table. However, the problem requests The sum of the estimated mean number of employees needed to staff the activities for all the projects in the department. So you must include the column function SUM in the select-list; this means that the query will have to group by department.

Note: The columns DEPTNO and MGRNO (from the DEPARTMENT table) and ACSTAFF (from the PROJ_ACT table) come from different tables so you will need a join. However, the DEPARTMENT, and PROJ_ACT tables do not have a common column. To join them, you will have to use the PROJECT table in a three-table join. PROJECT contains both the DEPTNO column of the DEPARTMENT table and the PROJNO column of the PROJ_ACT table.

First, define the cursor(s) to be used in your program:

DECLARE C1 CURSOR FOR

Now write a SELECT clause:

   SELECT DEPARTMENT.DEPTNO, MGRNO, SUM(ACSTAFF), COUNT(*)

Note: Since a DEPTNO column appears in both the DEPARTMENT and the PROJECT tables, you must qualify which table it is from.

Write a FROM clause that lists the three tables used in the join:

   FROM DEPARTMENT, PROJECT, PROJ_ACT

You must include a WHERE clause because of the join condition; one line to join the DEPARTMENT table to the PROJECT table, and one to join the PROJECT table to the PROJ_ACT table:

   WHERE DEPARTMENT.DEPTNO = PROJECT.DEPTNO
   AND PROJECT.PROJNO = PROJ_ACT.PROJNO

|However, the problem states that only projects that are estimated to |end on or after January 1, 2000 should be considered. This condition |needs to be added to the WHERE clause:

|   AND PRENDATE >= '2000-01-01'

Note that PRENDATE is a column in the PROJ_ACT table and is unique among all the column names of the joined tables, so it does not have to be qualified. So far, the SQL statement is:

DECLARE C1 CURSOR FOR SELECT DEPARTMENT.DEPTNO, MGRNO, SUM(ACSTAFF), COUNT(*) FROM DEPARTMENT, PROJECT, PROJ_ACT WHERE DEPARTMENT.DEPTNO = PROJECT.DEPTNO AND PROJECT.PROJNO = PROJ_ACT.PROJNO AND PRENDATE >= '2000-01-01'

It is now necessary to group by DEPTNO to find the sum for each part, but MGRNO is also in the select-list, so it must be listed in the GROUP BY clause (recall the rules for grouping). Including MGRNO in the GROUP BY clause does not affect the formation of the groups, however, because MGRNO is a property of a given DEPTNO. The GROUP BY clause is:

   GROUP BY DEPARTMENT.DEPTNO, MGRNO

Note: You can group by PROJECT.DEPTNO if you choose, because of the equality expressed between DEPARTMENT.DEPTNO and PROJECT.DEPTNO in the join condition. If you use PROJECT.DEPTNO in the GROUP BY clause, however, you must also use it in the select-list.

If the table name is fully qualified in the FROM clause, it is good practice to fully qualify it in the whole statement.

The problem requires that the departments included in the query have at least two activities for all the projects in the department; a HAVING clause is needed to filter out the unwanted groups:

   HAVING COUNT(*) > 2

To have the system return the results in DEPTNO order, type:

Now you must position the cursor and identify the corresponding host variables used in your program:

   OPEN C1
   FETCH C1 INTO :DEPT, :MGRN, :TOTSTAFF, :NUMACT
   CLOSE C1

By incorporating the FETCH statement in a suitable host program loop along with an appropriate output command, this query produces the following result:

DEPTNO MGRNO SUM(ACSTAFF) COUNT(EXPRESSION 1) ------ ------ ----------------- ------------------- C01 000030 5.75 10 D01 ? 2.00 3 D21 000070 25.40 32 E21 000100 4.00 7

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]