GROUP BY

GROUP BY identifies a selected column to use for grouping results. It divides the data into groups by the values in the column specified, and returns one row of results for each group. You can GROUP BY more than one column name (separate column names with commas). Always place GROUP BY after FROM and WHERE in a query, and before HAVING and ORDER BY.

All selected columns without an associated aggregation must appear in the GROUP BY clause.

GROUP BY accumulates the results by group, but does not necessarily order the groups; you need ORDER BY to do that. When you retrieve multiple rows from a table, the GROUP BY, HAVING, and ORDER BY clauses can be used to indicate:

The following query selects the average salary for each department.

This query:

SELECT DEPT, AVG(SALARY)
FROM Q.STAFF
GROUP BY DEPT

Produces this report:

    DEPT         AVG(SALARY)
  ------  ------------------
      10    20865.8625000000
      15    15482.3325000000
      20    16071.5250000000
      38    15457.1100000000
      42    14592.2625000000
      51    17218.1600000000
      66    17215.2400000000
      84    16536.7500000000

In the above example, GROUP BY divides the table into groups of rows with the same department number, and returns one row of results for each group. DEPT can be selected without a built-in function because it is used with GROUP BY, and because every member of each group has the same DEPT. As stated above, all column names included in a SELECT clause must either have an associated built-in function or must appear in the GROUP BY clause. For example, if DEPT is not used in the GROUP BY clause (in the example above), the list of average salaries has little meaning.

This is correct:

SELECT DEPT, AVG(SALARY), JOB
FROM Q.STAFF
GROUP BY DEPT, JOB

This is incorrect:

SELECT DEPT, AVG(SALARY), JOB
FROM Q.STAFF
GROUP BY DEPT

Generally, GROUP BY produces one row of a report for each different value of the grouping column. When there are several columns named in the GROUP BY clause, a different group of rows is produced each time a value in one of the columns changes. However, if there are null values in the column, each null value is treated as a separate group, consisting of one member.

Using GROUP BY in SQL is an alternative to using the usage code GROUP on the form (as described in GROUP Usage Code). GROUP BY provides an extension to the grouping that can be specified on the form and it allows conditional selection of data, which cannot be done on the form. For example, to see the smallest, largest, and average of total department salaries:

  1. Write and run this query:
    SELECT DEPT, SUM(SALARY), SUM(SALARY), SUM(SALARY)
    FROM Q.STAFF
    GROUP BY DEPT
  2. And use these usage codes on the form:
    NUM COLUMN HEADING                         USAGE
    --- -----------------------------------    -------
     1  DEPT
     2  SUM(SALARY)                            MINIMUM
     3  SUM(SALARY)1                           AVERAGE
     4  SUM(SALARY)2                           MAXIMUM

The report contains four columns, of which the last three are almost identical. All three show the total salary for each department; but the final row shows the minimum, average, and maximum of the totals.

Examples:

[ Previous Page | Next Page | Contents | Index ]