A query specifies a result table.
In a program, a query is a component of certain SQL statements. There are three forms of a query:
There is another form of select, described under SELECT INTO.
For any form of a query, the privileges held by the authorization ID of the statement must include at least one of the following for each of the tables or views identified in the statement:
>>-select_clause-from_clause-+--------------+--+-----------------+--+---------------+--> '-where_clause-' '-group_by_clause-' '-having_clause-' >-------------------------------------------------------------->< |
The subselect is a component of the fullselect, the CREATE VIEW statement, and the INSERT statement. It is also a component of certain predicates which, in turn, are components of a subselect. A subselect that is a component of a predicate is called a subquery. If more than one table or view is identified in the FROM clause, the subselect is called a join. (See the DB2 Server for VSE & VM Application Programming manual for information on joining tables.)
A subselect specifies a result table derived from the tables or views identified in the FROM clause. The derivation can be described as a sequence of operations in which the result of each operation is input for the next. (This is only a way of describing the subselect. The method used to perform the derivation may be quite different from this description.)
The sequence of the (hypothetical) operations is:
.-ALL------. >>-SELECT----+----------+---+-*----------------------------+--->< '-DISTINCT-' | .-,-----------------------. | | V | | '----+-expression---------+--+-' +-table_name.*-------+ +-view_name.*--------+ '-correlation_name.*-' |
The SELECT clause specifies the columns of the final result table. The column values are produced by the application of the select list to R. The select list is the names or expressions specified in the SELECT clause. R is the result of the previous operation of the subselect. For example, if the only clauses specified are SELECT, FROM, and WHERE, R is the result of that WHERE clause.
Two rows are duplicates of one another only if each value in the first row is equal to the corresponding value in the second row. (For determining duplicate rows, two null values are considered equal.)
The list of names is established when the statement containing the SELECT clause is prepared. Hence, * does not identify any columns that have been added to a table after the statement has been prepared.
The list of names is established when the statement containing the SELECT clause is prepared. Hence, * does not identify any columns that have been added to a table after the statement has been prepared.
The number of columns in the result of SELECT is the same as the number of expressions in the operational form of the select list (that is, the list established during program preparation) and must not exceed 255. The result table of a subquery must be a single column, unless the subquery is used in the EXISTS predicate.
No column in the list may be a long string column if:
Note: |
The restriction does not apply to a subquery of an EXISTS predicate because an EXISTS subquery does not return values.
Some of the results of applying the select list to R depend on whether GROUP BY or HAVING is used. Those results are described separately.
In either case the nth column of the result contains the values specified by applying the nth expression in the operational form of the select list.
Result columns allow null values if they are derived from:
A result column derived from a column name acquires the unqualified name of that column. All other result columns have no names.
Each column of the result of SELECT acquires a data type from the
expression from which it is derived.
When the expression is ... | The data type of the result column is ... |
---|---|
the name of any numeric column | the same as the data type of the column, with the same precision and scale for decimal columns. |
an integer constant | INTEGER. |
a decimal or floating-point constant | the same as the data type of the constant, with the same precision and scale for decimal constants. For floating-point constants, the data type is DOUBLE PRECISION. |
the name of any numeric host variable | the same as the data type of the variable, with the same precision and scale for decimal variables. If the data type of the variable is not identical to an SQL data type (for example, DISPLAY SIGN LEADING SEPARATE in COBOL), the result column is decimal. |
an arithmetic or string expression | the same as the data type of the result. Decimal results have the same precision and scale as described under Expressions. |
any function | (see Chapter 4, "Functions" to determine the data type of the result.) |
the name of any string column | the same as the data type of the column, with the same length attribute. |
the name of any string host variable | the same as the data type of the variable, with a length attribute equal to the length of the variable. If the data type of the variable is not identical to an SQL data type (for example, a NUL-terminated string in C), the result column is a varying-length string. |
a character string constant of length n | VARCHAR(n) |
a graphic string constant of length n | VARGRAPHIC(n) |
the name of a datetime column | the same as the data type of the column. |
.-,----------------------------------------. V | >>-FROM--------+-table_name-+--+------------------+---+-------->< '-view_name--' '-correlation_name-' |
The FROM clause specifies an intermediate result table. If a single table or view is identified, the intermediate result table is simply that table or view. If more than one table or view is identified, the intermediate result table consists of all possible combinations of the rows of the identified tables or views. Each row of the result is a row from the first table or view concatenated with a row from the second table or view, concatenated in turn with a row from the third, and so on. The number of rows in the result is the product of the number of rows in all the named tables or views.
The following rules apply to the names specified in a FROM clause:
If a correlation_name is specified for a table or view, any qualified reference to a column of that table or view in the subselect must use that correlation_name. For more information about the FROM clause, see Correlation Names.
>>-WHERE--search_condition------------------------------------->< |
The WHERE clause specifies an intermediate result table that consists of those rows of R for which the search_condition is true. R is the result of the FROM clause of the subselect.
The search_condition must conform to the following rules:
Any subquery in the search_condition is effectively processed for each row of R and the results are used in the application of the search_condition to the given row of R. A subquery is actually processed for each row of R only if it includes a correlated reference. In fact, a subquery with no correlated references is processed just once, whereas a subquery with a correlated reference may have to be processed once for each row.
.-,--------------. V | >>-GROUP BY-------column_name---+------------------------------>< |
The GROUP BY clause specifies an intermediate result table that consists of a grouping of the rows of R. R is the result of the previous clause of the subselect.
Each column_name must unambiguously identify a column of R other than a long string column. Each identified column is called a grouping column.
The result of GROUP BY is a set of groups of rows. The rows within each group are in an arbitrary order. In each group of more than one row, all values of each grouping column are equal; and all rows with the same set of values of the grouping columns are in the same group. For grouping, all null values within a grouping column are considered equal.
Because every row of a group contains the same value of any grouping column, the name of a grouping column can be used in a search condition in a HAVING clause or an expression in a SELECT clause; in each case, the reference specifies only one value for each group.
If the grouping column contains varying-length strings with trailing blanks, the values in the group can differ in the number of trailing blanks and may not all have the same length. In that case, a reference to the grouping column still specifies only one value for each group, but the value for a group is chosen arbitrarily from the available set of values. Thus, the actual length of the result value is unpredictable.
If a field procedure is not involved, the collating sequence depends on the CCSID of the application server. With the DRDA protocol, the application server could be using an ASCII CCSID, producing an unexpected result to an application program assuming an EBCDIC CCSID and code page (see the IBM SQL Reference manual for details).
GROUP BY is ignored if used in a subquery of a basic predicate.
>>-HAVING--search_condition------------------------------------>< |
The HAVING clause specifies an intermediate result table that consists of those groups of R for which the search_condition is true. R is the result of the previous clause of the subselect. If this clause is not GROUP BY, R is considered a single group with no grouping columns.
Each column name in the search_condition must:
A group of R to which the search condition is applied supplies the argument for each column function in the search condition, except for any function whose argument is a correlated reference.
If the search condition contains a subquery, the subquery can be thought of as being proceesed each time the search condition is applied to a group of R, and the results used in applying the search condition. In actuality, the subquery is processed for each group only if it contains a correlated reference. For an illustration of the difference, see examples 6 and 7 under Examples of a subselect.
A correlated reference to a group of R must either identify a grouping column or be contained within a column function.
The HAVING clause must not be used in a subquery of a basic predicate.
Select all columns and rows from the EMPLOYEE table.
SELECT * FROM EMPLOYEE
Join the EMP_ACT and EMPLOYEE tables, select all the columns from the EMP_ACT table and add the employee's surname (LASTNAME) from the EMPLOYEE table to each row of the result.
SELECT EMP_ACT.*, LASTNAME FROM EMP_ACT, EMPLOYEE WHERE EMP_ACT.EMPNO = EMPLOYEE.EMPNO
Using a join-condition on the EMPLOYEE and DEPARTMENT tables, select the employee number (EMPNO), employee name (FIRSTNME concatenated with MIDINIT concatenated with LASTNAME), department number (WORKDEPT in the EMPLOYEE table and DEPTNO in the DEPARTMENT table) and department name (DEPTNAME) of all employees who were born (BIRTHDATE) earlier than 1930.
SELECT EMPNO, FIRSTNME CONCAT ' ' CONCAT MIDINIT CONCAT ' ' CONCAT LASTNAME, WORKDEPT, DEPTNAME FROM EMPLOYEE, DEPARTMENT WHERE WORKDEPT = DEPTNO AND YEAR(BIRTHDATE) < 1930
Select the job (JOB) and the minimum and maximum salaries (SALARY) for each group of rows with the same job code in the EMPLOYEE table, but only for groups with more than one row and with a maximum salary greater than or equal to 27 000.
SELECT JOB, MIN(SALARY), MAX(SALARY) FROM EMPLOYEE GROUP BY JOB HAVING COUNT(*) > 1 AND MAX(SALARY) >= 27000
Select all the rows of EMP_ACT table for employees (EMPNO) in department (WORKDEPT) 'E11'. (Employee department numbers are shown in the EMPLOYEE table.)
SELECT * FROM EMP_ACT WHERE EMPNO IN (SELECT EMPNO FROM EMPLOYEE WHERE WORKDEPT = 'E11')
From the EMPLOYEE table, select the department number (WORKDEPT) and maximum departmental salary (SALARY) for all departments whose maximum salary is less than the average salary for all employees.
SELECT WORKDEPT, MAX(SALARY) FROM EMPLOYEE GROUP BY WORKDEPT HAVING MAX(SALARY) < (SELECT AVG(SALARY) FROM EMPLOYEE)
The subquery in the HAVING clause would only be processed once in this example.
Using the EMPLOYEE table, select the department number (WORKDEPT) and maximum departmental salary (SALARY) for all departments whose maximum salary is less than the average salary in all other departments.
SELECT WORKDEPT, MAX(SALARY) FROM EMPLOYEE EMP_COR GROUP BY WORKDEPT HAVING MAX(SALARY) < (SELECT AVG(SALARY) FROM EMPLOYEE WHERE NOT WORKDEPT = EMP_COR.WORKDEPT)
Note: | In contrast to example 6, the subquery in the HAVING clause would need to be executed for each group. |
>>-+-subselect----+---------------------------------------------> '-(fullselect)-' .------------------------------------------. V | >--------+------------------------------------+--+------------->< '--+-UNION-----+---+-subselect----+--' '-UNION ALL-' '-(fullselect)-' |
A fullselect specifies a result table. If UNION is not used, the result of the fullselect is the result of the specified subselect.
Two rows are duplicates if each value in the first is equal to the corresponding value of the second. (For determining duplicates, two null values are considered equal.)
UNION and UNION ALL are associative operations. However, when UNION and UNION ALL are used in the same statement, the result depends on the order in which the operations are performed. Operations within parentheses are performed first. When the order is not specified by parentheses, operations are performed in left-to-right order.
If one operand column is... | And the other operand is... | The data type of the result column is... | ||
---|---|---|---|---|
CHAR(x) | CHAR(y) | CHAR(z) where z = max(x,y) | ||
VARCHAR(x) | CHAR(y) or VARCHAR(y) | VARCHAR(z) where z = max(x,y) | ||
bit data | mixed, SBCS, or bit data | bit data | ||
mixed data | mixed or SBCS data | mixed data | ||
SBCS data | SBCS data | SBCS data | ||
GRAPHIC(x) | GRAPHIC(y) | GRAPHIC(z) where z = max(x,y) | ||
VARGRAPHIC(x) | GRAPHIC(y) or VARGRAPHIC(y) | VARGRAPHIC(z) where z = max(x,y) | ||
DATE | DATE | DATE | ||
TIME | TIME | TIME | ||
TIMESTAMP | TIMESTAMP | TIMESTAMP | ||
FLOAT (double) | any numeric type | FLOAT (double) | ||
FLOAT (single) | FLOAT (single) | FLOAT (single) | ||
FLOAT (single) | DECIMAL, NUMERIC, INTEGER, or SMALLINT | FLOAT (double) | ||
DECIMAL(w,x) | DECIMAL(y,z) or NUMERIC(y,z,) | DECIMAL(p,s) where p = max(x,z)+max(w-x,y-z) s = max(x,z) (see note below) | ||
DECIMAL(w,x) | INTEGER | DECIMAL(p,x) where p = x+max(w-x,11) (see note below) | ||
DECIMAL(w,x) | SMALLINT | DECIMAL(p,x) where p = x+max(w-x,5) (see note below) | ||
INTEGER | INTEGER | INTEGER | ||
INTEGER | SMALLINT | INTEGER | ||
SMALLINT | SMALLINT | SMALLINT | ||
|
Select all columns and rows from the EMPLOYEE table.
SELECT * FROM EMPLOYEE
List the employee numbers (EMPNO) of all employees in the EMPLOYEE table whose department number (WORKDEPT) either begins with 'E' or who are assigned to projects in the EMP_ACT table whose project number (PROJNO) begins with either 'MA2100', 'MA2110', or 'MA2112'.
SELECT EMPNO FROM EMPLOYEE WHERE WORKDEPT LIKE 'E%' UNION SELECT EMPNO FROM EMP_ACT WHERE PROJNO IN('MA2100', 'MA2110', 'MA2112')
Make the same query as in example 2, and, in addition, "tag" the rows from the EMPLOYEE table with 'emp' and the rows from the EMP_ACT table with 'emp_act'.
SELECT EMPNO, 'emp' FROM EMPLOYEE WHERE WORKDEPT LIKE 'E%' UNION SELECT EMPNO, 'emp_act' FROM EMP_ACT WHERE PROJNO IN('MA2100', 'MA2110', 'MA2112')
Make the same query as in example 2, only use UNION ALL so that no duplicate rows are eliminated.
SELECT EMPNO FROM EMPLOYEE WHERE WORKDEPT LIKE 'E%' UNION ALL SELECT EMPNO FROM EMP_ACT WHERE PROJNO IN('MA2100', 'MA2110', 'MA2112')
The operations that combine strings are concatenation, UNION, and UNION ALL. These rules also apply to the VALUE scalar function. In each case, the CCSID of the result is determined at bind time, and the execution of the operation may involve conversion of strings to the coded character set identified by that CCSID.
The CCSID of the result is determined by the CCSIDs of the operands. The CCSIDs of the first two operands determine an intermediate result CCSID, this CCSID and the CCSID of the next operand determine a new intermediate result CCSID, and so on. The last intermediate result CCSID and the CCSID of the last operand determine the CCSID of the result string or column. For each pair of CCSIDs, the result CCSID is determined by the sequential application of the following rules:
Table 6. Selecting the CCSID of the Intermediate Result
First Operand | Second Operand | ||||
---|---|---|---|---|---|
Column Value | Derived Value | Constant | Special Register | Host Variable | |
Column Value | first | first | first | first | first |
Derived Value | second | first | first | first | first |
Constant | second | second | first | first | first |
Special Register | second | second | first | first | first |
Host Variable | second | second | second | second | first |
However, a host variable containing data in a foreign encoding scheme is always converted to the native form of data before it is used in any operation. The above rules are based on the assumption that this conversion has already occurred.
Note that an intermediate result is considered to be a derived value operand. For example, assume COLA, COLB, and COLC are columns with CCSIDs 37, 278, and 500, respectively. The result CCSID of COLA CONCAT COLB CONCAT COLC would be determined as follows:
An operand of concatenation or the selected argument of the VALUE scalar function is converted, if necessary, to the coded character set of the result string. Each string of an operand of UNION or UNION ALL is converted, if necessary, to the coded character set of the result column. Character conversion is necessary only if all of the following are true:
An error occurs if a character of a string cannot be converted or if the CCSID Conversion Selection Table is used but does not contain any information about the CCSID pair. A warning occurs if a character of a string is converted to the substitution character.
Examples
Given the following:
Expression | Type | CCSID |
---|---|---|
COL_1 | column | 00001 |
HV_2 | host variable | 00002 |
COL_3 | column | 00003 |
When evaluating the predicate:
COL_1 CONCAT :HV_2 CONCAT COL_3
The resulting CCSID of the first two operands is 00001. Because the result of the first concatenation is a derived string, the second concatenation will have a result CCSID of 00003 (the column CCSID is chosen over the CCSID of the derived string). The final CCSID is 00003.
Using the information from the previous example, when evaluating the predicate:
VALUE(COL_1, :HV_2, COL_3)
The resulting CCSID of the first two operands is 00001. However, the expression type is column, not derived string, since the two operands are not combined as in the concatenation example. Using the rules for the intermediate CCSID and the third operand's CCSID, 00001 (the intermediate CCSID) is chosen as the final CCSID. This is because the CCSID of the first column is chosen over the CCSID of the second column.
>>-fullselect----+------------------+---+-------------+-------->< +-order_by_clause--+ '-with_clause-' '-update_clause----' |
The select-statement is the form of a query that can be directly specified in a DECLARE CURSOR statement, or prepared and then referenced in a DECLARE CURSOR statement. It can also be issued interactively, causing a result table to be displayed at your terminal. In either case, the table specified by a select-statement is the result of the fullselect.
.-,-----------------------------. V .-ASC--. | >>-ORDER BY--------+-column_name-+---+------+--+--------------->< '-integer-----' '-DESC-' |
The ORDER BY clause specifies an ordering of the rows of the result table. If a single column is identified, the rows are ordered by the values of that column. If more than one column is identified, the rows are ordered by the values of the first identified column, then by the values of the second identified column, and so on. A long string column must not be identified.
A named column may be identified by an integer or a column_name. An unnamed column must be identified by an integer. A column is unnamed if it is derived from a constant, an arithmetic expression, or a function. If the fullselect includes a UNION operator, every column of the result table is unnamed.
Ordering is performed in accordance with the comparison rules described in Chapter 3, "Language Elements". The null value is higher than all other values. If your ordering specification does not determine a complete ordering, rows with duplicate values of the last identified column have an arbitrary order. If the ORDER BY clause is not specified, the rows of the result table have an arbitrary order.
If a field procedure is not involved, the collating sequence depends on the CCSID of the application server. With the DRDA protocol, the application server could be using an ASCII CCSID, producing an unexpected result to an application program assuming an EBCDIC CCSID and code page (see the IBM SQL Reference for details).
.-,--------------. V | >>-FOR UPDATE OF-----column_name---+--------------------------->< |
The UPDATE clause identifies the columns that can be updated in a subsequent Positioned UPDATE statement. Each column_name must be unqualified and must identify a column of the table or view identified in the first FROM clause of the fullselect. The clause must not be specified if the result table of the fullselect is read-only.
If an UPDATE clause is specified, only the columns identified in that clause can be updated in subsequent Positioned UPDATE statements.
If a dynamically prepared select-statement does not include an UPDATE clause, its associated cursor is not updateable.
The use of this clause for statically prepared select-statements and statically prepared Positioned UPDATE statements depends on whether the NOFOR preprocessor option is in effect. If the UPDATE clause is not specified:
See the DB2 Server for VSE & VM Application Programming manual for information on preprocessing and running programs.
If blocking is not in effect, a row may be deleted from a non-read-only table if the FOR UPDATE OF clause is specified.
If blocking is in effect and you intend to perform a positioned delete operation, blocking must be explicitly turned off by specifying the FOR UPDATE OF clause in the DECLARE statement.
Notes:
>>-WITH----+-RR-+---------------------------------------------->< +-CS-+ '-UR-' |
The WITH clause specifies the isolation level at which the statement is executed.
WITH UR can be specified only if the result table is read-only.
The isolation level specified on the SELECT statement will override any other isolation level specification; for example, in ISQL, if SET ISOLATION CS has been specified, and a SELECT statement WITH UR is executed, that statement will use an isolation level of uncommitted read. A SELECT statement without the WITH clause will use an isolation level of CS, as defined by the SET ISOLATION statement. As another example, a statement specifying WITH UR in a package prepped with ISOL(CS) will use an isolation level of uncommitted read.
If a SELECT statement specifying WITH UR is used with a cursor that is not read-only, SQLCODE -173 will be returned indicating that WITH UR cannot be specified on a select statement used in a non-read-only cursor.
Select all columns and rows from the EMPLOYEE table.
SELECT * FROM EMPLOYEE
Select the project name (PROJNAME), start date (PRSTDATE), and end date (PRENDATE) from the PROJECT table. Order the result table by the end date with the most recent dates appearing first.
SELECT PROJNAME, PRSTDATE, PRENDATE FROM PROJECT ORDER BY PRENDATE DESC
Select the department number (WORKDEPT) and average departmental salary (SALARY) for all departments in the EMPLOYEE table. Arrange the result table in ascending order by average departmental salary.
SELECT WORKDEPT, AVG(SALARY) FROM EMPLOYEE GROUP BY WORKDEPT ORDER BY 2
Declare a cursor named UP_CUR to be used in a PL/I program to update the start date (PRSTDATE) and the end date (PRENDATE) columns in the PROJECT table. The program must receive both of these values together with the project number (PROJNO) value for each row.
EXEC SQL DECLARE UP_CUR CURSOR FOR SELECT PROJNO, PRSTDATE, PRENDATE FROM PROJECT FOR UPDATE OF PRSTDATE, PRENDATE;