The SQL compiler will rewrite queries to merge query operations, in an attempt to construct the query so that it has the fewest number of operations, especially SELECT operations. The following examples are provided to illustrate some of the operations that can be merged by the SQL compiler:
Using views in a SELECT statement can restrict the join order of the table and can also introduce redundant joining of tables. By merging the views during query rewrite, these restrictions can be lifted.
The use of subqueries in a SELECT statement can force a join method and the selection of inner and outer tables for the join. During query rewrite, the subquery can sometimes be merged into the main query as a join, which gives the optimizer more choices to choose the most efficient access plan.
During query rewrite redundant joins can be removed to further simplify the SELECT statement that will be optimized.
When using different functions, rewriting the query can reduce the number of calculations that need to be done.
If they exist, summary tables can be used instead of regular tables. They are of smaller size and therefore require less processing.
Suppose you have access to the following two views of the EMPLOYEE table, one showing employees with a high level of education and the other view showing employees earning more than $35,000:
CREATE VIEW EMP_EDUCATION (EMPNO, FIRSTNME, LASTNAME, EDLEVEL) AS SELECT EMPNO, FIRSTNME, LASTNAME, EDLEVEL FROM EMPLOYEE WHERE EDLEVEL > 17 CREATE VIEW EMP_SALARIES (EMPNO, FIRSTNAME, LASTNAME, SALARY) AS SELECT EMPNO, FIRSTNME, LASTNAME, SALARY FROM EMPLOYEE WHERE SALARY > 35000
Now suppose you perform the following query to list the employees who have a high education level and who are earning more than $35,000:
SELECT E1.EMPNO, E1.FIRSTNME, E1.LASTNAME, E1.EDLEVEL, E2.SALARY FROM EMP_EDUCATION E1, EMP_SALARIES E2 WHERE E1.EMPNO = E2.EMPNO
During query rewrite, these two views could be merged to create the following query:
SELECT E1.EMPNO, E1.FIRSTNME, E1.LASTNAME, E1.EDLEVEL, E2.SALARY FROM EMPLOYEE E1, EMPLOYEE E2 WHERE E1.EMPNO = E2.EMPNO AND E1.EDLEVEL > 17 AND E2.SALARY > 35000
By merging the SELECT statements from the two views with the user-written SELECT statement, the optimizer can consider more choices when selecting an access plan. In addition, if the two views that have been merged use the same base table, additional rewriting may be performed as described in Example - Redundant Join Elimination.
The SQL compiler will take a query containing a subquery, such as:
SELECT EMPNO, FIRSTNME, LASTNAME, PHONENO FROM EMPLOYEE WHERE WORKDEPT IN (SELECT DEPTNO FROM DEPARTMENT WHERE DEPTNAME = 'OPERATIONS')
and convert it to a join query of the form:
SELECT DISTINCT EMPNO, FIRSTNME, LASTNAME, PHONENO FROM EMPLOYEE EMP, DEPARTMENT DEPT WHERE EMP.WORKDEPT = DEPT.DEPTNO AND DEPT.DEPTNAME = 'OPERATIONS'
A join is generally much more efficient to execute than a subquery.
Queries can sometimes be written or generated which have unnecessary joins. Queries such as the following could also be produced during the query rewrite stage as described in Example - View Merges.
SELECT E1.EMPNO, E1.FIRSTNME, E1.LASTNAME, E1.EDLEVEL, E2.SALARY FROM EMPLOYEE E1, EMPLOYEE E2 WHERE E1.EMPNO = E2.EMPNO AND E1.EDLEVEL > 17 AND E2.SALARY > 35000
In this query, the SQL compiler can eliminate the join and simplify the query to:
SELECT EMPNO, FIRSTNME, LASTNAME, EDLEVEL, SALARY FROM EMPLOYEE WHERE EDLEVEL > 17 AND SALARY > 35000
Another example assumes that a referential constraint exists between the EMPLOYEE and DEPARTMENT sample tables on the department number. First, a view is created.
CREATE VIEW PEPLVIEW AS SELECT FIRSTNME, LASTNAME, SALARY, DEPTNO, DEPTNAME, MGRNO FROM EMPLOYEE E DEPARTMENT D WHERE E.WORKDEPT = D.DEPTNO
Then a query such as the following:
SELECT LASTNAME, SALARY FROM PEPLVIEW
becomes
SELECT LASTNAME, SALARY FROM EMPLOYEE WHERE WORKDEPT NOT NULL
Note that in this situation, even if the user knows that the query can be re-written, they may not be able to do so because they do not have access to the underlying tables. They may only have access to the view (shown above). Therefore, this type of optimization has to be performed within the database manager.
Redundancy in referential integrity joins is likely where:
For example, there are automated tools in query managers which prevent users from writing optimized queries.
Using multiple functions within a query can generate several calculations which take time. Reducing the number of calculations to be done within the query results in an improved plan. The SQL compiler takes a query using multiple functions such as:
SELECT SUM(SALARY+BONUS+COMM) AS OSUM, AVG(SALARY+BONUS+COMM) AS OAVG, COUNT(*) AS OCOUNT FROM EMPLOYEE;
and transforms the query in the following way:
SELECT OSUM, OSUM/OCOUNT OCOUNT FROM (SELECT SUM(SALARY+BONUS+COMM) AS OSUM, COUNT(*) AS OCOUNT FROM EMPLOYEE) AS SHARED_AGG;
This rewrite reduces the query from 2 sums and 2 counts to 1 sum and 1 count.
Following is an example of a multidimensional analysis that could take advantage of summary tables. A summary table is created with the sum and count of sales for each level of:
A wide range of queries can pick up their answers from this stored aggregate data. The following example calculates the sum of product group sales, by state, by month. Queries that can take advantage of such pre-computed sums would include:
While the precise answer is not included in the summary table for any of these queries, the cost of computing the answer using the summary table could be significantly less than using a large base table, because a portion of the answer is already computed. For example:
CREATE TABLE PG_SALESSUM AS ( SELECT l.id AS prodline, pg.id AS pgroup, loc.country, loc.state YEAR(pdate) AS year, MONTH(pdate) AS month, SUM(ti.amount) AS amount, COUNT(*) AS count FROM cube.transitem AS ti, cube.trans AS t, cube.loc AS loc, cube.pgroup AS pg, cube.prodline AS l WHERE ti.transid = t.id AND ti.pgid = pg.id AND pg.lineid = l.id AND t.locid = loc.id AND YEAR(pdate) > 1990 GROUP BY l.id, pg.id, loc.country, loc.state, year(pdate), month(pdate) ) DATA INITIALLY DEFERRED REFRESH DEFERRED; REFRESH TABLE SALESCUBE;
The following are sample queries that would obtain significant performance improvements because they are able to use the results in the summary table that are already computed. The first example returns the total sales for 1995 and 1996:
SET CURRENT REFRESH AGE=ANY SELECT YEAR(pdate) AS year, SUM(ti.amount) AS amount FROM cube.transitem AS ti, cube.trans AS t, cube.loc AS loc, cube.pgroup AS pg, cube.prodline AS l WHERE ti.transid = t.id AND ti.pgid = pg.id AND pg.lineid = l.id AND t.locid = loc.id AND YEAR(pdate) IN (1995, 1996) GROUP BY year(pdate);
The second example returns the total sales by product group for 1995 and 1996:
SET CURRENT REFRESH AGE=ANY SELECT pg.id AS "PRODUCT GROUP", SUM(ti.amount) AS amount FROM cube.transitem AS ti, cube.trans AS t, cube.loc AS loc, cube.pgroup AS pg, cube.prodline AS l WHERE ti.transid = t.id AND ti.pgid = pg.id AND pg.lineid = l.id AND t.locid = loc.id AND YEAR(pdate) IN (1995, 1996) GROUP BY pg.id;