Normalization helps eliminate redundancies and inconsistencies in table data. It is the process of reducing tables to a set of columns where all the non-key columns depend on the primary key column. If this is not the case, the data can become inconsistent during updates.
This section briefly reviews the rules for first, second, third, and fourth normal form. The fifth normal form of a table, which is covered in many books on database design, is not described here.
A table is in first normal form if there is only one value, never a set of values, in each cell. A table that is in first normal form does not necessarily satisfy the criteria for higher normal forms.
For example, the following table violates first normal form because the
WAREHOUSE column contains several values for each occurrence of PART.
Table 8. Table Violating First Normal Form
PART (Primary Key) | WAREHOUSE |
---|---|
P0010 | Warehouse A, Warehouse B, Warehouse C |
P0020 | Warehouse B, Warehouse D |
The following example shows the same table in first normal form.
Table 9. Table Conforming to First Normal Form
PART (Primary Key) | WAREHOUSE (Primary Key) | QUANTITY |
---|---|---|
P0010 | Warehouse A | 400 |
P0010 | Warehouse B | 543 |
P0010 | Warehouse C | 329 |
P0020 | Warehouse B | 200 |
P0020 | Warehouse D | 278 |
A table is in second normal form if each column that is not part of the key is dependent upon the entire key.
Second normal form is violated when a non-key column is dependent upon
part of a composite key, as in the following example:
Table 10. Table Violating Second Normal Form
PART (Primary Key) | WAREHOUSE (Primary Key) | QUANTITY | WAREHOUSE_ADDRESS |
---|---|---|---|
P0010 | Warehouse A | 400 | 1608 New Field Road |
P0010 | Warehouse B | 543 | 4141 Greenway Drive |
P0010 | Warehouse C | 329 | 171 Pine Lane |
P0020 | Warehouse B | 200 | 4141 Greenway Drive |
P0020 | Warehouse D | 278 | 800 Massey Street |
The primary key is a composite key, consisting of the PART and the WAREHOUSE columns together. Because the WAREHOUSE_ADDRESS column depends only on the value of WAREHOUSE, the table violates the rule for second normal form.
The problems with this design are:
The solution is to split the table into the following two tables:
Table 11. PART_STOCK Table Conforming to Second Normal Form
PART (Primary Key) | WAREHOUSE (Primary Key) | QUANTITY |
---|---|---|
P0010 | Warehouse A | 400 |
P0010 | Warehouse B | 543 |
P0010 | Warehouse C | 329 |
P0020 | Warehouse B | 200 |
P0020 | Warehouse D | 278 |
Table 12. WAREHOUSE Table Conforms to Second Normal Form
WAREHOUSE (Primary Key) | WAREHOUSE_ADDRESS |
---|---|
Warehouse A | 1608 New Field Road |
Warehouse B | 4141 Greenway Drive |
Warehouse C | 171 Pine Lane |
Warehouse D | 800 Massey Street |
There is a performance consideration in having the two tables in second normal form. Applications that produce reports on the location of parts must join both tables to retrieve the relevant information.
To better understand performance considerations, refer to "Tuning Application Performance" in the Administration Guide: Performance.
A table is in third normal form if each non-key column is independent of other non-key columns, and is dependent only on the key.
The first table in the following example contains the columns EMPNO and
WORKDEPT. Suppose a column DEPTNAME is added (see Table 14). The new column depends on WORKDEPT, but the primary
key is EMPNO. The table now violates third normal form. Changing
DEPTNAME for a single employee, John Parker, does not change the department
name for other employees in that department. Note that there are now
two different department names used for department number E11. The
inconsistency that results is shown in the updated version of the
table.
Table 13. Unnormalized EMPLOYEE_DEPARTMENT Table Before Update
EMPNO (Primary Key) | FIRSTNAME | LASTNAME | WORKDEPT | DEPTNAME |
---|---|---|---|---|
000290 | John | Parker | E11 | Operations |
000320 | Ramlal | Mehta | E21 | Software Support |
000310 | Maude | Setright | E11 | Operations |
Table 14. Unnormalized EMPLOYEE_DEPARTMENT Table After Update
Information in the table has become inconsistent. | ||||
EMPNO (Primary Key) | FIRSTNAME | LASTNAME | WORKDEPT | DEPTNAME |
---|---|---|---|---|
000290 | John | Parker | E11 | Installation Mgmt |
000320 | Ramlal | Mehta | E21 | Software Support |
000310 | Maude | Setright | E11 | Operations |
The table can be normalized by creating a new table, with columns for WORKDEPT and DEPTNAME. An update like changing a department name is now much easier; only the new table needs to be updated.
An SQL query that returns the department name along with the employee name is more complex to write, because it requires joining the two tables. It will probably also take longer to run than a query on a single table. Additional storage space is required, because the WORKDEPT column must appear in both tables.
The following tables are defined as a result of normalization:
Table 15. EMPLOYEE Table After Normalizing the EMPLOYEE_DEPARTMENT Table
EMPNO (Primary Key) | FIRSTNAME | LASTNAME | WORKDEPT |
---|---|---|---|
000290 | John | Parker | E11 |
000320 | Ramlal | Mehta | E21 |
000310 | Maude | Setright | E11 |
Table 16. DEPARTMENT Table After Normalizing the EMPLOYEE_DEPARTMENT Table
DEPTNO (Primary Key) | DEPTNAME |
---|---|
E11 | Operations |
E21 | Software Support |
A table is in fourth normal form if no row contains two or more independent multi-valued facts about an entity.
Consider these entities: employees, skills, and languages. An
employee can have several skills and know several languages. There are
two relationships, one between employees and skills, and one between employees
and languages. A table is not in fourth normal form if it represents
both relationships, as in the following example:
Table 17. Table Violating Fourth Normal Form
EMPNO (Primary Key) | SKILL (Primary Key) | LANGUAGE (Primary Key) |
---|---|---|
000130 | Data Modelling | English |
000130 | Database Design | English |
000130 | Application Design | English |
000130 | Data Modelling | Spanish |
000130 | Database Design | Spanish |
000130 | Application Design | Spanish |
Instead, the relationships should be represented in two tables:
Table 18. EMPLOYEE_SKILL Table Conforming to Fourth Normal Form
EMPNO (Primary Key) | SKILL (Primary Key) |
---|---|
000130 | Data Modelling |
000130 | Database Design |
000130 | Application Design |
Table 19. EMPLOYEE_LANGUAGE Table Conforming to Fourth Normal Form
EMPNO (Primary Key) | LANGUAGE (Primary Key) |
---|---|
000130 | English |
000130 | Spanish |
If, however, the attributes are interdependent (that is, the employee applies certain languages only to certain skills), the table should not be split.
A good strategy when designing a database is to arrange all data in tables that are in fourth normal form, and then to decide whether the results give you an acceptable level of performance. If they do not, you can rearrange the data in tables that are in third normal form, and then reassess performance.