The topic of normalizing tables draws much attention in database design. Normalization helps you avoid redundancies and inconsistencies in your data. The main idea in normalization is to reduce tables to a set of columns where all the non-key columns depend on the entire primary key of the table. If this is not the case, the data can become inconsistent during updating.
This section briefly reviews the rules for first, second, third, and fourth normal forms of tables, and describes some reasons why they should or should not be followed. The fifth normal form of a table, which is covered in many books on database design, is not described here.
Here are brief descriptions of the normal forms presented later:
A table satisfies the requirement of first normal form if for each row-and-column position in the table there exists one value, never a set of values. A table that is in first normal form does not necessarily meet the test for higher normal forms.
For example, the following table violates first normal form because the
WAREHOUSE column contains several values for each occurrence of PART.
Table 6. Table Violating First Normal Form
PART (Primary Key) | WAREHOUSE |
---|---|
P0010 | Warehouse A, Warehouse B, Warehouse C |
P0020 | Warehouse B, Warehouse D |
The following example shows the table in first normal form.
Table 7. Table Conforms to First Normal Form
PART (Primary Key) | WAREHOUSE (Primary Key) | QUANTITY |
---|---|---|
P0010 | Warehouse A | 400 |
P0010 | Warehouse B | 543 |
P0010 | Warehouse C | 329 |
P0020 | Warehouse B | 200 |
P0020 | Warehouse D | 278 |
A table is in second normal form if each column that is not in the key provides a fact that depends on the entire key.
This means that all data that is not part of the primary key must depend on all of the columns in the key. This reduces repetition among database tables.
Second normal form is violated when a non-key column is a fact about a
subset of a composite key, as in the following example. An inventory
table records quantities of specific parts stored at particular warehouses;
its columns are shown in the following example.
Table 8. Table Violates Second Normal Form
PART (Primary Key) | WAREHOUSE (Primary Key) | QUANTITY | WAREHOUSE_ADDRESS |
---|---|---|---|
P0010 | Warehouse A | 400 | 1608 New Field Road |
P0010 | Warehouse B | 543 | 4141 Greenway Drive |
P0010 | Warehouse C | 329 | 171 Pine Lane |
P0020 | Warehouse B | 200 | 4141 Greenway Drive |
P0020 | Warehouse D | 278 | 800 Massey Street |
Here, the key consists of the PART and the WAREHOUSE columns together. Because the column WAREHOUSE_ADDRESS depends only on the value of WAREHOUSE, the table violates the rule for second normal form.
The problems with this design are:
To satisfy second normal form, the information shown above, in Table 8, would be split into the following two tables:
Table 9. Part-Stock Table Conforms to Second Normal Form
PART (Primary Key) | WAREHOUSE (Primary Key) | QUANTITY |
---|---|---|
P0010 | Warehouse A | 400 |
P0010 | Warehouse B | 543 |
P0010 | Warehouse C | 329 |
P0020 | Warehouse B | 200 |
P0020 | Warehouse D | 278 |
Table 10. Warehouse Table Conforms to Second Normal Form
WAREHOUSE (Primary Key) | WAREHOUSE_ADDRESS |
---|---|
Warehouse A | 1608 New Field Road |
Warehouse B | 4141 Greenway Drive |
Warehouse C | 171 Pine Lane |
Warehouse D | 800 Massey Street |
However, there is a performance consideration in having the two tables in second normal form. Application programs that produce reports on the location of parts must join both tables to retrieve the relevant information.
To better understand performance considerations, refer to "Tuning Application Performance" in the Administration Guide, Performance.
A table is in third normal form if each non-key column provides a fact that is independent of other non-key columns and depends only on the key.
Third normal form is violated when a non-key column is a fact about another non-key column. For example, the first table in the following example contains the columns EMPNO and WORKDEPT. Suppose a column DEPTNAME is added. The new column depends on WORKDEPT, whereas the primary key is the column EMPNO; thus the table now violates third normal form.
Changing DEPTNAME for a single employee, John Parker, does not change the
department name for other employees in that department. The
inconsistency that results is shown in the updated version of the table in the
following example.
Table 11. Unnormalized Employee-Department Table Before Update
EMPNO (Primary Key) | FIRSTNAME | LASTNAME | WORKDEPT | DEPTNAME |
---|---|---|---|---|
000290 | John | Parker | E11 | Operations |
000320 | Ramlal | Mehta | E21 | Software Support |
000310 | Maude | Setright | E11 | Operations |
The following example shows the content of the table following an update to
the DEPTNAME column for John Parker. Note that there are now two
different department names used for department number (WORKDEPT) E11:
Table 12. Unnormalized Employee-Department Table After Update
Information in table has become inconsistent. | ||||
EMPNO (Primary Key) | FIRSTNAME | LASTNAME | WORKDEPT | DEPTNAME |
---|---|---|---|---|
000290 | John | Parker | E11 | Installation Mgmt |
000320 | Ramlal | Mehta | E21 | Software Support |
000310 | Maude | Setright | E11 | Operations |
The table can be normalized by providing a new table, with columns for
WORKDEPT and DEPTNAME. In that case, an update like changing a
department name is much easier--the update only has to be made to the new
table. An SQL query that shows the department name along with the
employee name is more complex to write because it requires joining the two
tables. This query will probably also take longer to execute than the
query of a single table. In addition, the entire arrangement takes more
storage space because the WORKDEPT column must appear in both tables.
The following tables are defined as a result of normalizing EMPDEPT.
Table 13. Employee Table After Normalizing the Employee-Department Table
EMPNO (Primary Key) | FIRSTNAME | LASTNAME | WORKDEPT |
---|---|---|---|
000290 | John | Parker | E11 |
000320 | Ramlal | Mehta | E21 |
000310 | Maude | Setright | E11 |
Table 14. Department Table After Normalizing the Employee-Department Table
DEPTNO (Primary Key) | DEPTNAME |
---|---|
E11 | Operations |
E21 | Software Support |
A table is in fourth normal form if no row contains two or more independent multi-valued facts about an entity.
Consider these entities: employees, skills, and languages. An
employee can have several skills and know several languages. There are
two relationships, one between employees and skills, and one between employees
and languages. A table is not in fourth normal form if it represents
both relationships, as in the following example:
Table 15. Table Violating Fourth Normal Form
EMPNO (Primary Key) | SKILL (Primary Key) | LANGUAGE (Primary Key) |
---|---|---|
000130 | Data Modelling | English |
000130 | Database Design | English |
000130 | Application Design | English |
000130 | Data Modelling | Spanish |
000130 | Database Design | Spanish |
000130 | Application Design | Spanish |
Instead, the relationships should be represented in two tables, as in the
following examples.
Table 16. Employee-Skill Table in Fourth Normal Form
EMPNO (Primary Key) | SKILL (Primary Key) |
---|---|
000130 | Data Modelling |
000130 | Database Design |
000130 | Application Design |
Table 17. Employee-Language Table in Fourth Normal Form
EMPNO (Primary Key) | LANGUAGE (Primary Key) |
---|---|
000130 | English |
000130 | Spanish |
If, however, the facts are interdependent--that is, the employee applies certain languages only to certain skills--then the table should not be split.
Any data can be put into fourth normal form. A good rule when designing a database is to arrange all data in tables in fourth normal form, and then decide whether the result gives you an acceptable level of performance. If it does not, you are at liberty to denormalize your design.