Home »
DBMS
Normalization in Database Management System
DBMS | Normalization: In this tutorial, we will learn about the normalization, different types of the normalization with the examples.
By IncludeHelp Last updated : May 29, 2023
What is Normalization in DBMS?
Normalization is a process of organizing the data in a database to avoid data redundancy and improve data integrity. Normalization of the database is a process of arranging the data in the database. To remove data repetition (redundancy) and unwanted features such as addition, Update, and deletion anomalies, and a normalization is a systematic approach to decomposing tables. It is a multi-step approach that places information in tabular form, eliminating redundant information from the tables.
What happens if the database table is not Normalize?
If a table is not normalized properly and has data redundancy, it not only requires additional memory space but also makes it very difficult to maintain and Update the database records without data loss. If the database is not normalized, insertion, Updation, and anomalies deletion will be difficult.
Purpose of Normalization
Normalization is primarily used for two reasons.
- To remove redundant data.
- To ensure data dependencies, i.e. data is logically processed and making sense.
Types of DBMS Normalization
It is possible to classify databases by their normalization level, from level 1 to level 5. This means that the easiest and most basic method of normalizing data is level 1 (First Normal Form or 1NF), up to 5NF, the most complex one.
The most common form of Normalization is as follows -
- First Normal Form
- Second Normal Form
- Third Normal Form
- BCNF
- Fourth Normal Form
1) First Normal Form (1NF)
There are some criteria to keep the database table in first normal form. These are as follows -
- A table will be in 1NF, if it contains an atomic value.
- It needs to note that a table attribute is unable to hold several values. It must only contain an attribute with a single value.
- The multi-value attribute, the composite attribute, and their combinations are disallowed by the first normal type.
Example: Table STUDENT is not in 1NF because of multi-valued attribute ST_PHONE.
STUDENT table:
ST_ID |
ST_NAME |
ST_PHONE |
ST_STATE |
101 | Ram | 8877665544, 8989265471 | MP |
102 | Rex | 8574783832 | Maharashtra |
103 | Kapil | 7390372389, 8589830302 | Delhi |
The decomposition of the STUDENT table into 1NF has been shown below:
ST_ID |
ST_NAME |
ST_PHONE |
ST_STATE |
101 | Ram | 8877665544 | MP |
101 | Ram | 8989265471 | MP |
102 | Rex | 8574783832 | Maharashtra |
103 | Kapil | 7390372389 | Delhi |
103 | Kapil | 8589830302 | Delhi |
2) Second Normal Form (2NF)
There are several conditions for keeping the second normal shape of the database table. These are the following -
- In the 2NF row, the table must be 1NF.
- In the second normal type, all non-key attributes depending on the primary key are completely functional.
Example: let's we assume that teachers' data and the subjects they teaches can be stored by a school. A teacher can teach more than one subject in a school.
TEACHER table:
TEACHER_ID | SUBJECT | TEACHER_AGE |
T1 | Hindi | 45 |
T1 | Science | 45 |
T2 | Maths | 35 |
T3 | English | 28 |
T3 | SST | 28 |
The non-prime attribute TEACHER AGE in the provided table is based on TEACHER ID, which is the right subset of a candidate key. That is why it violates the 2NF law.
We split it down into two tables to translate the given table into 2NF:
TEACHER_ID | TEACHER_AGE |
T1 | 45 |
T2 | 35 |
T3 | 28 |
TEACHER_SUBJECT table:
TEACHER_ID | SUBJECT |
T1 | Hindi |
T1 | Science |
T2 | Math |
T3 | English |
T3 | SST |
3) Third Normal Form (3NF)
- In the 3NF, the table must be 2NF.
- There should no transitive dependency for non-prime attributes
If at least one of the following conditions holds X->Y in any non trivial function dependency, a Relationship is in 3NF:
- X is a Super key.
- Y is a primary attribute.
If A->B and B->C are two FDs then A->C is called transitive dependency.
The normalization of the relationship of 2NF with 3NF requires the elimination of transitive dependencies. If there is a transitive dependence, by putting the attribute(s) in a new relation along with a copy of the determinant, we delete the transitively dependent attribute(s) from the relation.
Consider the examples given below.
Example-1:
In relation TEACHER given in following table,
TEACHER_ID | TEACHER_NAME | TEACHER_ZIP_CODE | TEACHER_CITY |
101 | Rex | Z1 | Noida |
102 | Rohan | Z2 | Boston |
103 | Puneet | Z3 | Chicago |
104 | Jack | Z4 | Norwich |
105 | Amit | Z5 | Bhopal |
Super key in the table above:
{TEACHER_ID}, {TEACHER_ID, TEACHER_NAME}, {TEACHER_ID, TEACHER_NAME, TEACHER_ZIP_CODE}....so on
Candidate key: {TEACHER_ID}
Non-prime attributes: In the given table, all attributes except TEACHER_ID are non-prime.
Here, TEACHER_STATE & TEACHER_CITY dependent on TEACHER_ZIP_CODE and TEACHER_ZIP_CODE dependent on TEACHER_ID. The non-prime attributes (TEACHER_STATE, TEACHER_CITY) transitively dependent on super key (TEACHER_ID). It violates the rule of third normal form.
That's why we need to move the TEACHER_CITY and TEACHER_STATE to the new <TEACHER_ZIP> table, with TEACHER_ZIP_CODE as a Primary key
TEACHER table:
TEACHER_ID | TEACHER_NAME | TEACHER_ZIP_CODE |
101 | Rex | Z1 |
102 | Rohan | Z2 |
103 | Puneet | Z3 |
104 | Jack | Z4 |
105 | Amit | Z5 |
TEACHER_ZIP table:
TEACHER_ZIP_CODE | TEACHER_CITY |
Z1 | Noida |
Z2 | Boston |
Z3 | Chicago |
Z4 | Norwich |
Z5 | Bhopal |
4) Boyce Codd normal form (BCNF)
There are several conditions for keeping the Boyce Codd normal form which forms of the database table. These are the following -
- The modern variant of 3NF is the BCNF. Stricter than 3NF, it is.
- If each functional dependency X ->Y, X is the upper key of the table, a table is in BCNF.
- The table should be in 3NFF for BCNF.
Example: Let's presume that there is a business in which learners work in more than one department.
STUDENT table:
ST_ID | ST_COUNTRY | ST_SPECIALISATION | DEPT_TYPE | ST_SPECIALISATION_NO |
101 | India | Engineering | D101 | S1 |
101 | India | IT | D101 | S2 |
102 | UK | Architecture | D201 | S3 |
102 | UK | Maths | D201 | S4 |
In the above table Functional dependencies are as follows:
- ST_ID → ST_COUNTRY
- ST_SPECIALISATION → {DEPT_TYPE, ST_SPECIALISATION_NO}
Candidate key: {EMP-ID, EMP-DEPT}
The table is not in BCNF because neither ST_SPECIALISATION nor ST_ID alone are keys.
To convert the given table into BCNF, we decompose it into three tables:
ST_COUNTRY table:
ST_ID | ST_COUNTRY |
101 | India |
101 | India |
ST_SPECIALISATION table:
ST_SPECIALISATION |
DEPT_TYPE |
ST_SPECIALISATION_NO |
Engineering | D101 | S1 |
IT | D101 | S2 |
Architecture | D201 | S3 |
Maths | D201 | S4 |
ST_SPECIALISATION_MAPPING table:
ST_ID | ST_SPECIALISATION |
D101 | S1 |
D101 | S2 |
D201 | S3 |
D201 | S4 |
Functional dependencies:
- ST_ID → ST_COUNTRY
- ST_SPECIALISATION → {DEPT_TYPE, ST_SPECIALISATION_NO}
Candidate keys:
For the first table: ST_ID
For the second table: ST_SPECIALISATION
For the third table: {ST_ID, ST_SPECIALISATION}
Now, this is in BCNF because left side part of both the functional dependencies is a key.
5) Fourth normal form (4NF)
There are several conditions for keeping the Fourth normal form which forms of the database table. These are the following -
- If it is in Boyce Codd Normal Form and has no multi-value dependence, a table will be in 4NF.
- For dependency A -> B, if multiple values of B exist for a single value of A, then the table would be a multi-valued dependency.
Example
STUDENT
STU_ID | COURSE | HOBBY |
101 | Maths | Dancing |
101 | Physics | Singing |
102 | Chemistry | Dancing |
103 | Science | Cricket |
104 | English | Hockey |
The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there is no database between COURSE and HOBBY.
In the STUDENT table, a student with STU_ID, 21 contains two courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID | COURSE |
101 | Maths |
101 | Physics |
102 | Chemistry |
103 | Science |
104 | English |
STUDENT_HOBBY:
STU_ID | HOBBY |
101 | Dancing |
101 | Singing |
102 | Dancing |
103 | Cricket |
104 | Hockey |
6) Fifth normal form (5NF)
There are several conditions for keeping the Fifth normal form which forms of the database table. These are the following -
- If it is in 4NF and does not have any join dependence, a table is in 5NF and joining should be lossless.
- 5NF is fulfilled when in order to prevent duplication, all tables are broken into as many tables as possible.
- 5NF is often referred to as the normal form of Project-join (PJ/NF).
Example:
SUBJECT | LECTURER | SEMESTER |
Physics | Mr. Kabir | I |
Physics | Mrs. Hemlata | II |
Chemistry | Mr. Ramkumar | I |
Chemistry | Mr. Amar Singh | I |
Math | Mr. Lal K. | II |
In the table above for semester 1, Ram takes both computer and math classes, but for semester 2, he does not take math classes. In this case, all these fields must be combined to classify valid data.
We add a new Semester as Semester 3 to suppose but do not know about the topic and who will take that topic so that we leave Lecturer and Subject as NULL. But together all three columns serve as the primary key, so we can't leave two more columns blank.
So we can decompose the table above into 3 tables Table 1, Table 2 & Table 3 to transform the table above into 5NF:
Table 1:
SEMESTER | SUBJECT |
I | Physics |
II | Physics |
I | Chemistry |
II | Math |
Table 2:
SUBJECT | LECTURER |
Physics | Mr. Kabir |
Physics | Mrs. Hemlata |
Chemistry | Mr. Ramkumar |
Chemistry | Mr. Amar Singh |
Math | Mr. Lal K. |
Table 3:
SEMSTER | LECTURER |
I | Mr. Kabir |
II | Mrs. Hemlata |
I | Mr. Ramkumar |
I | Mr. Amar Singh |
II | Mr. Lal K. |
How do we tackle the actual process of database normalization?
This is a very general process, the details of are as follows:
Step 1: For each value, build separate tables, or locate duplicate fields in a table, position them in different tables, and connect a key to each of them.
Step 2: In different tables, create relations between values. For example, between the Colors table and the Sizes table for a clothing product.
Step 3: Introduce relations between the columns of the main key and the columns of the non-key.
Advantages of Normalization of Databases
Followings are the key advantages of Normalization of database,
- A logical map:
The collection and mapping of data is more logically organized and thus its value for every department using the tables is doubled.
- Data consistency:
The reliability of the data is improved, and the information stored would be more reliable. It avoids the following errors: storing outdated versions, finding duplicated data in different sections of your business, and finding different types of links between product data without a clear hierarchy.
- Sound Data Relation:
The creation of a full data structure that demonstrates how information from various tables relates to each other is showing the relational data. The identification of data links and the correction of any inaccessibility or inconsistency of information inside the product database will be facilitated.