Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored. There are several benefits for using Normalization in Database.
- Eliminate data redundancy
- Improve performance
- Query optimization
- Faster update due to less number of columns in one table
- Index improvement
The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you’ll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won’t be discussed in this article.
First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
- Eliminate duplicative columns from the same table
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
Figure 1: First Normal Form(1NF)
Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data:
- Meet all the requirements of the first normal form.
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.
Figure 2: Second Normal Form(2NF)
Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
- Meet all the requirements of the second normal form.
- Remove columns that are not dependent upon the primary key.
Fourth Normal Form (4NF)
Fourth normal form (4NF) has one additional requirement:
- Meet all the requirements of the third normal form.
- A relation is in 4NF if it has no multi-valued dependencies. i.e. If PK is composed of multiple columns then all non-key attributes should be derived from FULL PK only. If some non-key attribute can be derived from partial PK then remove it.
The 4NF also known as BCNF NF.
Figure 3: Fourth Normal Form(4NF)
Your data model design is both an art and a science. Balance what works best to support the application that will use the database and to store data in an efficient and structured manner. For transaction-based systems, a highly normalized database design is the way to go; it ensures consistent data throughout the entire database and that it is performing well. For reporting-based systems, a less normalized database is usually the best approach. You will eliminate the need to join a lot of tables and queries will be faster. Plus, the database will be much more user friendly for ad hoc reporting needs.