beginner 7 min read

Database Design - Normalization

Written with a little help from AI, reviewed and edited by a human.

When designing a database, you need to decide how to organize your data into tables. Should everything go in one giant table? Should you split it into many small tables? Normalization is a set of rules that help you design databases that are efficient, consistent, and easy to maintain.

Real life example

Imagine you’re organizing a library. You could write all the information about each book on a single card:

Card: Harry Potter
Author: J.K. Rowling
Author Email: jk@example.com
Author Country: UK
Publisher: Bloomsbury
Publisher Address: 50 Bedford Square, London
Category: Fantasy

This works, but what if J.K. Rowling writes 50 books? You’d write her email and country 50 times. If her email changes, you’d have to update 50 cards.

Instead, you’d create separate index cards:

  • Books card: Book ID, Title, Author ID, Publisher ID, Category
  • Authors card: Author ID, Name, Email, Country
  • Publishers card: Publisher ID, Name, Address

Now each piece of information is stored once. If an author’s email changes, you update one card, not 50.

That’s what normalization does — it organizes data so you don’t repeat information unnecessarily.

What is normalization?

Normalization is the process of organizing data to:

  1. Eliminate redundancy (don’t store the same data multiple times)
  2. Ensure data integrity (changes in one place don’t leave other places out-of-date)
  3. Make updates easier (change once instead of changing many times)

There are several “levels” of normalization called normal forms (1NF, 2NF, 3NF, etc.). Each level follows stricter rules.

For most applications, reaching 3NF (Third Normal Form) is enough.

The problem: a denormalized table

Let’s say you have a table storing orders:

order_idcustomer_namecustomer_emailproduct_nameproduct_pricequantity
1Alicealice@example.comLaptop10001
2Alicealice@example.comMouse252
3Bobbob@example.comLaptop10001
4Alicealice@example.comKeyboard501

This works, but notice the problems:

  • Alice’s name and email are repeated 3 times. If her email changes, you have to update 3 rows.
  • Laptop’s price is stored twice. If the price changes, you have to update multiple rows.
  • If you delete order 1, you don’t lose Alice’s information, but if you delete orders 2 and 4 as well, Alice disappears entirely from the database even though she’s still a customer.

These issues are called anomalies, and normalization fixes them.

First Normal Form (1NF)

Rule: Each column must contain atomic (indivisible) values, and each row must be unique.

Bad example:

order_idcustomer_nameproducts
1AliceLaptop, Mouse
2BobKeyboard, Monitor

The products column contains multiple values (Laptop and Mouse). This violates 1NF.

Fixed (1NF):

order_idcustomer_nameproduct
1AliceLaptop
1AliceMouse
2BobKeyboard
2BobMonitor

Now each column contains a single value, and each row is unique.

Second Normal Form (2NF)

Rule: Must be in 1NF, and every non-key column must depend on the entire primary key, not just part of it.

This only matters when you have a composite primary key (a primary key made of multiple columns).

Bad example:

order_idproduct_idcustomer_nameproduct_name
1101AliceLaptop
2102BobMouse

The primary key is (order_id, product_id). But:

  • customer_name depends only on order_id, not product_id
  • product_name depends only on product_id, not order_id

This violates 2NF because non-key columns depend on only part of the primary key.

Fixed (2NF):

Split into separate tables:

orders

order_idcustomer_name
1Alice
2Bob

order_items

order_idproduct_id
1101
2102

products

product_idproduct_name
101Laptop
102Mouse

Now each non-key column depends on the full primary key of its table.

Third Normal Form (3NF)

Rule: Must be in 2NF, and no non-key column should depend on another non-key column (called transitive dependency).

Bad example:

order_idcustomer_namecustomer_emailcustomer_country
1Alicealice@example.comUSA
2Bobbob@example.comUK

customer_country depends on customer_name, not directly on order_id. This is a transitive dependency and violates 3NF.

Fixed (3NF):

Split into:

orders

order_idcustomer_id
110
220

customers

customer_idnameemailcountry
10Alicealice@example.comUSA
20Bobbob@example.comUK

Now customer details are stored once in the customers table, and orders reference the customer by customer_id.

Our original example, normalized

Let’s normalize the messy orders table from earlier:

Original:

order_idcustomer_namecustomer_emailproduct_nameproduct_pricequantity
1Alicealice@example.comLaptop10001
2Alicealice@example.comMouse252

Normalized (3NF):

customers

customer_idnameemail
1Alicealice@example.com
2Bobbob@example.com

products

product_idnameprice
101Laptop1000
102Mouse25
103Keyboard50

orders

order_idcustomer_idorder_date
112026-02-20
212026-02-21
322026-02-22

order_items

order_idproduct_idquantity
11011
21022
31011

Now:

  • Customer info is stored once in customers
  • Product info is stored once in products
  • Each order references a customer
  • Each order item references an order and a product

If Alice’s email changes, you update one row. If the laptop price changes, you update one row. No redundancy.

Benefits of normalization

Eliminates redundancy: Data is stored once, not repeated.

Easier updates: Change data in one place, not many places.

Data consistency: No risk of conflicting data (e.g., two different emails for the same customer).

Smaller database size: Less repeated data = less storage.

Denormalization — When to break the rules

Sometimes normalization goes too far. If you have to JOIN 10 tables to get simple data, queries can be slow.

Denormalization means intentionally adding redundancy to improve performance.

For example, you might store a customer’s name in the orders table even though it’s also in the customers table, so you don’t have to JOIN every time you want to show an order.

Use denormalization carefully, only when performance is a real problem.

Summary

  • Normalization organizes data to eliminate redundancy and ensure consistency
  • 1NF: No repeating groups, atomic values
  • 2NF: No partial dependencies (non-key columns depend on the full primary key)
  • 3NF: No transitive dependencies (non-key columns don’t depend on other non-key columns)
  • Normalized databases are easier to maintain and update
  • Sometimes denormalization is okay for performance, but use it sparingly

For most applications, aim for 3NF — it strikes a good balance between data integrity and simplicity.

Share: