Database Design - Normalization

Written with a little help from AI, reviewed and edited by a human.

When designing a database, you need to decide how to organize your data into tables. Should everything go in one giant table? Should you split it into many small tables? Normalization is a set of rules that help you design databases that are efficient, consistent, and easy to maintain.

Real life example

Imagine you’re organizing a library. You could write all the information about each book on a single card:

Card: Harry Potter
Author: J.K. Rowling
Author Email: jk@example.com
Author Country: UK
Publisher: Bloomsbury
Publisher Address: 50 Bedford Square, London
Category: Fantasy

This works, but what if J.K. Rowling writes 50 books? You’d write her email and country 50 times. If her email changes, you’d have to update 50 cards.

Instead, you’d create separate index cards:

Books card: Book ID, Title, Author ID, Publisher ID, Category
Authors card: Author ID, Name, Email, Country
Publishers card: Publisher ID, Name, Address

Now each piece of information is stored once. If an author’s email changes, you update one card, not 50.

That’s what normalization does — it organizes data so you don’t repeat information unnecessarily.

What is normalization?

Normalization is the process of organizing data to:

Eliminate redundancy (don’t store the same data multiple times)
Ensure data integrity (changes in one place don’t leave other places out-of-date)
Make updates easier (change once instead of changing many times)

There are several “levels” of normalization called normal forms (1NF, 2NF, 3NF, etc.). Each level follows stricter rules.

For most applications, reaching 3NF (Third Normal Form) is enough.

The problem: a denormalized table

Let’s say you have a table storing orders:

order_id	customer_name	customer_email	product_name	product_price	quantity
1	Alice	alice@example.com	Laptop	1000	1
2	Alice	alice@example.com	Mouse	25	2
3	Bob	bob@example.com	Laptop	1000	1
4	Alice	alice@example.com	Keyboard	50	1

This works, but notice the problems:

Alice’s name and email are repeated 3 times. If her email changes, you have to update 3 rows.
Laptop’s price is stored twice. If the price changes, you have to update multiple rows.
If you delete order 1, you don’t lose Alice’s information, but if you delete orders 2 and 4 as well, Alice disappears entirely from the database even though she’s still a customer.

These issues are called anomalies, and normalization fixes them.

First Normal Form (1NF)

Rule: Each column must contain atomic (indivisible) values, and each row must be unique.

Bad example:

order_id	customer_name	products
1	Alice	Laptop, Mouse
2	Bob	Keyboard, Monitor

The products column contains multiple values (Laptop and Mouse). This violates 1NF.

Fixed (1NF):

order_id	customer_name	product
1	Alice	Laptop
1	Alice	Mouse
2	Bob	Keyboard
2	Bob	Monitor

Now each column contains a single value, and each row is unique.

Second Normal Form (2NF)

Rule: Must be in 1NF, and every non-key column must depend on the entire primary key, not just part of it.

This only matters when you have a composite primary key (a primary key made of multiple columns).

Bad example:

order_id	product_id	customer_name	product_name
1	101	Alice	Laptop
2	102	Bob	Mouse

The primary key is (order_id, product_id). But:

customer_name depends only on order_id, not product_id
product_name depends only on product_id, not order_id

This violates 2NF because non-key columns depend on only part of the primary key.

Fixed (2NF):

Split into separate tables:

orders

order_id	customer_name
1	Alice
2	Bob

order_items

order_id	product_id
1	101
2	102

products

product_id	product_name
101	Laptop
102	Mouse

Now each non-key column depends on the full primary key of its table.

Third Normal Form (3NF)

Rule: Must be in 2NF, and no non-key column should depend on another non-key column (called transitive dependency).

Bad example:

order_id	customer_name	customer_email	customer_country
1	Alice	alice@example.com	USA
2	Bob	bob@example.com	UK

customer_country depends on customer_name, not directly on order_id. This is a transitive dependency and violates 3NF.

Fixed (3NF):

Split into:

orders

order_id	customer_id
1	10
2	20

customers

customer_id	name	email	country
10	Alice	alice@example.com	USA
20	Bob	bob@example.com	UK

Now customer details are stored once in the customers table, and orders reference the customer by customer_id.

Our original example, normalized

Let’s normalize the messy orders table from earlier:

Original:

order_id	customer_name	customer_email	product_name	product_price	quantity
1	Alice	alice@example.com	Laptop	1000	1
2	Alice	alice@example.com	Mouse	25	2

Normalized (3NF):

customers

customer_id	name	email
1	Alice	alice@example.com
2	Bob	bob@example.com

products

product_id	name	price
101	Laptop	1000
102	Mouse	25
103	Keyboard	50

orders

order_id	customer_id	order_date
1	1	2026-02-20
2	1	2026-02-21
3	2	2026-02-22

order_items

order_id	product_id	quantity
1	101	1
2	102	2
3	101	1

Now:

Customer info is stored once in customers
Product info is stored once in products
Each order references a customer
Each order item references an order and a product

If Alice’s email changes, you update one row. If the laptop price changes, you update one row. No redundancy.

Benefits of normalization

Eliminates redundancy: Data is stored once, not repeated.

Easier updates: Change data in one place, not many places.

Data consistency: No risk of conflicting data (e.g., two different emails for the same customer).

Smaller database size: Less repeated data = less storage.

Denormalization — When to break the rules

Sometimes normalization goes too far. If you have to JOIN 10 tables to get simple data, queries can be slow.

Denormalization means intentionally adding redundancy to improve performance.

For example, you might store a customer’s name in the orders table even though it’s also in the customers table, so you don’t have to JOIN every time you want to show an order.

Use denormalization carefully, only when performance is a real problem.

Summary

Normalization organizes data to eliminate redundancy and ensure consistency
1NF: No repeating groups, atomic values
2NF: No partial dependencies (non-key columns depend on the full primary key)
3NF: No transitive dependencies (non-key columns don’t depend on other non-key columns)
Normalized databases are easier to maintain and update
Sometimes denormalization is okay for performance, but use it sparingly

For most applications, aim for 3NF — it strikes a good balance between data integrity and simplicity.