Foreign Keys and Referential Integrity

Written with a little help from AI, reviewed and edited by a human.

In a previous post we learned about JOINs — how to combine data from two tables using a common column. But there’s a problem: nothing stops you from creating “orphan” data — like a post that references a user who doesn’t exist. Foreign keys solve this by enforcing rules about how tables relate to each other.

The problem

Let’s say you have two tables:

users

id	name
1	Alice
2	Bob

posts

id	title	user_id
1	My first post	1
2	Hello world	1
3	Learning SQL	2

The user_id column in posts references the id column in users. This is how we know which user wrote which post.

But nothing stops you from doing this:

INSERT INTO posts (title, user_id)
VALUES ('Mystery post', 999);

Now you have a post that says it was written by user 999 — but user 999 doesn’t exist in the users table. This is bad data. When you try to JOIN the tables, this post will either be missing or show up with a NULL user.

Real life example

Imagine you’re organizing a school project. You have:

A list of students (each with a student ID)
A list of project submissions (each listing which student ID submitted it)

If someone submits a project and writes down student ID “999” but there’s no student with that ID, you have a problem. You can’t grade it, you can’t contact the student, and you don’t know who to give credit to.

A foreign key is like a rule that says: “Before you accept a submission, check that the student ID actually exists in the student list. If it doesn’t, reject the submission.”

What is a foreign key?

A foreign key is a column (or set of columns) in one table that references the primary key of another table. It creates a link between the two tables and enforces a rule: “You can only insert values here that exist in the other table.”

In our example:

posts.user_id is a foreign key
It references users.id (the primary key of the users table)

Creating a foreign key

When you create a table, you can define a foreign key like this:

CREATE TABLE posts (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  title VARCHAR(255) NOT NULL,
  user_id INTEGER NOT NULL,
  FOREIGN KEY (user_id) REFERENCES users(id)
);

The line FOREIGN KEY (user_id) REFERENCES users(id) says:

user_id in this table must match an id in the users table

Now if you try to insert a post with a non-existent user:

INSERT INTO posts (title, user_id)
VALUES ('Mystery post', 999);

You’ll get an error: “Foreign key constraint failed” or something similar. The database won’t let you insert bad data.

What happens when you delete a user?

Here’s where it gets interesting. Let’s say Alice (user id 1) wrote 5 posts. Then you delete Alice from the users table:

DELETE FROM users WHERE id = 1;

What happens to her posts?

By default, the database will reject the deletion and give you an error. Why? Because deleting Alice would leave her posts “orphaned” — they’d reference a user that no longer exists, which violates the foreign key rule.

But you can tell the database what to do in this situation using ON DELETE rules.

ON DELETE rules

When defining a foreign key, you can specify what happens when the referenced row is deleted:

1. ON DELETE CASCADE

Delete the posts too when the user is deleted:

CREATE TABLE posts (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  title VARCHAR(255) NOT NULL,
  user_id INTEGER NOT NULL,
  FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

Now if you delete Alice, all of her posts are automatically deleted too.

Use this when: Child data doesn’t make sense without the parent (like deleting an order should delete all items in that order).

2. ON DELETE SET NULL

Set the user_id to NULL when the user is deleted:

CREATE TABLE posts (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  title VARCHAR(255) NOT NULL,
  user_id INTEGER,  -- Notice: NOT NULL is removed
  FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
);

Now if you delete Alice, her posts remain but user_id becomes NULL (meaning “author unknown”).

Use this when: You want to keep the child data even if the parent is gone.

3. ON DELETE RESTRICT (default)

Prevent deletion if there are related rows:

FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT

This is usually the default. If Alice has posts, you cannot delete Alice. You must delete her posts first.

Use this when: You want to be very careful and explicit about deletions.

4. ON DELETE NO ACTION

Similar to RESTRICT but checks the constraint at the end of the transaction instead of immediately.

ON UPDATE rules

You can also specify what happens when the referenced key is updated:

FOREIGN KEY (user_id) REFERENCES users(id)
  ON DELETE CASCADE
  ON UPDATE CASCADE

ON UPDATE CASCADE means: If you change a user’s id, automatically update all posts that reference it.

In practice, you rarely change primary keys, so ON UPDATE is less commonly used than ON DELETE.

Adding a foreign key to an existing table

If you already have a table and want to add a foreign key, use ALTER TABLE:

ALTER TABLE posts
ADD FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE;

Note: This will fail if you already have data that violates the constraint (like posts referencing non-existent users). You’ll need to clean up the bad data first.

Why use foreign keys?

Data integrity: Foreign keys prevent you from creating invalid relationships. You can’t have a post by user 999 if user 999 doesn’t exist.

Automatic cleanup: With ON DELETE CASCADE, deleting a user automatically cleans up their related data. No orphaned records.

Documentation: Foreign keys make it clear how tables relate to each other. Anyone looking at your schema can see the relationships.

Safety: Foreign keys protect you from accidentally creating a mess in your database.

When to use what

Use ON DELETE CASCADE when child data is meaningless without the parent (order items, comments on a deleted post)
Use ON DELETE SET NULL when you want to keep the child data but mark it as “unlinked” (posts by deleted users)
Use ON DELETE RESTRICT when you want to force manual cleanup before deletion (prevent accidents)

Most of the time, CASCADE is the right choice for one-to-many relationships where the child data truly belongs to the parent.