beginner 5 min read

Removing Duplicates with DISTINCT

Written with a little help from AI, reviewed and edited by a human.

Sometimes when you query a database, you get duplicate values in the results. For example, if you want to see all the cities where your users live, the same city might appear multiple times if multiple users live there. DISTINCT removes those duplicates and shows each unique value only once.

The problem

Let’s say we have a table of students and which city they’re from:

idnamecity
1AliceNew York
2BobLondon
3CharlieNew York
4DianaTokyo
5EveLondon
6FrankNew York

Now imagine you want to know which cities are represented in your student list. If you just do:

SELECT city FROM students;

Result:

city
New York
London
New York
Tokyo
London
New York

You get 6 rows — one for each student. But “New York” appears three times and “London” appears twice. That’s not what you wanted. You wanted a list of unique cities.

Real life example

Imagine you’re organizing a school event and need to send location-specific information to students. You don’t want to list the same city three times in your notes — you just want to know: “We have students in New York, London, and Tokyo.”

Before writing anything down, you’d go through your list once, and each time you see a new city, you write it down. If you’ve already written “New York”, you skip it when you see it again. That’s what DISTINCT does — it gives you each unique value only once.

Using DISTINCT

To remove duplicates, add DISTINCT right after SELECT:

SELECT DISTINCT city FROM students;

Result:

city
New York
London
Tokyo

Now you get 3 rows — one for each unique city. Much cleaner.

DISTINCT with multiple columns

You can also use DISTINCT with multiple columns. In that case, it returns rows where the combination of those columns is unique.

Let’s add a grade column:

idnamecitygrade
1AliceNew YorkA
2BobLondonB
3CharlieNew YorkA
4DianaTokyoA
5EveLondonB
6FrankNew YorkC

If you do:

SELECT DISTINCT city, grade FROM students;

Result:

citygrade
New YorkA
LondonB
TokyoA
New YorkC

This gives you every unique combination of city and grade. Notice “New York” appears twice here because:

  • “New York + A” is one unique combination
  • “New York + C” is a different unique combination

DISTINCT looks at all the columns you select together. If any column is different, the row is considered unique.

DISTINCT with COUNT

A common use case is counting how many unique values there are:

SELECT COUNT(DISTINCT city) FROM students;

Result:

COUNT(DISTINCT city)
3

This tells you there are 3 unique cities in the table.

Compare that to:

SELECT COUNT(city) FROM students;

Result:

COUNT(city)
6

Without DISTINCT, COUNT(city) counts all rows, including duplicates. With DISTINCT, it counts only unique values.

When to use DISTINCT

  • When you want a list of unique values (cities, categories, tags, etc.)
  • When you need to know how many different values exist
  • When duplicates in your results don’t make sense for what you’re trying to see

When NOT to use DISTINCT

Don’t use DISTINCT as a lazy fix for bad queries. If you’re getting duplicates because of a poorly written JOIN or incorrect query logic, fix the query instead of slapping DISTINCT on it.

For example, if you’re joining two tables incorrectly and getting duplicate rows, DISTINCT might hide the problem but not actually fix the root cause. Always understand why you’re getting duplicates before deciding to remove them.

A note about performance

DISTINCT can be slow on large tables because the database has to compare every row to find duplicates. If you’re working with millions of rows, be mindful of this. But for most beginner use cases, it’s perfectly fine.

Share: