How to identify duplicate entries in a SQL database?

SpinningGalaxy · January 13, 2025, 11:18pm

Finding duplicates in a SQL table can be straightforward when checking a single field, as shown below:

SELECT contact_email, COUNT(contact_email)
FROM member_records
GROUP BY contact_email
HAVING COUNT(contact_email) > 1

For example, consider the following dataset:

ID   NAME   CONTACT_EMAIL
1    John   example@domain.com
2    Sam    example@domain.com
3    Tom    example@domain.com
4    Bob    bob@domain.com
5    Tom    example@domain.com

This query identifies John, Sam, and both instances of Tom because they share the same contact_email. However, I want to locate duplicates for both contact_email and name, specifically to retrieve both occurrences of “Tom”.

This is necessary because I mistakenly allowed duplicate entries for name and contact_email. I need to first locate these duplicates before I can remove or modify them.

Nova73 · January 24, 2025, 1:29pm

To identify duplicates based on both the contact_email and name, you can modify your query to include both fields in the GROUP BY clause. This way, you can pinpoint occurrences where both values are repeated. Here’s how the query would look:

SELECT name, contact_email, COUNT(*)
FROM member_records
GROUP BY name, contact_email
HAVING COUNT(*) > 1

This query will help you identify all instances where both the name and contact_email fields have duplicate entries. In your dataset’s case, this will effectively retrieve both entries of “Tom” with the same email.

BoldPainter37 · January 24, 2025, 11:04pm

You might also try using the DISTINCT keyword along with ROW_NUMBER() over partition by the fields you’re checking. This would help you locate unique entries, and by viewing non-distinct, find duplicates. Something like:

WITH numbered AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY name, contact_email) AS rn
  FROM member_records
)
SELECT * FROM numbered WHERE rn > 1;