Finding Duplicate Records and Displaying Each of Them
Another SQL gem I discovered today. I needed to find records where a first name and last name combination occur more than once, so that the client can verify if they are in fact duplicates, and delete the unnecessary record.
SELECT ContactID, LastName, FirstName
FROM Contacts
WHERE (LastName, FirstName) IN
(SELECT LastName, FirstName
FROM Contacts
WHERE Active = 1
GROUP BY LastName, FirstName
HAVING COUNT(*) > 1)
ORDER BY LastName, FirstName
Source Finding and Eliminating Duplicate Data
Along a similar line, but less complicated, I wanted a list of all unique email addresses, along with some additional fields, so SELECT DISTINCT EmailAddress would not suffice. Here’s the solution I’m using:
SELECT EmailAddress, ContactID, FirstName, LastName FROM Contacts WHERE SubscribeNewsletter = 1 GROUP BY EmailAddress ORDER BY LastName ASC, FirstName ASC
I was a little unsure about the above code, as my references seemed to indicate that GROUP BY needed to work with an aggregate clause, so I reverse-engineered it (read, ran some opposite queries), and am confident the one above is accurate.
I’m open to critique.
UPDATE
It should be noted that the second query above, “works” in as much as you get only one row with one name and email address. You do not get all of the names associated with that particular address. In my case, this is what I wanted, all email addreses just once, along with a name, any name that uses that address is fine.
Additionally, this query worked just fine in MySQL, but didn’t work in PostGresSQL. Thanks to my great friend Rudy, he pointed me in the right direction with the following.
SELECT EmailAddress
, max(ContactID) as max_ContactID
, max(FirstName) as max_FirstName
, max(LastName) as max_LastName
FROM Contacts
WHERE SubscribeNewsletter = 1
GROUP BY EmailAddress
ORDER BY max_LastName ASC, max_FirstName ASC
Some resources for further reading …
