509-261 Cooper Street
Ottawa, Ontario (Canada) K2P 0G3

Michele Foster
(613) 232-7447
michele@wizardev.ca
http://WizarDev.ca/

WizarDev Logo

Professional Web Development, Web Design and Database Solutions

Subscribe to Web Words

Powered by FeedBlitz

RSS Icon Subscribe via RSS

Web Words - Blog

Finding Duplicate Records and Displaying Each of Them

SQL and Database Things

Another SQL gem I discovered today. I needed to find records where a first name and last name combination occur more than once, so that the client can verify if they are in fact duplicates, and delete the unnecessary record.

SELECT ContactID, LastName, FirstName
   FROM Contacts
   WHERE (LastName, FirstName) IN
   (SELECT LastName, FirstName
       FROM Contacts
       WHERE Active = 1
       GROUP BY LastName, FirstName
       HAVING COUNT(*) > 1)
   ORDER BY LastName, FirstName

Source Finding and Eliminating Duplicate Data

Along a similar line, but less complicated, I wanted a list of all unique email addresses, along with some additional fields, so SELECT DISTINCT EmailAddress would not suffice. Here’s the solution I’m using:

SELECT EmailAddress, ContactID, FirstName, LastName 
FROM Contacts 
WHERE SubscribeNewsletter = 1 
GROUP BY EmailAddress 
ORDER BY LastName ASC, FirstName ASC 

I was a little unsure about the above code, as my references seemed to indicate that GROUP BY needed to work with an aggregate clause, so I reverse-engineered it (read, ran some opposite queries), and am confident the one above is accurate.

I’m open to critique. :)

UPDATE

It should be noted that the second query above, “works” in as much as you get only one row with one name and email address. You do not get all of the names associated with that particular address. In my case, this is what I wanted, all email addreses just once, along with a name, any name that uses that address is fine.

Additionally, this query worked just fine in MySQL, but didn’t work in PostGresSQL. Thanks to my great friend Rudy, he pointed me in the right direction with the following.

SELECT EmailAddress
       , max(ContactID) as max_ContactID
       , max(FirstName) as max_FirstName
       , max(LastName)  as max_LastName
  FROM Contacts
  WHERE SubscribeNewsletter = 1
  GROUP BY EmailAddress
  ORDER BY max_LastName ASC, max_FirstName ASC

Some resources for further reading …

June 21st, 2007

Comments are closed.