December 12th, 2006

Remove duplicate entries / rows a mySQL database table

This is an extremely quick and painless way to remove duplicate rows (tuples) from a MySQL database table. The best part of it is that it requires no programming or PHP coding whatsoever, it can all be done with three manual SQL queries! Note: this only works on MySQL 3.23 and above. But I have a hard time imagining anyone running an older version than that anyways!

So, there are 3 steps, and therefore 3 SQL statements:

Step 1: Move the non duplicates (unique tuples) into a temporary table


CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY [column to remove duplicates by];

Step 2: delete delete the old table
We no longer need the table with all the duplicate entries, so drop it!


DROP TABLE old_table;

Step 3: rename the new_table to the name of the old_table


RENAME TABLE new_table TO old_table;

And of course, don't forget to fix your buggy code to stop inserting duplicates!

  • Share/Bookmark

41 Responses to ' Remove duplicate entries / rows a mySQL database table '

Subscribe to comments with RSS or TrackBack to ' Remove duplicate entries / rows a mySQL database table '.

  1. Carpool Guy said,

    on June 5th, 2007 at 8:56 pm

    just use "insert IGNORE into …." to prevent duplicate entries, or add an "ON DUPLICATE KEY UPDATE …" at the end of your mysql query (http://dev.mysql.com/doc/refman/5.0/en/insert.html).

  2. rudix said,

    on October 6th, 2007 at 4:04 am

    thank you dude , this helps me a lot, there was a problem with autoindex and my site was 12 h with 1 record, but this is the best solution I found thanks again

  3. Bijay Rungta said,

    on October 10th, 2007 at 1:21 pm

    Hey Thanks for sharing man..
    Was looking for a solution like this .and it was so quick workaround I hadn't expected this…

  4. Louis said,

    on October 12th, 2007 at 10:32 am

    I want to add my thanks also. I too have been trying to figure out how to do this for a while and was embarrassed by how simple it is. Cheers!

  5. tom said,

    on November 5th, 2007 at 12:32 pm

    Worked like a charm and cleaned my db in a matter of minutes.

    Thanks for posting it.

    Tom

  6. amit sankhala said,

    on December 18th, 2007 at 2:01 am

    Hi,
    Is it possible to find duplicate enteries in one query only?

    Let me know if any one knows this.


  7. on December 26th, 2007 at 1:09 pm

    Be careful with this approach, as dropping the new table will not have any of the keys or settings the old table had, so don't be too quick dropping it.

    Also, what's the point of 'WHERE 1' when you can just omit it?

    And instead of just fixing buggy code, put a UNIQUE index on the column that gets duplicated.

  8. Lucas Vasconcelos said,

    on January 26th, 2008 at 11:41 am

    Hey, thanks a lot for your tip.

    I use then with Mysql 4.1 and i don't need use WHERE clause…

    []'s

  9. majorxp said,

    on January 30th, 2008 at 1:53 am

    This is potentially problematic for many, many reasons. This includes indexes, table properties, and no control of which dupes you delete/keep.

    I would first use UNIQUE as an index as suggested above. I would also use this query to see how many duplicates I have too (before trying to fix it).

    >> SELECT id, count(*) AS n FROM table GROUP BY id HAVING n>1

    I would suggest using that to selectivly delete records manually.

  10. vijaykumar said,

    on February 21st, 2008 at 11:05 am

    i have a table with 4 columns having duplicate entries.i want to remove that entries when rows of four columns should exactly match bcoz if any one differs it not duplicate,can anyone send query for this?

  11. Dylan said,

    on May 6th, 2008 at 3:30 pm

    Using a correlated subquery you can select the duplicated rows and then manually delete the duplicates.

    Example code follows. In this instance the 'real' primary key should be ClientID, Engine, ScheduledKeywordJobsID and they are the group by in the subquery.

    select *from ClientVisibility where ScheduledKeywordJobsID in ( SELECT `ScheduledKeywordJobsID` FROM `ClientVisibility` WHERE 1 group by ClientID , Engine , `ScheduledKeywordJobsID` having count(*)>1 ) ORDER BY `ClientVisibility`.`ScheduledKeywordJobsID`, Engine,ClientID ASC

  12. Lone said,

    on June 17th, 2008 at 3:55 pm

    Nice one mate.

    I searched in google, and the first entry was yours. and it worked like a charm. Top spot well earned.

  13. palm_homes said,

    on July 16th, 2008 at 6:05 pm

    beautiful 3 line SQL code that saved me lots of time . THANK YOU

  14. kapil said,

    on August 14th, 2008 at 11:25 am

    Thank you dude , thankx a lot i was looking for same kind of simple and unique way to solve my problem.

  15. bardh7 said,

    on September 16th, 2008 at 1:21 pm

    this could also help

    before :

    id | name
    ————-
    1 | test |
    2 | test |
    3 | php |
    4 | echo |

    after:

    id | name|
    ————
    2 | test |
    3 | php |
    4 | echo |

  16. S.Manikandan Siva said,

    on September 24th, 2008 at 6:30 am

    Thanks a lot dude… It helps me to remove my burden.. Thanks again,
    S Manikandan Siva
    Mumbai-21.

  17. paulw said,

    on October 3rd, 2008 at 10:35 pm

    thanks for the info. after searching and trying, i'm adding my 2 cents.

    another q&d way to remove dup recs… here goes:

    ALTER IGNORE TABLE dups ADD UNIQUE INDEX dupidx (dup_fld1 [, dup_fldx]);
    ALTER TABLE dups DROP INDEX dupidx

    advantages:
    - simple script
    - maintains uniqueid
    - maintains table config (ie indexes)

    disadvantages:
    - arbitrary deletion of dups

    hope this helps someone

  18. Adam said,

    on October 15th, 2008 at 12:41 pm

    How can I run this query using cron job? is that possible?


  19. on October 28th, 2008 at 4:05 pm

    [...] is such a great solution by Justin Cook on remove duplicate mySQL entries. Simple and [...]

  20. Bogdan said,

    on October 31st, 2008 at 4:03 am

    This solution is not good. I totally removes duplicate records. We lost who knows how many subscribers because if they clicked the unprotected submit button more then once, they got totally removed upon cleaning.
    Out fault of course (laziness, copy paste) :)

  21. Freddy said,

    on January 9th, 2009 at 12:28 pm

    paulw,
    Tried your solution and it worked brilliantly!!!!

    For anyone else this is was my issue.
    I have a cities of the world db with 300k or so records.
    A lot of city rows (15K or so) were duplicates in everything but cityID. I needed to delete cities which had the same cityName, countryID, and regionID but different cityID.

    I used the paul's code as follows:

    ALTER IGNORE TABLE cities ADD UNIQUE INDEX dupidx (cityName, countryID, regionID);
    ALTER TABLE cities DROP INDEX dupidx

    and voila!!!!!! For each group of dupes I was left with only 1 unique row.

    Like paul said, keep in mind that deletion of rows is arbitrary. I'm still in development so it wasn't crucial to keep any specific cityID rows as long as I end up with 1 unique row per city.

  22. pushpendra said,

    on January 20th, 2009 at 12:08 am

    Thank you carpool guy. Your solution is a great help for me.


  23. on March 11th, 2009 at 4:28 pm

    [...] http://www.justin-cook.com/wp/2006/12/12/remove-duplicate-entries-rows-a-mysql-database-table/ [...]

  24. Me said,

    on April 13th, 2009 at 5:19 am

    This works quickly and accurately:

    ALTER IGNORE TABLE myTable ADD UNIQUE INDEX(colA,colB);

    http://free.netartmedia.net/Databases/Databases8.html

  25. JAB said,

    on April 27th, 2009 at 12:00 pm

    RENAME TABLE new_table TO old_table;
    should look like this:
    RENAME TABLE old_table TO new_table;
    :) ~

  26. PM said,

    on May 20th, 2009 at 12:20 pm

    Hey JAB, your comment is incorrect. The query should definitely be

    RENAME TABLE new_table TO old_table;

    the point the author is making is, new_table is a TEMP table until populated, then renamed to whatever the old table name was.

    However, The DROP should take place AFTER the NEW TEMP TABLE has been created.

  27. Tom Ryder said,

    on May 20th, 2009 at 10:36 pm

    Superb, just what I needed, thank you.

  28. barbrine said,

    on June 17th, 2009 at 3:51 am

    thanks.three steps are clear,i think it is the easiest way to remvoe duplicate rows.

  29. miaka_yuuki said,

    on June 19th, 2009 at 8:53 pm

    great!it helps me..the code is easy to understand but in the fourth line which is RENAME new_table to old_table, at first i type it to mysql query,but afterwards i realized that it is manually be renamed..but anyway,thanks a lot.!:)

  30. Nate Cotton said,

    on June 20th, 2009 at 6:37 pm

    SIDE NOTE: This procedure removes things like primary key and auto increment.

  31. privendo said,

    on July 21st, 2009 at 7:29 am

    why not just do that:

    delete from mytable
    where id not in
    (select MIN(id) from mytable group by name)

    works great!!!

  32. maverick said,

    on July 29th, 2009 at 11:11 pm

    privendo.. that was a cool query!!!

    Justin.. that was really simple.. but it's cool.. :-) but 've got 1 doubt.. will that create statement deal with with the foreign key constraint issue?

  33. maverick said,

    on July 29th, 2009 at 11:29 pm

    Correction:
    @ privendo… w.r.t. mysql
    We can't refer the same mytable as a target table in the inner query.. Hence i suggest creating a temporary table for mytable and refer that in the inner query.. and after the process we can drop the temporary table..
    Is there any other better way than this? Otherwise ur query is really a charm…

  34. Geb said,

    on August 13th, 2009 at 1:56 pm

    Thanks a ton! This helped me eliminate like 4000 rows in a table!!! GREAT code!

  35. vipul sharma said,

    on September 23rd, 2009 at 7:13 pm

    thanks for the query!

  36. Alex said,

    on September 24th, 2009 at 7:33 pm

    Well thanks a lot … just lost 13000 entries from a db :/ Result was a table with only 3 entries :) ) So people out there don't make my mistake: BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP !

  37. Daniel said,

    on October 1st, 2009 at 7:21 pm

    "CREATE TABLE new_table as SELECT * FROM old_table" will only copy the data itself and the basic structure, but the new table will be without any meta data like auto-increment id, indexes, keys etc.
    this will make any queries on the new table much slower since it always requires a full table scan …

  38. SURESH MIDDE said,

    on October 14th, 2009 at 12:03 am

    hey, it has worked with great charm

  39. Richard Luck said,

    on November 4th, 2009 at 1:55 pm

    Hey – why not keep your table intact and simply remove the offending records?

    delete t2 from OFFENDING_TABLE as t1, OFFENDING_TABLE as t2 where t1.UNIQUE_VALUE = t2.UNIQUE_VALUE and t2.id > t1.id;

  40. aki said,

    on November 5th, 2009 at 10:26 am

    unfortunately, paulw's solution doesn't work if your field is a text type. I got error 1170.

    Aki

    paulw said,

    on October 3rd, 2008 at 10:35 pm

    thanks for the info. after searching and trying, i'm adding my 2 cents.

    another q&d way to remove dup recs… here goes:

    ALTER IGNORE TABLE dups ADD UNIQUE INDEX dupidx (dup_fld1 [, dup_fldx]);
    ALTER TABLE dups DROP INDEX dupidx

    advantages:
    - simple script
    - maintains uniqueid
    - maintains table config (ie indexes)

    disadvantages:
    - arbitrary deletion of dups

    hope this helps someone

  41. datahound said,

    on November 20th, 2009 at 1:30 pm

    Fantastic, did the job in seconds. Thought I would have to write some PHP to do it. Can I also offer that if you have an auto_increment key field the simplest way to reset after removing dupes is to delete the col completely and then add col again with same attributes and it will populate automatically to new record total.

Leave a reply

*
To prove you're a person (not a spam script), type the security word shown in the picture. Click on the picture to hear an audio file of the word.
Click to hear an audio file of the anti-spam word