SQL Code

Remove duplicate entries / rows a mySQL database table

This is an extremely quick and painless way to remove duplicate rows (tuples) from a MySQL database table. The best part of it is that it requires no programming or PHP coding whatsoever, it can all be done with three manual SQL queries! Note: this only works on MySQL 3.23 and above. But I have a hard time imagining anyone running an older version than that anyways!

So, there are 3 steps, and therefore 3 SQL statements:

Step 1: Move the non duplicates (unique tuples) into a temporary table


CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY [column to remove duplicates by];

Step 2: delete delete the old table
We no longer need the table with all the duplicate entries, so drop it!


DROP TABLE old_table;

Step 3: rename the new_table to the name of the old_table


RENAME TABLE new_table TO old_table;

And of course, don't forget to fix your buggy code to stop inserting duplicates!

Share

81 Comments

  1. just use "insert IGNORE into …." to prevent duplicate entries, or add an "ON DUPLICATE KEY UPDATE …" at the end of your mysql query (http://dev.mysql.com/doc/refman/5.0/en/insert.html).

  2. thank you dude , this helps me a lot, there was a problem with autoindex and my site was 12 h with 1 record, but this is the best solution I found thanks again

  3. Hey Thanks for sharing man..
    Was looking for a solution like this .and it was so quick workaround I hadn't expected this…

  4. I want to add my thanks also. I too have been trying to figure out how to do this for a while and was embarrassed by how simple it is. Cheers!

  5. Worked like a charm and cleaned my db in a matter of minutes.

    Thanks for posting it.

    Tom

  6. Hi,
    Is it possible to find duplicate enteries in one query only?

    Let me know if any one knows this.

  7. Artem Russakovskii

    Be careful with this approach, as dropping the new table will not have any of the keys or settings the old table had, so don't be too quick dropping it.

    Also, what's the point of 'WHERE 1' when you can just omit it?

    And instead of just fixing buggy code, put a UNIQUE index on the column that gets duplicated.

  8. Hey, thanks a lot for your tip.

    I use then with Mysql 4.1 and i don't need use WHERE clause…

    []'s

  9. This is potentially problematic for many, many reasons. This includes indexes, table properties, and no control of which dupes you delete/keep.

    I would first use UNIQUE as an index as suggested above. I would also use this query to see how many duplicates I have too (before trying to fix it).

    >> SELECT id, count(*) AS n FROM table GROUP BY id HAVING n>1

    I would suggest using that to selectivly delete records manually.

  10. i have a table with 4 columns having duplicate entries.i want to remove that entries when rows of four columns should exactly match bcoz if any one differs it not duplicate,can anyone send query for this?

  11. Using a correlated subquery you can select the duplicated rows and then manually delete the duplicates.

    Example code follows. In this instance the 'real' primary key should be ClientID, Engine, ScheduledKeywordJobsID and they are the group by in the subquery.

    select *from ClientVisibility where ScheduledKeywordJobsID in ( SELECT `ScheduledKeywordJobsID` FROM `ClientVisibility` WHERE 1 group by ClientID , Engine , `ScheduledKeywordJobsID` having count(*)>1 ) ORDER BY `ClientVisibility`.`ScheduledKeywordJobsID`, Engine,ClientID ASC

  12. Nice one mate.

    I searched in google, and the first entry was yours. and it worked like a charm. Top spot well earned.

  13. beautiful 3 line SQL code that saved me lots of time . THANK YOU

  14. Thank you dude , thankx a lot i was looking for same kind of simple and unique way to solve my problem.

  15. this could also help

    before :

    id | name
    ————-
    1 | test |
    2 | test |
    3 | php |
    4 | echo |

    after:

    id | name|
    ————
    2 | test |
    3 | php |
    4 | echo |

  16. Thanks a lot dude… It helps me to remove my burden.. Thanks again,
    S Manikandan Siva
    Mumbai-21.

  17. thanks for the info. after searching and trying, i'm adding my 2 cents.

    another q&d way to remove dup recs… here goes:

    ALTER IGNORE TABLE dups ADD UNIQUE INDEX dupidx (dup_fld1 [, dup_fldx]);
    ALTER TABLE dups DROP INDEX dupidx

    advantages:
    – simple script
    – maintains uniqueid
    – maintains table config (ie indexes)

    disadvantages:
    – arbitrary deletion of dups

    hope this helps someone

  18. How can I run this query using cron job? is that possible?

  19. Whoila Blog » Blog Archive » Remove Duplicate Entries in mySQL

    […] is such a great solution by Justin Cook on remove duplicate mySQL entries. Simple and […]

  20. This solution is not good. I totally removes duplicate records. We lost who knows how many subscribers because if they clicked the unprotected submit button more then once, they got totally removed upon cleaning.
    Out fault of course (laziness, copy paste) 🙂

  21. paulw,
    Tried your solution and it worked brilliantly!!!!

    For anyone else this is was my issue.
    I have a cities of the world db with 300k or so records.
    A lot of city rows (15K or so) were duplicates in everything but cityID. I needed to delete cities which had the same cityName, countryID, and regionID but different cityID.

    I used the paul's code as follows:

    ALTER IGNORE TABLE cities ADD UNIQUE INDEX dupidx (cityName, countryID, regionID);
    ALTER TABLE cities DROP INDEX dupidx

    and voila!!!!!! For each group of dupes I was left with only 1 unique row.

    Like paul said, keep in mind that deletion of rows is arbitrary. I'm still in development so it wasn't crucial to keep any specific cityID rows as long as I end up with 1 unique row per city.

  22. Thank you carpool guy. Your solution is a great help for me.

  23. This works quickly and accurately:

    ALTER IGNORE TABLE myTable ADD UNIQUE INDEX(colA,colB);

    http://free.netartmedia.net/Databases/Databases8.html

  24. RENAME TABLE new_table TO old_table;
    should look like this:
    RENAME TABLE old_table TO new_table;
    :)~

  25. Hey JAB, your comment is incorrect. The query should definitely be

    RENAME TABLE new_table TO old_table;

    the point the author is making is, new_table is a TEMP table until populated, then renamed to whatever the old table name was.

    However, The DROP should take place AFTER the NEW TEMP TABLE has been created.

  26. Superb, just what I needed, thank you.

  27. thanks.three steps are clear,i think it is the easiest way to remvoe duplicate rows.

  28. great!it helps me..the code is easy to understand but in the fourth line which is RENAME new_table to old_table, at first i type it to mysql query,but afterwards i realized that it is manually be renamed..but anyway,thanks a lot.!:)

  29. SIDE NOTE: This procedure removes things like primary key and auto increment.

  30. why not just do that:

    delete from mytable
    where id not in
    (select MIN(id) from mytable group by name)

    works great!!!

  31. privendo.. that was a cool query!!!

    Justin.. that was really simple.. but it's cool.. 🙂 but 've got 1 doubt.. will that create statement deal with with the foreign key constraint issue?

  32. Correction:
    @ privendo… w.r.t. mysql
    We can't refer the same mytable as a target table in the inner query.. Hence i suggest creating a temporary table for mytable and refer that in the inner query.. and after the process we can drop the temporary table..
    Is there any other better way than this? Otherwise ur query is really a charm…

  33. Thanks a ton! This helped me eliminate like 4000 rows in a table!!! GREAT code!

  34. thanks for the query!

  35. Well thanks a lot … just lost 13000 entries from a db :/ Result was a table with only 3 entries :)) So people out there don't make my mistake: BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP BACKUP !

  36. "CREATE TABLE new_table as SELECT * FROM old_table" will only copy the data itself and the basic structure, but the new table will be without any meta data like auto-increment id, indexes, keys etc.
    this will make any queries on the new table much slower since it always requires a full table scan …

  37. hey, it has worked with great charm

  38. Hey – why not keep your table intact and simply remove the offending records?

    delete t2 from OFFENDING_TABLE as t1, OFFENDING_TABLE as t2 where t1.UNIQUE_VALUE = t2.UNIQUE_VALUE and t2.id > t1.id;

  39. unfortunately, paulw's solution doesn't work if your field is a text type. I got error 1170.

    Aki

    paulw said,

    on October 3rd, 2008 at 10:35 pm

    thanks for the info. after searching and trying, i'm adding my 2 cents.

    another q&d way to remove dup recs… here goes:

    ALTER IGNORE TABLE dups ADD UNIQUE INDEX dupidx (dup_fld1 [, dup_fldx]);
    ALTER TABLE dups DROP INDEX dupidx

    advantages:
    – simple script
    – maintains uniqueid
    – maintains table config (ie indexes)

    disadvantages:
    – arbitrary deletion of dups

    hope this helps someone

  40. Fantastic, did the job in seconds. Thought I would have to write some PHP to do it. Can I also offer that if you have an auto_increment key field the simplest way to reset after removing dupes is to delete the col completely and then add col again with same attributes and it will populate automatically to new record total.

  41. Thanks a lot! This helped me sort 10 rows in a table GREAT!

  42. What if you have imported a csv file twice into a table which has no unique key? Try this in such case

    create table goodone select distinct * from errortable;

    Thanks
    Gopal

  43. I didn't want to remove records but flag duplicates as inactive using one query.

    update
    contacts c,
    (SELECT max(`contact_id`) id,`email`,count(*) qty FROM `contacts` where active=1 group by email) as tt
    set active = 0
    where
    tt.qty > 1
    and c.contact_id=tt.id

  44. DO NOT USE THIS SOLUTION TO DE_DUP!!!!!

    Guys,
    This is NOT a solution for de-duplicating data. Please, please, please take the following steps:
    1) Backup your database! Absolutely IMPERATIVE!!!!!
    2) Use a query along the lines of "SELECT id, count(*) AS n FROM table GROUP BY id HAVING n>1" – this way you can INSERT the dupes into another table and see how many COUNTs of duplicates you have. There may be a reason in your code somewhere as to why you're getting the dupes.
    3) Dependent on how many we are talking about… And by the way, I have been a DBA for 10+ years, having worked on databases with Terabytes of information stored in tables with millions of records – so have a bit of a clue as to what I'm on about. One would always take a systematic or manual approach to record deletion. You look at the data you are going to delete prior to deletion. YOU NEVER GO DELETING DATA WILLIE NILLIE!!! BAD, BAD, BAD!!!!
    4) Even after you have de-duped, keep a copy of the deleted records for a decent length of time. I'm talking 7 years for an e-Commerce site. Storage is cheap and it's no hassle to keep it zipped up somewhere.

    There is no hard and fast de-dupe strategy. You have to look at the data and decide what is best for you in the situation you are in. This is why people get paid to be DBA's!! Unfortunately, as I'm sure IT Pro's all over will sympathise with, people with a little knowledge get themselves into a lot of trouble. The tools to do some wonderful things with these fantastic inventions called computers are freely available and very easy to use these days. Please, though, before you get yourself into a heap of trouble… Don't leave your brain at the door, take it in with you and use it before doing stuff you may regret.
    Sorry for the rant! Just trying to keep folk out of the frying pan!!

  45. This will not work for multiple column uniqueness. You can do it like this –

    1. create table foobar as select * from thetable;
    2. delete from foobar;
    3. alter table foobar add primary key (col1,col2,col3…);
    4. insert ignore into foobar select * from thetable;
    5. rename table thetable to thetable_bak;
    6. rename table foobar to thetable;

    In the 1st step, you create the scratch table automatically
    with the same structure as the main table. You then delete
    all its records so that the primary key can be added. Rest
    is straightforward. Insert IGNORE will drop all duplicate
    entries keeping only the first. If you want to keep the last,
    use insert replace.

  46. It's like a charm!!! Stupendous! Thank you very much! 🙂

  47. I do not suggest using this method for anyone that might want to "undo" the application of the "unique" constraint on their database tables. Though the steps worked like a charm l had a heck of a time figuring out how to remove the unique constraint on my table row and it broke my site for some time. Perhaps you should add how to remove the unique constraint after you're done, as well as how to find the constraint_name (which was the part that was tripping me up).

  48. Awesome, thats what I was struggler with since 2 hrs and was surprised to see the best available solution here :))))))))))))

Leave a Reply

Your email address will not be published. Required fields are marked *

*
To prove you're a person (not a spam script), type the security word shown in the picture. Click on the picture to hear an audio file of the word.
Anti-spam image