Remove duplicate entries / rows a mySQL database table
This is an extremely quick and painless way to remove duplicate rows (tuples) from a MySQL database table. The best part of it is that it requires no programming or PHP coding whatsoever, it can all be done with three manual SQL queries! Note: this only works on MySQL 3.23 and above. But I have a hard time imagining anyone running an older version than that anyways!
So, there are 3 steps, and therefore 3 SQL statements:
Step 1: Move the non duplicates (unique tuples) into a temporary table
SELECT * FROM old_table WHERE 1 GROUP BY [COLUMN TO remove duplicates BY];
Step 2: delete delete the old table
We no longer need the table with all the duplicate entries, so drop it!
Step 3: rename the new_table to the name of the old_table
And of course, don't forget to fix your buggy code to stop inserting duplicates!


on June 5th, 2007 at 8:56 pm
just use "insert IGNORE into …." to prevent duplicate entries, or add an "ON DUPLICATE KEY UPDATE …" at the end of your mysql query (http://dev.mysql.com/doc/refman/5.0/en/insert.html).
on October 6th, 2007 at 4:04 am
thank you dude , this helps me a lot, there was a problem with autoindex and my site was 12 h with 1 record, but this is the best solution I found thanks again
on October 10th, 2007 at 1:21 pm
Hey Thanks for sharing man..
Was looking for a solution like this .and it was so quick workaround I hadn't expected this…
on October 12th, 2007 at 10:32 am
I want to add my thanks also. I too have been trying to figure out how to do this for a while and was embarrassed by how simple it is. Cheers!
on November 5th, 2007 at 12:32 pm
Worked like a charm and cleaned my db in a matter of minutes.
Thanks for posting it.
Tom
on December 18th, 2007 at 2:01 am
Hi,
Is it possible to find duplicate enteries in one query only?
Let me know if any one knows this.
on December 26th, 2007 at 1:09 pm
Be careful with this approach, as dropping the new table will not have any of the keys or settings the old table had, so don't be too quick dropping it.
Also, what's the point of 'WHERE 1' when you can just omit it?
And instead of just fixing buggy code, put a UNIQUE index on the column that gets duplicated.
on January 26th, 2008 at 11:41 am
Hey, thanks a lot for your tip.
I use then with Mysql 4.1 and i don't need use WHERE clause…
[]'s
on January 30th, 2008 at 1:53 am
This is potentially problematic for many, many reasons. This includes indexes, table properties, and no control of which dupes you delete/keep.
I would first use UNIQUE as an index as suggested above. I would also use this query to see how many duplicates I have too (before trying to fix it).
>> SELECT id, count(*) AS n FROM table GROUP BY id HAVING n>1
I would suggest using that to selectivly delete records manually.
on February 21st, 2008 at 11:05 am
i have a table with 4 columns having duplicate entries.i want to remove that entries when rows of four columns should exactly match bcoz if any one differs it not duplicate,can anyone send query for this?
on May 6th, 2008 at 3:30 pm
Using a correlated subquery you can select the duplicated rows and then manually delete the duplicates.
Example code follows. In this instance the 'real' primary key should be ClientID, Engine, ScheduledKeywordJobsID and they are the group by in the subquery.
select *from ClientVisibility where ScheduledKeywordJobsID in ( SELECT `ScheduledKeywordJobsID` FROM `ClientVisibility` WHERE 1 group by ClientID , Engine , `ScheduledKeywordJobsID` having count(*)>1 ) ORDER BY `ClientVisibility`.`ScheduledKeywordJobsID`, Engine,ClientID ASC
on June 17th, 2008 at 3:55 pm
Nice one mate.
I searched in google, and the first entry was yours. and it worked like a charm. Top spot well earned.
on July 16th, 2008 at 6:05 pm
beautiful 3 line SQL code that saved me lots of time . THANK YOU
on August 14th, 2008 at 11:25 am
Thank you dude , thankx a lot i was looking for same kind of simple and unique way to solve my problem.
on September 16th, 2008 at 1:21 pm
this could also help
before :
id | name
————-
1 | test |
2 | test |
3 | php |
4 | echo |
after:
id | name|
————
2 | test |
3 | php |
4 | echo |
on September 24th, 2008 at 6:30 am
Thanks a lot dude… It helps me to remove my burden.. Thanks again,
S Manikandan Siva
Mumbai-21.
on October 3rd, 2008 at 10:35 pm
thanks for the info. after searching and trying, i'm adding my 2 cents.
another q&d way to remove dup recs… here goes:
ALTER IGNORE TABLE dups ADD UNIQUE INDEX dupidx (dup_fld1 [, dup_fldx]);
ALTER TABLE dups DROP INDEX dupidx
advantages:
- simple script
- maintains uniqueid
- maintains table config (ie indexes)
disadvantages:
- arbitrary deletion of dups
hope this helps someone
on October 15th, 2008 at 12:41 pm
How can I run this query using cron job? is that possible?
on October 28th, 2008 at 4:05 pm
[...] is such a great solution by Justin Cook on remove duplicate mySQL entries. Simple and [...]
on October 31st, 2008 at 4:03 am
This solution is not good. I totally removes duplicate records. We lost who knows how many subscribers because if they clicked the unprotected submit button more then once, they got totally removed upon cleaning.
Out fault of course (laziness, copy paste)
on January 9th, 2009 at 12:28 pm
paulw,
Tried your solution and it worked brilliantly!!!!
For anyone else this is was my issue.
I have a cities of the world db with 300k or so records.
A lot of city rows (15K or so) were duplicates in everything but cityID. I needed to delete cities which had the same cityName, countryID, and regionID but different cityID.
I used the paul's code as follows:
ALTER TABLE cities DROP INDEX dupidx
and voila!!!!!! For each group of dupes I was left with only 1 unique row.
Like paul said, keep in mind that deletion of rows is arbitrary. I'm still in development so it wasn't crucial to keep any specific cityID rows as long as I end up with 1 unique row per city.
on January 20th, 2009 at 12:08 am
Thank you carpool guy. Your solution is a great help for me.
on March 11th, 2009 at 4:28 pm
[...] http://www.justin-cook.com/wp/2006/12/12/remove-duplicate-entries-rows-a-mysql-database-table/ [...]
on April 13th, 2009 at 5:19 am
This works quickly and accurately:
ALTER IGNORE TABLE myTable ADD UNIQUE INDEX(colA,colB);
http://free.netartmedia.net/Databases/Databases8.html
on April 27th, 2009 at 12:00 pm
RENAME TABLE new_table TO old_table;
~
should look like this:
RENAME TABLE old_table TO new_table;
on May 20th, 2009 at 12:20 pm
Hey JAB, your comment is incorrect. The query should definitely be
RENAME TABLE new_table TO old_table;
the point the author is making is, new_table is a TEMP table until populated, then renamed to whatever the old table name was.
However, The DROP should take place AFTER the NEW TEMP TABLE has been created.
on May 20th, 2009 at 10:36 pm
Superb, just what I needed, thank you.
on June 17th, 2009 at 3:51 am
thanks.three steps are clear,i think it is the easiest way to remvoe duplicate rows.
on June 19th, 2009 at 8:53 pm
great!it helps me..the code is easy to understand but in the fourth line which is RENAME new_table to old_table, at first i type it to mysql query,but afterwards i realized that it is manually be renamed..but anyway,thanks a lot.!:)
on June 20th, 2009 at 6:37 pm
SIDE NOTE: This procedure removes things like primary key and auto increment.