Remove spam / duplicate MySQL records from MySQL tables.

Remove MySQL duplicate and spam content from MySQL table:


1) Remove duplicate content:

Create a tmp table and create a unique index to the column. then copy the unique content form your table to tmp table.

create table tmp like `table`;
ALTER TABLE `tmp` ADD UNIQUE INDEX(text1, text2, text3, text4, text5, text6);
insert IGNORE into `tmp` select * from `table`;

delete duplicate content from your main table:

Delete from
where id not in (select id from tmp);

We can also do this with the main table it self, but if you data is too large it will take some time so i prefer above
way :

ADD UNIQUE INDEX (mycolumn1, mycolumn2);

Once the process is complete remove unique index from the column.

2) Remove spam content:

i) Identify the spam first, for me spam contents are those who have certain keywords in their text. These can be links, foul words,
e.t.c so we can remove those lines by.

Delete from
where `mycolumn` REGEXP 'foul word|http://|buy|purchase'

ii) Remove non english charectors like chinese, russians e.t.c

Delete from
where `mycolumn` != CONVERT(myclumn USING ASCII)


iii)Remove very records which doesnt meet the text length standards, for examples comments column with just 3 char is not a valid comments.

Delete from `table` where LENGTH(mycloumn)<5;

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s