Remove spam / duplicate MySQL records from MySQL tables.

Remove MySQL duplicate and spam content from MySQL table:

 

1) Remove duplicate content:

Create a tmp table and create a unique index to the column. then copy the unique content form your table to tmp table.

create table tmp like `table`;
ALTER TABLE `tmp` ADD UNIQUE INDEX(text1, text2, text3, text4, text5, text6);
insert IGNORE into `tmp` select * from `table`;

delete duplicate content from your main table:

Delete from
`table`
where id not in (select id from tmp);

We can also do this with the main table it self, but if you data is too large it will take some time so i prefer above
way :

ALTER IGNORE TABLE
`table`
ADD UNIQUE INDEX (mycolumn1, mycolumn2);

Once the process is complete remove unique index from the column.

2) Remove spam content:

i) Identify the spam first, for me spam contents are those who have certain keywords in their text. These can be links, foul words,
e.t.c so we can remove those lines by.

Delete from
`table`
where `mycolumn` REGEXP 'foul word|http://|buy|purchase'

ii) Remove non english charectors like chinese, russians e.t.c

Delete from
`table`
where `mycolumn` != CONVERT(myclumn USING ASCII)

 

iii)Remove very records which doesnt meet the text length standards, for examples comments column with just 3 char is not a valid comments.

Delete from `table` where LENGTH(mycloumn)<5;
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s