Robots.txt – How-To Tutorial


Robots.txt is a plain text file that tells search engine bots what to crawl and what not to crawl. Upload it to your website's web root folder so that bots can fetch it at the standard location, e.g., https://www.example.com/robots.txt. When you add a robots.txt file, make sure it is accessible this way.

Why do we need a robots.txt file?

a) To block certain areas of the website so that bots won't crawl them.
b) To allow certain bots and disallow bad or unwanted bots.
c) To point bots to the sitemap URL.
d) To manage the crawl rate and reduce server load.

Creating a robots.txt file

There are only a few directives in a robots.txt file. The core ones are User-agent, Disallow, and Allow (Sitemap and Crawl-delay are covered further below).

User-agent: tells which bot (or bots) the following rules apply to. To address all bots, use the wildcard * .

Allow / Disallow: the value can be a file, a folder path, or a pattern with wildcards; it grants or blocks bot access to the matching URLs.

How to use robots.txt?

Here are some examples:

a) To allow all bots to the whole site, either leave robots.txt empty or use:

User-agent: *
Allow: /

b) To disallow all bots:

User-agent: *
Disallow: /

c) To disallow a specific bot (badBot is a placeholder name):
User-agent: badBot
Disallow: /

d) To disallow a specific folder:

User-agent: *
Disallow: /foldername/

e) To allow a certain file inside a disallowed folder:

User-agent: *
Disallow: /foldername/
Allow: /foldername/myfile.jpg
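You can sanity-check rules like these before deploying them, for example with Python's standard-library urllib.robotparser. One caveat: this parser applies rules in file order (first match wins), so for this check the Allow line must come before the broader Disallow line. The folder and file names are just the placeholders from the example above.

```python
from urllib import robotparser

# Parse the example rules directly; normally you would point the parser
# at a live site with set_url() and read() instead.
rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Allow: /foldername/myfile.jpg
Disallow: /foldername/
""".splitlines())

print(rp.can_fetch("*", "/foldername/myfile.jpg"))  # True: explicitly allowed
print(rp.can_fetch("*", "/foldername/other.jpg"))   # False: folder is disallowed
```

Note that major search engines such as Google instead use longest-match precedence, so for them the order of the Allow and Disallow lines does not matter.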

f) Adding a sitemap through robots.txt (the sitemap URL below is a placeholder):

Sitemap: https://www.example.com/sitemap.xml
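The Sitemap directive is independent of any User-agent group, so it can go anywhere in the file. Crawlers and tools can read it back; as a sketch, Python's urllib.robotparser exposes it via site_maps() (available since Python 3.8; the URL here is a placeholder):

```python
from urllib import robotparser

# Parse a minimal robots.txt containing only a sitemap pointer.
rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Sitemap: https://www.example.com/sitemap.xml
""".splitlines())

print(rp.site_maps())  # ['https://www.example.com/sitemap.xml']
```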

g) Delaying the crawl rate (the directive is Crawl-delay, in seconds; note that not every search engine honors it — Google, for one, ignores it):

User-agent: *
Crawl-delay: 5
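A well-behaved crawler can read this value back; as a sketch, Python's standard-library urllib.robotparser exposes it via crawl_delay() (available since Python 3.6):

```python
from urllib import robotparser

# Parse a minimal robots.txt that asks all bots to wait 5 seconds
# between requests.
rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Crawl-delay: 5
""".splitlines())

print(rp.crawl_delay("*"))  # 5
```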

Each set of instructions (each User-agent group) should be separated by a blank line. Here is a fuller example:

User-agent: *
Disallow: /thirdAuth/
Disallow: /sort
Disallow: /suggest
Disallow: /share
Disallow: /signup
Disallow: /img/
Allow: /private/5384-*
Allow: /private/5385-*

User-agent: AhrefsBot
Disallow: /

User-agent: TurnitinBot
Disallow: /

User-agent: SEOkicks-Robot
Disallow: /

User-agent: msnbot-media
Crawl-delay: 10

User-agent: FlipboardProxy
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: UnwindFetchor
Disallow: /

User-agent: MetaURI
Disallow: /

User-agent: BLEXBot
Disallow: /

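Per-bot groups like the ones above can be verified programmatically. This sketch uses Python's standard-library urllib.robotparser with a trimmed-down copy of the rules (the bot name matches the example; SomeBot is a hypothetical crawler that falls under the wildcard group):

```python
from urllib import robotparser

# A trimmed copy of the rules above: one wildcard group, one blocked bot.
rules = """
User-agent: *
Disallow: /img/

User-agent: AhrefsBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("AhrefsBot", "/page.html"))  # False: blocked site-wide
print(rp.can_fetch("SomeBot", "/page.html"))    # True: falls under the * group
print(rp.can_fetch("SomeBot", "/img/a.png"))    # False: /img/ is disallowed
```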

For more details, see the official robots.txt documentation.
