The growing complexity of web design and its related activities has made proper use of two files, robots.txt and .htaccess, a necessity.
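To start from the beginning: both files can be created with any plain-text editor, such as Notepad. The robots.txt file must be placed in the root folder of your website, and a site-wide .htaccess file normally sits there too, so both should be visible whenever you open the file system of your host. Please note that because its name begins with a dot, the .htaccess file is often hidden by file managers and FTP clients, so you may need to switch on a "show hidden files" option. Search engine crawlers and spiders fetch robots.txt directly; the .htaccess file is read by the web server itself (Apache), and crawlers only see its effects, such as redirects.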
Let us now try to define the purpose of each. Robots.txt basically tells the robots sent by the search engines which parts of your site they may and may not crawl. So, if you want to keep crawlers out of certain folders or files, you can say so in robots.txt, and well-behaved robots will comply. You can even customize robots.txt according to your needs, with different rules for different robots. On the other hand, the .htaccess file primarily serves the purpose of URL redirection and URL rewriting (through Apache's mod_rewrite module).
Let us take an example: while revamping your website, you want to drop some of your old pages, which have already been crawled by the search engines. The sudden disappearance of already-crawled pages has a strong negative impact: the search engines find nothing at the old address and record the infamous "404 Not Found" error. The best way is to redirect the old pages to suitable existing pages, so that crawlers are not left stranded; instead they find a landing page, and visitors following old links are redirected to it as well.
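A minimal sketch of such a redirect in .htaccess, assuming an Apache server with mod_alias enabled. The domain yoursite.com and all page and folder names here are hypothetical placeholders, not taken from any real site:

```apache
# .htaccess in the site root (assumes mod_alias is available)

# Permanently redirect one removed page to its replacement.
# Both paths are hypothetical examples.
Redirect 301 /old-services.html http://www.yoursite.com/services.html

# Redirect a whole retired folder, keeping the rest of the path.
# /archive/ and /blog/ are hypothetical folder names.
RedirectMatch 301 ^/archive/(.*)$ http://www.yoursite.com/blog/$1
```

The 301 status tells crawlers the move is permanent, so they can replace the old address with the new one instead of recording a "Not Found" error.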
URL rewriting with mod_rewrite gained importance when people started noticing that search engines prefer user-friendly URLs to the automated URLs generated by many programs. It is worth mentioning that even popular software packages like PHPLd and WordPress have made provision for rewritten URLs, giving you the additional advantage of search-engine-friendly URLs.
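As an illustration of the idea, here is a hedged sketch of a rewrite rule that maps a friendly URL onto a script with a query string. It assumes mod_rewrite is enabled; the script name article.php and the /articles/ path are hypothetical, and packages such as WordPress ship their own rules that differ from this:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Leave requests for real files and directories untouched.
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d

    # Serve /articles/42 internally from article.php?id=42.
    # The visitor and the crawler only ever see the friendly URL.
    RewriteRule ^articles/([0-9]+)/?$ article.php?id=$1 [L,QSA]
</IfModule>
```

Because there is no R flag, this is an internal rewrite rather than a redirect: the address in the browser does not change.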
It should be mentioned that neither robots.txt nor .htaccess is mandatory on your web server, but together they give you a good deal of control over how the server responds, and you can guide the crawlers and spiders in your own way.
I’ve only recently started paying attention to robots.txt and .htaccess. very hard to find good info on how to create a good htaccess file for your site. i think one reason is cos cms’s differ so vastly.
one specific problem, and i’m not too sure about this, is the canonicalisation issue. thing is, it’s easy to sort the front page.
example:
getting a single url, http://yoursite.com, or http://www.yoursite.com, is easy.
however, how do you set up htaccess to re-direct other pages on the site?
so, if you want to re-direct from the non-www url to the www url, doing the home page is easy.
home page:
http://www.yoursite.com
other page:
yoursite.com/other-page.whatever
see the problem?
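One common way to handle this is a single host-based rule rather than a redirect per page. The following is only a sketch, assuming Apache with mod_rewrite enabled and the placeholder domain yoursite.com used above; a CMS that writes its own .htaccess rules may need this placed before them:

```apache
RewriteEngine On

# Match any request that arrives on the bare (non-www) host name.
RewriteCond %{HTTP_HOST} ^yoursite\.com$ [NC]

# Send it to the same path on the www host with a permanent redirect.
# $1 carries over whatever page was requested, so
# yoursite.com/other-page.whatever goes to
# http://www.yoursite.com/other-page.whatever
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]
```

Since the rule tests the host name and not the page, the home page and every other page are covered by the same two lines.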