Robot Files: A Brief Introduction
Robot files are probably one of the least discussed areas of website maintenance. As you may already know, search engines use 'bots to crawl through websites in order to index their pages. If left to their own devices, these 'bots will crawl through every inch of your site over and over again. That's not necessarily a good idea.
That's where robot files come in. You create robot files in order to tell those 'bots which parts of your site to crawl and which parts to leave alone.
First, you should create a new file and name it robots.txt. You then want to move that file to the root directory of your server. That location is important because it's where the 'bots are going to look for it.
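To see why the root directory matters, here is a minimal Python sketch of how a 'bot might work out where to look. The function name robots_url and the example.com address are placeholders made up for illustration; they are not taken from any particular search engine.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the address where a crawler would expect robots.txt for this page's site."""
    # The leading slash makes urljoin drop the page's own path and
    # resolve against the root of the site instead.
    return urljoin(page_url, "/robots.txt")


print(robots_url("http://www.example.com/members/login.html"))
# prints: http://www.example.com/robots.txt
```

However deep the page sits, the answer is the same, which is why a robots.txt tucked away in a subfolder is likely to go unnoticed.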
Second, you'll need to include instructions to the 'bots in that file. For an example of what some of those instructions might look like, you can visit http://www.sitepronews.com/archives/2005/aug/22prt.html.
But, more importantly, let's talk a little bit about how these robot files are going to be useful. For example, let's say you have a portion of your site which can only be accessed by members or through a special password. You wouldn't necessarily want those pages to be indexed by search engines, right?
Using your robot file, you can tell the 'bots not to crawl through those pages.
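As a rough illustration, the sketch below holds a two-line robot file as a Python string and uses the standard library's urllib.robotparser to check how a well-behaved 'bot would read it. The /members/ directory, the name ExampleBot, and the example.com addresses are assumptions chosen for the example; substitute whatever your own site uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: every 'bot ("*") is asked to stay
# out of an assumed /members/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a rule-following crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


members_parser = build_parser(ROBOTS_TXT)

# The members-only page is off limits, the home page is not.
print(members_parser.can_fetch("ExampleBot", "http://www.example.com/members/profile.html"))  # False
print(members_parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))  # True
```

The two lines inside ROBOTS_TXT are what would go into the robots.txt file itself; the rest only checks how those lines are interpreted.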
Keep in mind, however, that not all search engines treat your robot files the same way. Some require you to have a robot file before they will crawl your site. Other 'bots will completely ignore your robot files and do what they want anyway, so the file is a request rather than a lock.
Another way to use robot files is to prevent particular search engines from crawling your site. If you simply don't want to be listed with one of them for whatever reason, you can add a rule that names its 'bot and shuts it out of the whole site. Of course, you're more likely to want all 'bots to access your site, and you can write a rule for that as well.
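Both cases can be sketched in one robot file, again checked with Python's urllib.robotparser. BadBot and GoodBot are made-up names standing in for real 'bots; each search engine publishes the name its own 'bot goes by.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. The first group shuts out one 'bot
# by name; the second lets every other 'bot in. An empty "Disallow:"
# line means nothing is off limits.
SELECTIVE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

selective_parser = RobotFileParser()
selective_parser.parse(SELECTIVE_ROBOTS_TXT.splitlines())

page = "http://www.example.com/index.html"
print(selective_parser.can_fetch("BadBot", page))   # False
print(selective_parser.can_fetch("GoodBot", page))  # True
```

If you only want to welcome everyone, the second group on its own is enough.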