Be Found:
How to design findable web sites and get ranked with the best

The basics: understanding search engines and spiders
Using robot protocols to prevent unwanted indexing
Use core HTML tags more efficiently
Getting dynamic web pages indexed
The problem of unreadable content
The commercial options for SEO
Using the Robot protocols

Do you have parts of your site that you don't want indexed for some reason - for example, your cgi-bin folder or an equivalent? You can use the robots exclusion protocol to let search engine spiders know where not to go. This is done in a plain text file, which must be called robots.txt and placed at the root level of your site directory, where your home index page lives.

The first line in this file should begin User-agent:. In most cases this is followed by an asterisk, meaning all robots rather than just named ones. The next lines list each directory or specific page you want left alone. Begin each line with Disallow: then the path relative to the site root - for example Disallow: /cgi-bin/.

A similar effect can be achieved page by page by putting a 'robots' meta tag into the head of a page. Use index or noindex paired with follow or nofollow in the content attribute. For example, noindex,nofollow tells spiders not to index the page and not to follow any links they find on it. The full tag looks like this: <meta name="robots" content="noindex,nofollow">. You don't need top-level admin access to the server to set this. However, not all spiders pay attention to this meta tag instruction. For more information visit www.robotstxt.org/wc/exclusion.html.

Even if you don't want to exclude search engines from any part of your site, you should still consider adding a robots.txt file at the top level. Search engines look for one when they arrive at a site, and even though they don't mind not finding one, the request will show up as a 'page not found' error in your web server logs.
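One way to check that a robots.txt file says what you intend is to run it through a parser before uploading it. The sketch below uses Python's standard urllib.robotparser module on the two-line file described above. The robot name ExampleBot and the www.example.com addresses are made up for illustration, and real spiders may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The minimal robots.txt described in the text: every robot is asked
# to stay out of the cgi-bin folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(robots_txt):
    """Return a parser loaded from robots.txt text instead of a live URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" and www.example.com are hypothetical names for illustration.
script_allowed = parser.can_fetch(
    "ExampleBot", "http://www.example.com/cgi-bin/search.pl"
)
home_allowed = parser.can_fetch(
    "ExampleBot", "http://www.example.com/index.html"
)

print("cgi-bin script allowed:", script_allowed)
print("home page allowed:", home_allowed)
```

With this file, the cgi-bin address should be reported as off limits while the home page stays open to robots.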
As long as you have admin access to the top level of your Web site's directory structure, you can upload your own robots.txt file to tell search engine spiders where they are and are not allowed to go. This is the preferred approach, although it won't be available to you if your site lives within a subdirectory of a domain.
The alternative approach, putting robot instructions into a 'robots' meta tag, is more flexible, although not all robots pay attention to it. Use index or noindex paired with follow or nofollow to tell robots what to do with each page as they get there.
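To see how a robot that respects the meta tag might read it, here is a minimal sketch using Python's standard html.parser module. The class and function names are invented for this example, and treating a missing tag as index,follow is the usual convention rather than something every robot guarantees.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any 'robots' meta tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Split "noindex,nofollow" into separate lower-case directives.
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in the page source."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_index(directives):
    # Assumption: indexing is allowed unless 'noindex' is present.
    return "noindex" not in directives


def may_follow(directives):
    # Assumption: following links is allowed unless 'nofollow' is present.
    return "nofollow" not in directives


PAGE = (
    "<html><head><title>Private page</title>"
    '<meta name="robots" content="noindex,nofollow">'
    "</head><body><a href='/next.html'>Next</a></body></html>"
)

found = robots_directives(PAGE)
print("index this page:", may_index(found))
print("follow its links:", may_follow(found))
```

Because the tag lives in each page, you can mix settings freely, for example noindex,follow on a links page whose targets you still want discovered.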