httpd-users mailing list archives

From Owen Boyle <...@bourse.ch>
Subject Re: robots.txt
Date Fri, 18 Jan 2002 09:28:21 GMT
RuneImp wrote:
> 
> It will block all robots from the site. If you only
> wish to block all robots from certain directories
> then you would do:
> 
> User-agent: *
> Disallow: /badsite1/
> Disallow: /badsite2/
> 
> If you only want certain robots to not see certain
> directories then:
> 
> User-agent: Scooter
> Disallow: /badsite1/
> Disallow: /badsite2/

It should be pointed out that the robot exclusion standard is entirely
voluntary. The robot is supposed to request "/robots.txt" first, read it,
then decide which parts of the site, if any, it may crawl. If the robot
wants to ignore "robots.txt" completely, it is free to do so, and
"robots.txt" by itself can't stop it requesting what it likes.
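
To make the "voluntary" part concrete, here is a minimal sketch of the
check a well-behaved robot is expected to run on its own side before
each request. It uses Python's standard urllib.robotparser module and
the Scooter rules quoted above; the names ROBOTS_TXT and may_fetch are
made up for illustration, and a real robot would download the file from
the site rather than embed it.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the quoted example: only Scooter is asked to stay out.
ROBOTS_TXT = """\
User-agent: Scooter
Disallow: /badsite1/
Disallow: /badsite2/
"""


def may_fetch(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant robot with this name would crawl the path.

    Hypothetical helper: it parses the rules from a string instead of
    fetching /robots.txt over HTTP.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(may_fetch("Scooter", "/badsite1/index.html"))   # False
    print(may_fetch("Scooter", "/index.html"))            # True
    print(may_fetch("OtherBot", "/badsite1/index.html"))  # True
```

Note that the whole decision happens in the robot's own code. The
server never sees it, so a robot that skips the check simply sends the
request anyway.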

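If you found a robot that was always coming from a particular IP
address, you could "Deny" it by address. You can also refuse requests
by User-Agent: mod_setenvif sets an environment variable when the
header matches, and "Deny from env=" then acts on it. Something like
this in the relevant <Directory> section ("bad_robot" is just an
arbitrary variable name):

BrowserMatchNoCase Scooter bad_robot
Order Allow,Deny
Allow from all
Deny from env=bad_robot

Bear in mind that a robot can send whatever User-Agent string it likes,
so this only stops robots that identify themselves honestly.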

Rgds,

Owen Boyle.

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

