httpd-dev mailing list archives

From "Roy T. Fielding" <field...@liege.ICS.UCI.EDU>
Subject Re: robot denial
Date Fri, 19 Jul 1996 09:44:19 GMT
> Well the robot folks seem to have reluctantly agreed to add "/robot"
> to USER-AGENT so that servers can react to it.

Not a good idea -- it won't last more than three months before idiots
start changing content based on that, at which time the typical index
robots will just have to mimic Lynx or Netscape.  In any case, "/robot"
is not valid HTTP syntax.
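
(For reference, the product-token grammar in RFC 1945 reads, roughly:

    User-Agent      = "User-Agent" ":" 1*( product | comment )
    product         = token ["/" product-version]
    product-version = token

and "/" is a tspecial, so a bare "/robot" tacked onto the field value
doesn't parse as a product.)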

> It's not clear if any will do it, but that's what they'll send if they
> decide to declare themselves in an easier-to-detect manner.

What good is it to detect that something is a robot?  What you want is to
detect their intentions, a la "indexer", "link checker", whatever.
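
(Declaring intent could be as simple as a comment in the User-Agent
field, e.g. the purely hypothetical

    User-Agent: SomeIndexer/1.0 (indexer)

where the name and comment are made up for illustration.)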

> Q.  I was thinking of writing a module that allowed something like
> 
> <Directory /foo>
> NoRobots on
> </Directory>
> 
> and for <Location> and .htaccess.
> 
> The idea is that if a dir/URL is "protected" with this, then anything
> identifying itself as a robot will be denied access (403)

Too broad a category for me.

> -=-=-=
> 
> But perhaps it might be better to extend the authorization stuff to
> allow something like
> 
> <Limit GET>
> order allow,deny
> allow from all
> deny agent /robot
> </Limit>
> 
> which would also allow denial to individual user agents e.g.
> 
> deny agent Crawler/1.4 SiteTrasher/0.0001 Mozilla/42.42

Yes, that would be the right way to do it, except that I would
use quoted-strings (or just one agent per line), since the above
is ambiguous.
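
For illustration only, an unambiguous form might look like

    deny agent "Crawler/1.4"
    deny agent "SiteTrasher/0.0001"
    deny agent "Mozilla/42.42"

(the directive name and quoting rules here are just a sketch, not an
implemented syntax).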

.....Roy
