httpd-dev mailing list archives

From Rob Hartill <r...@imdb.com>
Subject Suggestion to help robots and sites coexist a little better
Date Mon, 15 Jul 1996 01:13:03 GMT

(sent to robots@webcrawler.com and the apache developers list)

Here's my suggestion:

 1) robots send a new HTTP header to say "I'm a robot", e.g.
      Robot: Indexers R us

 2) servers are extended (e.g. by an Apache module) to look for this
      header and, based on local configuration (*), issue "403 Forbidden"
      responses to robots that stray out of their allowed URL-space

 3) (*) on the site being visited, the server would read robots.txt and
      perhaps other configuration files (such as .htaccess) to determine
      which URLs/directories are off-limits.

Using this system, a robot that correctly identifies itself as such will
not be able to accidentally stray into forbidden regions of the server
(well, it won't have much luck if it does, and won't cause damage).
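
As a rough illustration of steps 2 and 3, here is a minimal sketch in Python
rather than an actual Apache module. It assumes the proposed "Robot" header
(not a standard HTTP header), uses the standard library's urllib.robotparser
to read the rules, and the names ROBOTS_TXT, load_rules and status_for are
made up for this example, as are the sample Disallow paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for the site being visited.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def load_rules(text):
    """Parse robots.txt text into a rule set (step 3)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def status_for(headers, path, rules):
    """Return the HTTP status the server would issue (step 2).

    Requests without the proposed "Robot" header are treated as
    ordinary browsers and are not restricted by robots.txt.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    robot_name = normalised.get("robot")
    if robot_name is None:
        return 200
    if rules.can_fetch(robot_name, path):
        return 200
    return 403  # robot strayed out of its allowed URL-space


rules = load_rules(ROBOTS_TXT)
print(status_for({"Robot": "Indexers R us"}, "/private/notes.html", rules))
print(status_for({"Robot": "Indexers R us"}, "/index.html", rules))
print(status_for({}, "/private/notes.html", rules))
```

A real module would also have to consult any other local configuration
(such as .htaccess) and re-read robots.txt when it changes; the sketch only
covers the header check and the robots.txt lookup.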

Adding an Apache module to the distribution would make more web admins
aware of robots.txt and the issues relating to it. Being the leader, Apache
can implement this and the rest of the pack will follow.


Followups to robots@webcrawler.com  

rob
-- 
Rob Hartill (robh@imdb.com)
The Internet Movie Database (IMDb)  http://www.imdb.com/
           ...more movie info than you can poke a stick at.
