httpd-dev mailing list archives

From "Dirk.vanGulik" <Dirk.vanGu...@jrc.it>
Subject Re: Suggestion to help robots and sites coexist a little better
Date Mon, 15 Jul 1996 08:30:19 GMT
> (sent to robots@webcrawler.com and the apache developers list)
> 
> Here's my suggestion:
> 
>  1) robots send a new HTTP header to say "I'm a robot", e.g.
>       Robot: Indexers R us
> 
>  2) servers are extended (e.g. an apache module) to look for this
>       header and, based on local configuration (*), issue "403 Forbidden"
>       responses to robots that stray out of their allowed URL-space
> 
>  3) (*) on the site being visited, the server would read robots.txt and
>       perhaps other configuration files (such as .htaccess) to determine
>       which URLs/directories are off-limits.
> 
> Using this system, a robot that correctly identifies itself as such will
> not be able to accidentally stray into forbidden regions of the server
> (well, they won't have much luck if they do, and won't cause damage).
> 
> Adding an apache module to the distribution would make more web admins
> aware of robots.txt and the issues relating to it. Being the leader, Apache
> can implement this and the rest of the pack will follow.
> 
> rob
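
As a rough sketch of what step 2 could look like, here is an access-checker
written against today's Apache httpd module API (not the module API of the
time); the module name and the hard-coded "/private/" rule are placeholders
for rules that a real module would derive from robots.txt and .htaccess:

    #include "httpd.h"
    #include "http_config.h"
    #include "http_core.h"
    #include "http_request.h"

    #include <string.h>

    /* Refuse self-identified robots access to areas they are not meant to
     * visit.  The "/private/" rule is a made-up placeholder; a real module
     * would build its rules from robots.txt, per step 3 of the proposal. */
    static int robot_access_check(request_rec *r)
    {
        const char *robot = apr_table_get(r->headers_in, "Robot");

        if (robot == NULL) {
            return DECLINED;            /* no Robot: header, stay out of the way */
        }
        if (strncmp(r->uri, "/private/", 9) == 0) {
            return HTTP_FORBIDDEN;      /* 403 Forbidden, as in step 2 */
        }
        return DECLINED;
    }

    static void robot_register_hooks(apr_pool_t *p)
    {
        ap_hook_access_checker(robot_access_check, NULL, NULL, APR_HOOK_MIDDLE);
    }

    module AP_MODULE_DECLARE_DATA robot_module = {
        STANDARD20_MODULE_STUFF,
        NULL,                   /* per-directory config creation */
        NULL,                   /* per-directory config merge */
        NULL,                   /* per-server config creation */
        NULL,                   /* per-server config merge */
        NULL,                   /* configuration directives */
        robot_register_hooks
    };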

You might want to make this more attractive to robot developers by, for
example, adding lines to the headers like

	Index-Of-URLs: http://asda/index.url.txt
	Index-Of-Metadata: http://asada/index.metadata.txt

With various descriptions, such as the URL, a line of keywords, a line of
description, and a blank line. We found this quite useful internally.
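
For illustration, entries in such an index file might look like this (the
URLs and text are invented; only the url / keywords / description /
blank-line layout comes from the description above):

    http://www.example.org/reports/1996/summary.html
    remote sensing, environment, annual report
    Summary of the 1996 environmental monitoring campaign.

    http://www.example.org/data/index.html
    datasets, imagery, archive
    Entry point for the public data archive.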

Dw.


