httpd-dev mailing list archives

From Rob Hartill <>
Subject Re: robot denial
Date Thu, 18 Jul 1996 20:57:03 GMT
> > deny agent Crawler/1.4 SiteTrasher/0.0001 Mozilla/42.42
> > 
> > -=-=-=-=
> > 
> > What do people think is the best route? I like the latter. Is it possible
> > with the API to write "deny agent" as a module? or is it a patch job?
> What return value do you want to give to bad robots?  The easy way
> is to make the module detect a baddie and then return SERVER_ERROR
> to crash the request so letting the core take care of the fallout.
> This could upset sites with funky error handler cgi however so
> another option would be to send out some faked up page with a
> more clueful HTTP value than plain old 500

The idea is to be friendly (for once :-) and guide robots which have
the decency to identify themselves away from toys they're not meant
to play with. A 403 will suffice as a default.
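
To make the 403 default concrete, here is a rough sketch in Python (not
Apache module code, and not a proposal for the actual API). It assumes the
deny list is just the set of product tokens from the "deny agent" line
quoted above, and that matching is an exact comparison against each
whitespace-separated token in the User-Agent header. The names
DENIED_AGENTS and check_agent are made up for illustration.

```python
# Hypothetical deny list, copied from the quoted "deny agent" line.
DENIED_AGENTS = {"Crawler/1.4", "SiteTrasher/0.0001", "Mozilla/42.42"}

HTTP_OK = 200
HTTP_FORBIDDEN = 403


def check_agent(user_agent, denied=DENIED_AGENTS):
    """Return 403 if any token of the User-Agent value is on the deny list.

    Assumption: agents are matched token by token, exactly as written in
    the deny list (no wildcards, no case folding).
    """
    if not user_agent:
        # No User-Agent header: this check has nothing to match against.
        return HTTP_OK
    for token in user_agent.split():
        if token in denied:
            return HTTP_FORBIDDEN
    return HTTP_OK


if __name__ == "__main__":
    for agent in ("Crawler/1.4", "Mozilla/2.0", None):
        print(repr(agent), "->", check_agent(agent))
```

Exact token matching keeps the check cheap and predictable; whether
wildcards or version ranges are worth supporting is a separate question.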

When a robot comes along without identifying itself, it is of course
fair game.

The idea can be adapted later so that on startup, the server can read
robots.txt and protect itself from accidents.
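
A rough illustration of the robots.txt idea, again in Python rather than
server code. It uses the standard library's urllib.robotparser to load the
rules once and then answers 403 for paths a robot is not supposed to
fetch. The sample robots.txt content and the names load_rules and
robot_status_for are assumptions for the sketch, and it presumes the
request has already been recognised as coming from a robot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real server would read its own file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def load_rules(text):
    """Parse robots.txt text once, as the server would at startup."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def robot_status_for(parser, user_agent, path):
    """Return 403 if robots.txt disallows this path for this agent."""
    return 200 if parser.can_fetch(user_agent, path) else 403


if __name__ == "__main__":
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    for path in ("/index.html", "/cgi-bin/search"):
        print(path, "->", robot_status_for(rules, "Crawler/1.4", path))
```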

> Perhaps you could make it more configurable with a directive to
> allow Joe Admin to define his own redirect message...
> 	RobotRedirect /Errors/bad_robot.html

They're not going to read it :-)
