httpd-users mailing list archives

From "Mark A. Craig" <mark.a.cr...@gmail.com>
Subject Re: [users@httpd] Blocking crawling of CGIs
Date Tue, 18 Sep 2007 18:42:41 GMT
There's no guarantee that crawlers will be polite and honor robots.txt
directives. The search-engine ones probably do, but the spammers' ones
definitely don't, and in fact they probably pay special attention to what's
excluded. (I have a honeypot entry in my robots.txt designed to catch and then
block the malicious robots.) OTOH, the User-Agent header is only as reliable
as the intent of whoever sets the crawler up, so filtering on it may not be
much help either. I seem to recall having read somewhere that it's possible to
configure Apache to recognize "executables" independent of the OS and of file
extensions and associations. If that's true, it might lead to a solution to
your problem.
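
For what it's worth, the honeypot idea can be sketched roughly as below. This
is only an illustration under assumptions, not my actual setup: the trap path
/trap/ is hypothetical (it would be listed under Disallow in robots.txt and
linked from nowhere), the access log is assumed to be in Apache's common or
combined format, and the function names are made up for the example. The
output uses the same 2.2-style "Deny from" syntax mentioned in the question.

```python
import re

# Hypothetical trap path: disallowed in robots.txt and linked nowhere, so only
# a robot that reads robots.txt and ignores it should ever request it.
HONEYPOT_PATH = "/trap/"

# Leading fields of Apache's common/combined log format:
# client ident user [timestamp] "METHOD path protocol" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)')


def find_offenders(log_lines, trap=HONEYPOT_PATH):
    """Return the sorted client addresses that requested the trap path."""
    offenders = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(trap):
            offenders.add(match.group(1))
    return sorted(offenders)


def deny_directives(addresses):
    """Format addresses as Apache 2.2-style 'Deny from' lines."""
    return ["Deny from " + addr for addr in addresses]


if __name__ == "__main__":
    # Made-up sample lines using documentation-only IP addresses.
    sample = [
        '192.0.2.10 - - [18/Sep/2007:11:24:20 +0000] '
        '"GET /trap/index.html HTTP/1.0" 200 12',
        '198.51.100.7 - - [18/Sep/2007:11:25:02 +0000] '
        '"GET /index.html HTTP/1.1" 200 512',
    ]
    for directive in deny_directives(find_offenders(sample)):
        print(directive)
```

The generated lines would then be pasted or included into the relevant
access-control section. Blocking by address has the obvious weakness that
addresses change, so a list like this needs to be regenerated periodically.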

Mark

-------- Original Message  --------
Subject: [users@httpd] Blocking crawling of CGIs
From: Tony Rice (trice) <trice@cisco.com>
To: users@httpd.apache.org
Date: Tuesday, September 18, 2007 11:24:20 AM

> We've had some instances where crawlers have stumbled onto a CGI script
> that refers to itself and started pounding the server with requests to
> that CGI.
> 
> There are so many CGI scripts on this server that I don't want to
> maintain a huge robots.txt file.  Any suggestions on other techniques to
> keep crawlers away from cgi scripts?  Check the browser with
> BrowserMatch and then do something creative with "deny from env="?
> 


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

