tomcat-users mailing list archives

From "Mathias Walter" <mathias.wal...@gmx.net>
Subject RE: howto stop crawler and bots according to their user agent string
Date Tue, 15 Jul 2008 10:37:22 GMT
Hi,

> > I've put a robots.txt in webapps/ROOT, but this file is not
> > read again.
> 
> So, to check, the crawlers are not reading your robots.txt 
> and are crawling your site anyway?

I don't know exactly. The problem is that the pages are linked from all
over the place. I'm not sure whether a crawler that follows the link
http://mydomain:port/servlet/page.jsp looks for robots.txt in the ROOT
webapp.

> 
> > I'd like to stop crawlers by their useragent string.
> 
> What do you mean by "stop"?  Do you want to return 404s or 
> similar when a request with a particular user agent string is 
> received?  If so, the obvious approach would be to write a 

I'd like the same behaviour as the Deny/Allow rules in the Apache web
server. Is there no standard way to set this up in Tomcat?

--
Regards,
Mathias

> Filter that is placed in front of your webapp, or a Valve 
> that is placed in the request processing chain, that examines 
> the user agent string in the request and returns an 
> appropriate response if you don't like the agent.
> 
>                 - Peter
> 
> ---------------------------------------------------------------------
> To start a new topic, e-mail: users@tomcat.apache.org
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
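
For illustration, the user-agent check that the suggested Filter would
perform might look roughly like the sketch below. This is only a minimal
sketch under stated assumptions: the class name UserAgentRules, the method
isDenied and the two deny patterns are hypothetical placeholders, not
anything from Tomcat or from this thread, and the choice to let requests
without a User-Agent header through is likewise an assumption.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper: decides whether a User-Agent string should be denied.
 * The patterns are illustrative placeholders, not a vetted list of crawlers.
 */
final class UserAgentRules {

    // Example deny rules, matched case-insensitively anywhere in the string.
    private static final Pattern[] DENY = {
        Pattern.compile("badbot", Pattern.CASE_INSENSITIVE),
        Pattern.compile("evil-?crawler", Pattern.CASE_INSENSITIVE)
    };

    private UserAgentRules() {
        // static helper, not meant to be instantiated
    }

    /** Returns true if the given User-Agent matches any deny pattern. */
    static boolean isDenied(String userAgent) {
        if (userAgent == null) {
            // Assumption: a request without a User-Agent header is allowed.
            return false;
        }
        for (Pattern rule : DENY) {
            if (rule.matcher(userAgent).find()) {
                return true;
            }
        }
        return false;
    }
}
```

Inside a servlet Filter's doFilter method, one would presumably read the
header with HttpServletRequest.getHeader("User-Agent"), and if isDenied
returns true, call HttpServletResponse.sendError with SC_FORBIDDEN and
return without invoking chain.doFilter. That gives roughly the effect of
a Deny rule, although a crawler can of course send any user agent string
it likes.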

