tomcat-users mailing list archives

From brycenesbitt <bry...@obviously.com>
Subject Re: Web spiders - disabling jsessionid
Date Sun, 03 Dec 2006 21:38:21 GMT


Rashmi Rubdi wrote:
> 
> So the solution for Bryce would be to leave the session on on each JSP
> page, and omit the cookies attribute of <Context which defaults it to
> true.  
> This should solve the problem of jsessionid for bots.
> From my observation search bots support cookies otherwise I would have the
> problem of jsessionid appended to URLs too.
> 

I'm just not getting it.

Can someone take a look at this site, and maybe give some insight?
http://www.citycarshare.org/howitworks.do

Or at 216.93.188.140 you can see a test instance which has the following
ROOT/META-INF/context.xml:

   <?xml version='1.0' encoding='UTF-8'?>
   <Context path='/' cookies="false">
   </Context>

I can share with you lots of log lines showing the JSESSIONID, including
crawls by Google, Alexa and Exalead.  A quick scan of the Google index shows
cached pages with JSESSIONID.  2224 of the 9273 log lines from today have
JSESSIONID.  I have thousands upon thousands of crawls of the same content, on
the same day, with different JSESSIONIDs.
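
For anyone wanting to reproduce a count like that on their own logs, here is a
minimal Java sketch. It assumes the access log is in the combined format used
by the log lines in this message. The class name and the log path argument are
made up for illustration, and only the request field is checked, so a
jsessionid that appears only in the Referer is not counted.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;

/** Counts access-log requests whose URL carries a ;jsessionid= path parameter. */
final class JsessionidLogCount {

    // Matches the start of the quoted request field, e.g.
    //   "GET /press.do;jsessionid=E497...
    // The method name followed by a space keeps Referer and User-Agent
    // fields from matching.
    private static final Pattern SESSION_IN_REQUEST = Pattern.compile(
            "\"[A-Z]+ [^\" ]*;jsessionid=[0-9A-F]+", Pattern.CASE_INSENSITIVE);

    /** True if the request URL in this log line contains a session id. */
    static boolean hasSessionId(String logLine) {
        return SESSION_IN_REQUEST.matcher(logLine).find();
    }

    /** Number of log lines whose request URL contains a session id. */
    static long countWithSessionId(List<String> logLines) {
        return logLines.stream().filter(JsessionidLogCount::hasSessionId).count();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java JsessionidLogCount <access-log-file>");
            return;
        }
        // args[0] is a hypothetical path to one day's access log.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        long hits = countWithSessionId(lines);
        System.out.printf("%d of %d log lines carry a jsessionid%n", hits, lines.size());
    }
}
```

The program expects each log entry on a single line, as Tomcat writes it, not
wrapped the way a mail client may fold it.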

Here are some examples:
69.106.42.228 - - [01/Dec/2006:16:02:58 -0800] "GET /images/events/CCS_5_Icon.png;jsessionid=6DC390F0ADC7569009CB60C98378919D HTTP/1.1" 200 4960 "http://www.citycarshare.org/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0"

193.47.80.51 - - [30/Nov/2006:04:37:56 -0800] "GET /press.do;jsessionid=E49722F6235A31A3627A6C62753A7CDB HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:04:59:56 -0800] "GET /press.do;jsessionid=1407BA083FB2123469A4E544C3F26DFC HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:06:16:01 -0800] "GET /press.do;jsessionid=5FAFDDABFF42C82F3C766377F5AC9F44 HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:06:31:36 -0800] "GET /press.do;jsessionid=EAFB1F3DB5B7D47DFF4212A66911754F HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:07:00:45 -0800] "GET /press.do;jsessionid=ADF4E609E38901897648ABD6C7BF4E57 HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:07:20:54 -0800] "GET /press.do;jsessionid=5049AD9757D8C7BAA599C2837EBFB3BE HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:07:37:42 -0800] "GET /press.do;jsessionid=F53FC49BCAD98F4181F05DAC7D7A65C4 HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:07:49:13 -0800] "GET /press.do;jsessionid=15FCF8DCE01CBD47DAB1A8D668EF9F38 HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:07:59:28 -0800] "GET /press.do;jsessionid=E717438CB2746895BFF9C16DE6A72F28 HTTP/1.1" 200 22020 "-" "Exabot/3.0"

I am 1000% certain that not all bots browse with cookies, at least not all
the time.  How can I stop these bots from crawling me so often?  The duplicate
crawls alone account for over 25% of my bandwidth, never mind the regular bot
traffic.
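
One direction that might be worth trying is to map every session-tagged URL
back to its clean form, so that duplicates such as the press.do requests
collapse to a single address (for example as the target of a redirect served to
known bots). Below is a minimal Java sketch of the stripping step only. The
class and method names are hypothetical, this is not a Tomcat API, and how the
result would be wired into the application is left open.

```java
import java.util.regex.Pattern;

/** Removes the ;jsessionid=... path parameter from a URL or request path. */
final class JsessionidStripper {

    // The session id runs from ";jsessionid=" up to the next '?', '#', ';'
    // or '/', or to the end of the string.
    private static final Pattern SESSION_PARAM =
            Pattern.compile(";jsessionid=[^?#;/]*", Pattern.CASE_INSENSITIVE);

    /** Returns the URL without any jsessionid path parameter. */
    static String strip(String url) {
        return SESSION_PARAM.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        // Request path taken from one of the Exabot log lines in this message.
        System.out.println(
                strip("/press.do;jsessionid=E49722F6235A31A3627A6C62753A7CDB"));
    }
}
```

Any query string after the session id is kept, so only the session part of the
address changes.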
-- 
View this message in context: http://www.nabble.com/Web-spiders---disabling-jsessionid-tf2737558.html#a7667574
Sent from the Tomcat - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

