Message-ID: <4B718840.4000406@pidster.com>
Date: Tue, 09 Feb 2010 16:07:28 +0000
From: Pid
Reply-To: pid@pidster.com
Organization: Pidster Inc
To: 
users@tomcat.apache.org
Subject: Re: JSESSIONID and impact on google
References: <1265725862.2842.141.camel@mosu.cotroceni.esolutions.ro> <4B71834F.9020400@christopherschultz.net>
In-Reply-To: <4B71834F.9020400@christopherschultz.net>

On 09/02/2010 15:46, Christopher Schultz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Marian,
>
> On 2/9/2010 9:31 AM, Marian Simpetru wrote:
>> Google acts as a non-cookie browser and hence is served non-unique
>> URLs (because the session ID is appended to the URL).
>
> I heard at one point that Google's crawler *did* support cookies. I
> never verified that, but it sounds like they currently do not support them.
>
>> Question is: is there a way to configure Tomcat to only use cookies (not
>> append jsessionid to the URL for cookie-less browsers)?
>
> It's not a Tomcat configuration, but you can always write a filter like
> this:
>
> public class NoURLRewriteFilter
>   implements Filter
> {
>   // request, response and chain are the doFilter parameters (elided
>   // here); response must be an HttpServletResponse for the wrapper.
>   public void doFilter(...) {
>     // Wrap the response so the encode* methods return the URL
>     // unchanged, i.e. without ";jsessionid=..." appended.
>     chain.doFilter(request, new HttpServletResponseWrapper(response) {
>       public String encodeURL(String url) { return url; }
>       public String encodeUrl(String url) { return url; }
>       public String encodeRedirectURL(String url) { return url; }
>       public String encodeRedirectUrl(String url) { return url; }
>     });
>   }
> }
>
> Now, this will likely cause an explosion in the number of sessions
> generated by Google's crawler. You might want to couple this with a
> separate filter (or just create a GoogleCrawlerFilter that does all
> this) that identifies Google's (and others') user agent and intercepts
> calls to getSession() and either refuses to create a session (probably
> not a good idea) or returns a fake session that gets discarded after
> every request.
> Another option would be to set the session timeout to
> something like 10 seconds so the session dies relatively quickly instead
> of sticking around for a long time, wasting memory.
>
>> Maybe a better idea would be for someone from Apache Tomcat to push
>> Google with some standards Tomcat implements in this respect, so that
>> Google changes the algorithm and does not punish websites powered by
>> Tomcat with low rankings.
>
> This is not a "Tomcat problem": it's a problem with any site that
> requires sessions to maintain state on the server.
>
> I agree with Chuck: fix your webapp to tolerate Google's crawler, or
> suffer the consequences.
>
> Something else you can do is use a robots.txt file to prevent the
> crawler from hitting certain URLs. That might help.

I'm not doing anything special, I don't think. Google bots hit our site, and the session count goes up a bit. Google does not include jsessionid in the URLs it indexes.

It may be that the site has been around for long enough that the Google algorithms know that we have a session id that should be removed from a URL. It would be surprising to me if Google (et al) were not trying to remove PHPSESSID and JSESSIONID data from URLs.

p

> - -chris
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (MingW32)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAktxg08ACgkQ9CaO5/Lv0PBxDACgweTaZAglz476s7TvYo63//2a
> IgcAoIp0u2ZxOes8fFPuUAoP2FrHk/VN
> =FjsP
> -----END PGP SIGNATURE-----
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
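
For anyone finding this thread later, here is a rough, self-contained sketch of two small helpers that the ideas discussed in this thread would need: spotting a crawler from its User-Agent header, and stripping a ";jsessionid=..." path parameter from a URL. The class name CrawlerSessionHelper, both method names, and the list of user-agent tokens are made up for illustration and are not part of Tomcat or the Servlet API. The servlet filter wiring is deliberately left out, so treat this as a starting point under those assumptions rather than a finished solution.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper; names and token list are illustrative assumptions. */
public final class CrawlerSessionHelper {

    // Assumed user-agent fragments; adjust to the crawlers seen in your own logs.
    private static final String[] CRAWLER_TOKENS = { "googlebot", "bingbot", "slurp" };

    // Matches a ";jsessionid=..." path parameter up to the next '?', '#', ';'
    // or the end of the string, ignoring case.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^?#;]*", Pattern.CASE_INSENSITIVE);

    private CrawlerSessionHelper() {
        // Static helpers only; no instances.
    }

    /** Returns true if the User-Agent value contains one of the assumed crawler tokens. */
    public static boolean isCrawler(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : CRAWLER_TOKENS) {
            if (ua.contains(token)) {
                return true;
            }
        }
        return false;
    }

    /** Removes any ";jsessionid=..." path parameter and leaves the rest of the URL untouched. */
    public static String stripSessionId(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }
}
```

A filter could pass request.getHeader("User-Agent") to isCrawler() to decide whether to shorten the session timeout or skip URL rewriting for that request. User-agent strings are easy to spoof, so this is only a heuristic.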