tomcat-users mailing list archives

From Pid <>
Subject Re: JSESSIONID and impact on google
Date Tue, 09 Feb 2010 16:07:28 GMT
On 09/02/2010 15:46, Christopher Schultz wrote:
> Marian,
> On 2/9/2010 9:31 AM, Marian Simpetru wrote:
>> Google acts as a non-cookie browser and hence is served non-unique
>> URLs (because the session ID is appended to the URL).
> I heard at one point that Google's crawler *did* support cookies. I
> never verified that, but it sounds like they currently do not support them.
>> The question is: is there a way to configure Tomcat to use only cookies
>> (and not append jsessionid to the URL for cookie-less browsers)?
> It's not a Tomcat configuration, but you can always write a filter like
> this:
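> import java.io.IOException;
> import javax.servlet.*;
> import javax.servlet.http.*;
>
> public class NoURLRewriteFilter
>     implements Filter
> {
>    public void init(FilterConfig config) { }
>    public void destroy() { }
>    public void doFilter(ServletRequest request, ServletResponse response,
>                         FilterChain chain)
>        throws IOException, ServletException
>    {
>      // Return every URL unchanged so ;jsessionid=... is never appended.
>      chain.doFilter(request,
>          new HttpServletResponseWrapper((HttpServletResponse) response) {
>        public String encodeURL(String url) { return url; }
>        public String encodeUrl(String url) { return url; }
>        public String encodeRedirectURL(String url) { return url; }
>        public String encodeRedirectUrl(String url) { return url; }
>      });
>    }
> }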
> Now, this will likely cause an explosion in the number of sessions
> generated by Google's crawler. You might want to couple this with a
> separate filter (or just create a GoogleCrawlerFilter that does all
> this) that identifies Google's (and others') user agents and intercepts
> calls to getSession() and either refuses to create a session (probably
> not a good idea) or returns a fake session that gets discarded after
> every request. Another option would be to set the session timeout to
> something like 10 seconds so the session dies relatively quickly instead
> of sticking around for a long time, wasting memory.
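A minimal sketch of the policy half of such a crawler filter, kept free of the Servlet API so it runs standalone. The class name `CrawlerSessionPolicy`, the user-agent substrings, and the 30-minute default are assumptions for illustration; the 10-second crawler timeout is the figure suggested above. In a real filter, the result would presumably be passed to `HttpSession.setMaxInactiveInterval(int)` on the session returned by `request.getSession(false)` after the chain has run.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper for the "GoogleCrawlerFilter" idea: choose a session
 * lifetime (in seconds) from the User-Agent header.
 */
public class CrawlerSessionPolicy {
    // Assumed user-agent substrings; not an authoritative list of crawlers.
    private static final List<String> CRAWLER_TOKENS =
        List.of("googlebot", "bingbot", "slurp");

    // Short lifetime for crawler sessions, as suggested in the thread.
    public static final int CRAWLER_TIMEOUT_SECONDS = 10;
    // Assumed default for everyone else (30 minutes).
    public static final int DEFAULT_TIMEOUT_SECONDS = 30 * 60;

    public static boolean isCrawler(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : CRAWLER_TOKENS) {
            if (ua.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static int timeoutFor(String userAgent) {
        return isCrawler(userAgent) ? CRAWLER_TIMEOUT_SECONDS
                                    : DEFAULT_TIMEOUT_SECONDS;
    }

    public static void main(String[] args) {
        String googlebot =
            "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
        System.out.println(timeoutFor(googlebot)); // prints 10
        System.out.println(timeoutFor("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")); // prints 1800
    }
}
```

Substring matching on User-Agent is easy to spoof and misses crawlers not in the list, so this is only a heuristic.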
>> Maybe a better idea would be for someone from Apache Tomcat to push
>> Google toward the standards Tomcat implements in this respect, so that
>> Google changes its algorithm and stops punishing Tomcat-powered websites
>> with low rankings.
> This is not a "Tomcat problem": it's a problem with any site that
> requires sessions to maintain state on the server.
> I agree with Chuck: fix your webapp to tolerate Google's crawler, or
> suffer the consequences.
> Something else you can do is use a robots.txt file to prevent the
> crawler from hitting certain URLs. That might help.

I don't think I'm doing anything special.
Google's bots hit our site and the session count goes up a bit.
Google does not include jsessionid in the URLs it indexes.

It may be that the site has been around long enough that Google's
algorithms know we use a session id that should be removed from the URL.
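To illustrate the kind of normalization speculated about here, a minimal sketch that strips a Tomcat-style `;jsessionid=...` path parameter so that visits under different sessions collapse to one URL. The class `SessionIdStripper` and the example URLs are hypothetical, and this is not a description of how Google actually processes URLs.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical illustration: remove a ";jsessionid=<value>" path parameter
 * so that the same page always maps to the same canonical URL.
 */
public class SessionIdStripper {
    // Matches ";jsessionid=" plus its value, up to the next '?', '#', ';' or '/'.
    private static final Pattern JSESSIONID =
        Pattern.compile(";jsessionid=[^?#;/]*", Pattern.CASE_INSENSITIVE);

    public static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        // Two crawler visits to the same page under different sessions...
        String a = "http://example.com/shop/item.jsp;jsessionid=0A1B2C3D?id=7";
        String b = "http://example.com/shop/item.jsp;jsessionid=9F8E7D6C?id=7";
        // ...both normalize to "http://example.com/shop/item.jsp?id=7".
        System.out.println(strip(a));
        System.out.println(strip(a).equals(strip(b))); // prints true
    }
}
```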

It would be surprising to me if Google (et al) was not trying to remove 


> -chris
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
