tomcat-users mailing list archives

From Pid <...@pidster.com>
Subject Re: JSESSIONID and impact on google
Date Tue, 09 Feb 2010 16:07:28 GMT
On 09/02/2010 15:46, Christopher Schultz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Marian,
>
> On 2/9/2010 9:31 AM, Marian Simpetru wrote:
>> Google acts as a cookie-less browser and hence is served non-unique
>> URLs (because the session ID is appended to the URL).
>
> I heard at one point that Google's crawler *did* support cookies. I
> never verified that, but it sounds like they currently do not support them.
>
>> Question is: Is there a way to configure Tomcat to use only cookies (and
>> not append jsessionid to the URL for cookie-less browsers)?
>
> It's not a Tomcat configuration, but you can always write a filter like
> this:
>
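> import java.io.IOException;
> import javax.servlet.*;
> import javax.servlet.http.*;
>
> public class NoURLRewriteFilter
>     implements Filter
> {
>    public void init(FilterConfig config) { }
>    public void destroy() { }
>    public void doFilter(ServletRequest request, ServletResponse response,
>                         FilterChain chain)
>        throws IOException, ServletException
>    {
>      // Return every URL unchanged so ;jsessionid is never appended.
>      chain.doFilter(request,
>        new HttpServletResponseWrapper((HttpServletResponse) response) {
>          public String encodeURL(String url) { return url; }
>          public String encodeUrl(String url) { return url; }
>          public String encodeRedirectURL(String url) { return url; }
>          public String encodeRedirectUrl(String url) { return url; }
>        });
>    }
> }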
>
> Now, this will likely cause an explosion in the number of sessions
> generated by Google's crawler. You might want to couple this with a
> separate filter (or just create a GoogleCrawlerFilter that does all
> this) that identifies Google's (and others') user agent and intercepts
> calls to getSession() and either refuses to create a session (probably
> not a good idea) or returns a fake session that gets discarded after
> every request. Another option would be to set the session timeout to
> something like 10 seconds so the session dies relatively quickly instead
> of sticking around for a long time, wasting memory.
>
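The short-timeout idea above could be sketched roughly as follows. This is only an illustration, not something from Tomcat itself: the class name `CrawlerSessionPolicy`, the list of user-agent substrings, and the helper method names are all assumptions made up for this example. It uses plain Java so it can be tried without a servlet container.

```java
import java.util.Locale;

/** Hypothetical helper: picks a session timeout based on the User-Agent header. */
final class CrawlerSessionPolicy {
    // Assumed substrings; adjust to the crawlers that actually show up in your logs.
    private static final String[] CRAWLER_TOKENS = { "googlebot", "bingbot", "slurp" };

    /** Timeout for crawler sessions, in seconds (the "10 seconds" suggested above). */
    static final int CRAWLER_TIMEOUT_SECONDS = 10;

    /** Returns true if the User-Agent contains one of the assumed crawler tokens. */
    static boolean isCrawler(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : CRAWLER_TOKENS) {
            if (ua.contains(token)) {
                return true;
            }
        }
        return false;
    }

    /** Short timeout for crawlers, the caller's default for everyone else. */
    static int timeoutSeconds(String userAgent, int defaultSeconds) {
        return isCrawler(userAgent) ? CRAWLER_TIMEOUT_SECONDS : defaultSeconds;
    }
}
```

Inside a filter, the result would presumably be passed to `HttpSession.setMaxInactiveInterval(int)` once a session exists, using `request.getHeader("User-Agent")` as the input. User-agent strings are easy to spoof, so this only limits how long crawler sessions linger; it is not a security measure.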
>> Maybe a better idea would be for someone from Apache Tomcat to approach
>> Google about the standards Tomcat implements in this respect, so that
>> Google changes its algorithm and does not punish websites powered by
>> Tomcat with low rankings.
>
> This is not a "Tomcat problem": it's a problem with any site that
> requires sessions to maintain state on the server.
>
> I agree with Chuck: fix your webapp to tolerate Google's crawler, or
> suffer the consequences.
>
> Something else you can do is use a robots.txt file to prevent the
> crawler from hitting certain URLs. That might help.

I'm not doing anything special, I don't think.
Google bots hit our site, the session count goes up a bit.
Google does not include jsessionid in the URLs it indexes.

It may be that the site has been around for long enough that the Google
algorithms know that we have a session id that should be removed from a URL.

It would be surprising to me if Google (et al) were not trying to remove
PHPSESSID and JSESSIONID data from URLs.
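
For illustration, the kind of URL clean-up I mean might look something like the sketch below. This is a guess at the general idea, not a description of what Google actually does; the class name `SessionIdStripper` and the regular expression are my own assumptions. It relies on the fact that the container appends the id as a `;jsessionid=...` path parameter ahead of the query string.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: removes the ";jsessionid=..." path parameter from a URL. */
final class SessionIdStripper {
    // Matches ";jsessionid=" and everything up to the next ';', '/', '?' or '#'.
    private static final Pattern JSESSIONID =
        Pattern.compile(";jsessionid=[^;/?#]*", Pattern.CASE_INSENSITIVE);

    /** Returns the URL with any jsessionid path parameter removed. */
    static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }
}
```

With this, `SessionIdStripper.strip("http://example.com/app/page.jsp;jsessionid=ABC123.node1?x=1")` gives `http://example.com/app/page.jsp?x=1`, and URLs without a session id pass through unchanged.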


p


> - -chris
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (MingW32)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAktxg08ACgkQ9CaO5/Lv0PBxDACgweTaZAglz476s7TvYo63//2a
> IgcAoIp0u2ZxOes8fFPuUAoP2FrHk/VN
> =FjsP
> -----END PGP SIGNATURE-----
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>

