tomcat-users mailing list archives

From Christopher Schultz <>
Subject Re: Web spiders - disabling jsessionid
Date Fri, 01 Dec 2006 18:13:51 GMT

Back to the original question...

Mikolaj Rydzewski wrote:
> As you may know, the URL rewriting feature is not a nice thing when spiders
> come to index your site -

So, the problem is that your URLs contain ";jsessionid=...", right? When
does that become a problem?

That becomes a problem when Google (or whoever) crawls your site on
different days and sees the same content under "different" URLs (a
made-up example is just below). Well, I have a couple of thoughts about
that.
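
The id ends up in the URL because the container rewrites every link the
page passes through response.encodeURL() (or a JSP tag that calls it)
whenever it has not yet seen a session cookie from the client. The host,
path, and ids here are made up, but two visits from a crawler that never
returns a cookie look roughly like this:

    Day 1:  http://www.example.com/catalog/list.jsp;jsessionid=0123456789ABCDEF
    Day 2:  http://www.example.com/catalog/list.jsp;jsessionid=FEDCBA9876543210

Same page, "different" URL every time.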

1. A semicolon is listed in the URI specification as a valid delimiter
   (it separates path parameters from the path itself), yet pretty much
   every major web server out there ignores that and treats it as part
   of the path. So this is partially the crawler's fault for not
   following the spec: the ";" character is not technically a valid URL
   character outside of its role as a delimiter, just like "&" or "?".

2. If you strip off the jsessionid argument from all of these URLs,
   a brand new session will be created for every URL the googlebot
   requests, and that adds up to thousands of sessions per crawl. Do
   you think that's a good idea?

3. If you don't want googlebot to get a session, why are you allocating
   one? If you need sessions to manage site navigation, then you cannot
   turn them off and have things work correctly... can you? (There is a
   rough Filter sketch after this list for hiding the id from crawlers
   without touching your session logic.)

4. Consider instructing googlebot not to crawl certain portions of your
   site (those which require a session) by using a robots.txt file; a
   minimal example follows below.
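
If, after all of that, you still want the id kept out of the URLs handed
to crawlers, one approach I've seen is a Filter that wraps the response
and skips URL rewriting when the User-Agent looks like a bot. This is
only a sketch, under the assumption that a User-Agent check is good
enough for your site; the class name and bot pattern are made up, and it
does NOT stop a throw-away session from being created if your code calls
request.getSession() anyway (see point 2):

    import java.io.IOException;

    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpServletResponseWrapper;

    public class BotUrlRewritingFilter implements Filter {

        public void init(FilterConfig config) { }
        public void destroy() { }

        public void doFilter(ServletRequest req, ServletResponse res,
                             FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest request = (HttpServletRequest) req;
            HttpServletResponse response = (HttpServletResponse) res;

            String ua = request.getHeader("User-Agent");
            boolean isBot = (ua != null) && ua.toLowerCase().matches(
                    ".*(googlebot|slurp|msnbot|crawler|spider).*");

            if (isBot) {
                // Wrap the response so encodeURL()/encodeRedirectURL()
                // return the URL untouched: no ";jsessionid=..." is
                // ever appended for requests that look like a crawler.
                response = new HttpServletResponseWrapper(response) {
                    public String encodeURL(String url) { return url; }
                    public String encodeRedirectURL(String url) { return url; }
                    public String encodeUrl(String url) { return url; }
                    public String encodeRedirectUrl(String url) { return url; }
                };
            }

            chain.doFilter(request, response);
        }
    }

You would map it in web.xml in front of anything that writes links; a
real version would probably read the bot pattern from an init-param
instead of hard-coding it.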

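As for point 4, a robots.txt sitting at the root of the site might look
something like this, where "/myapp/secure/" is just a placeholder for
whatever part of your site actually needs a session:

    User-agent: *
    Disallow: /myapp/secure/
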
Just my .02.

-chris
