tomcat-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Web spiders - disabling jsessionid
Date Fri, 01 Dec 2006 18:13:51 GMT
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mikolaj,

Back to the original question...

Mikolaj Rydzewski wrote:
> As you may know url rewriting feature is not a nice thing when spiders
> come to index your site -
> http://gabrito.com/post/javas-seo-blunder-jsessionid.

So, the problem is that your URLs contain ";jsessionid=...", right? When
does that become a problem?

That becomes a problem when Google (or whoever) crawls your site on
different days and sees the same content under "different" URLs. Well, I
have a couple of thoughts about that.

1. A semicolon is listed in the URI specification (RFC 2396, which HTTP
   relies on) as a valid delimiter for path parameters, despite pretty
   much every major web server out there ignoring it and treating it
   as part of the path.
   So this is partially the crawler's fault for not following the
   specification. The ";" character is reserved: it is not technically
   valid in a URL outside of its role as a delimiter, just like "&"
   or "?".

2. If you strip off the jsessionid parameter from all of these URLs,
   a crawler that does not accept cookies never sends a session id
   back, so a new session gets created for every URL the Googlebot
   requests: thousands of sessions. Do you think that's a good idea?

3. If you don't want Googlebot to get a session, why are you allocating
   one? If you need sessions to manage site navigation, then you
   cannot turn them off and have things work correctly... can you?

4. Consider instructing Googlebot not to crawl certain portions of your
   site (those which require a session) by using a robots.txt file.
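
To make point 1 concrete: since ";" only introduces a path parameter, the
"different" URLs a crawler sees on different days differ in nothing but
that parameter. Here is a minimal sketch in plain Java of normalizing such
URLs, for example in a log analyzer or a sitemap generator. The class name
JsessionidStripper and the example.com URL are made up for illustration;
this is not part of Tomcat, and it does nothing about the session-creation
issue in point 2.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, for illustration only: drops the ";jsessionid=..." path parameter. */
final class JsessionidStripper {
    // Matches ";jsessionid=" plus everything up to the next ';', '?', '#', or end of string.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^;?#]*", Pattern.CASE_INSENSITIVE);

    private JsessionidStripper() {
    }

    /** Returns the URL with any jsessionid path parameter removed; the query string is kept. */
    static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        // Prints: http://example.com/shop/list.jsp?page=2
        System.out.println(strip("http://example.com/shop/list.jsp;jsessionid=ABC123?page=2"));
    }
}
```

Two requests for the same page with different session ids normalize to the
same string, which is the comparison a well-behaved crawler could be making.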

Just my .02.

- -chris

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFcHDf9CaO5/Lv0PARAtp5AKCgPVdAXu80zADXifTx6AJOYfpupACfXcPb
nEEqn4rpvDVatwSRf/XPScg=
=Y5KA
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

