tomcat-users mailing list archives

From "Chris Adams" <chris.ad...@JETISRE.COM>
Subject RE: Web spiders - disabling jsessionid
Date Fri, 01 Dec 2006 15:39:24 GMT
Just thought I'd join this [probably not appropriate for this list :)]
conversation real quickly :)

Why do you assume that there would be a 1:1 or even 2:1 ratio between
the Googlebot and the "google-incognito" hits?

It seems very likely that Google, since it does play nice with agent
strings, etc., would only secretly hit your site very rarely. Once it
has determined that the site isn't doing SEO tricks, there's really no
need for it to check for them as often as the Googlebot agent needs to
re-index the site.

Your empirical data really isn't useful, because it's based upon an
assumption that could very easily be false.  You can't draw any
conclusion about whether Google uses a "standard" 3rd-party agent from
the relatively low number of Mozilla/5.0 hits, because it's plausible
(and likely) that only an extremely small number (e.g. 1 or 2) of those
hits would come from this "google-incognito" agent.

- Chris


-----Original Message-----
From: Christopher Schultz [mailto:chris@christopherschultz.net] 
Sent: Friday, December 01, 2006 3:13 PM
To: Tomcat Users List
Subject: Re: Web spiders - disabling jsessionid

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Leon,

Leon Rosenberg wrote:
> you believe everything you've been told ?:-)

Well, I've been told by you, and I don't believe you. ;)

> google has 3 (at least 3 known) user agents: google, Mozilla with
> google-bot in the agent string (the one you sent) and another one,
> which is just Mozilla/5.0.
> 
> google uses this 3rd agent to check your site from another IP address,
> to see whether you do some ugly SEO stuff, like cloaking etc.
> 
> If it detects that you deliver different content to its
> mozilla-disguised bot, your chances to be thrown out of the index are
> pretty high.

This sounds pretty plausible. Unfortunately, my empirical data suggests
otherwise. Allow me to post a portion of Webalizer's "top user agents"
list for the month of November, from a small site I maintain:

#   Hits    % of hits   User Agent
1   26529   48.64%      Googlebot/2.1
2   12077   22.14%      MSIE 6.0
3   5285    9.69%       Yahoo! Slurp
4   3353    6.15%       Mozilla/5.0

There are 11 more user agents which are all pretty much irrelevant. As
you can see, "googlebot" appears with a plurality of the hits (yeah,
it's not a really popular site). That's a /lot/ of hits compared to the
others. In fact, if you agree that "MSIE 6.0" is not google-in-disguise,
then it is not possible for the remaining user agent stats to sum to a
value even close to what googlebot says.

Webalizer can be configured to "collapse" different user agent strings
into one single user agent (say, anything containing MSIE into a single
MSIE in order to get an aggregate MSIE usage number). No such
aggregation is being used, here, so what you see is what you get.
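
To make the "collapsing" idea concrete, here is a minimal Python sketch of
substring-based grouping. It is not Webalizer's implementation; the rule
list, the function names and the sample agent strings are all made up for
illustration.

```python
from collections import Counter

# Hypothetical grouping rules: (substring to look for, label to report).
GROUP_RULES = [("MSIE", "MSIE"), ("Googlebot", "Googlebot")]


def collapse_agent(agent, rules=GROUP_RULES):
    """Return the label of the first matching rule, or the agent unchanged."""
    for needle, label in rules:
        if needle in agent:
            return label
    return agent


def count_agents(agents, rules=GROUP_RULES):
    """Count hits per (possibly collapsed) user agent."""
    return Counter(collapse_agent(a, rules) for a in agents)


# Made-up sample of raw user agent strings, one per hit.
sample = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0",
]

grouped = count_agents(sample)              # both MSIE versions share one bucket
ungrouped = count_agents(sample, rules=[])  # every distinct string stays separate
```

With no rules, each distinct string keeps its own count, which is the
situation described above.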

If your assertion were correct, I would have expected to see a large
number of hits (perhaps 1/3 or 1/2 of the googlebot hits) coming from
"Mozilla" as google-incognito, but that's not the case.
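
As a quick check of that comparison, the following Python sketch computes
the Mozilla/5.0 hits as a fraction of the Googlebot hits, using only the
numbers from the table above. The helper name `ratio` is just illustrative.

```python
# Hit counts copied from the "top user agents" table above.
hits = {
    "Googlebot/2.1": 26529,
    "MSIE 6.0": 12077,
    "Yahoo! Slurp": 5285,
    "Mozilla/5.0": 3353,
}


def ratio(agent, baseline, counts):
    """Hits for `agent` expressed as a fraction of the hits for `baseline`."""
    return counts[agent] / counts[baseline]


mozilla_vs_googlebot = ratio("Mozilla/5.0", "Googlebot/2.1", hits)
print(f"Mozilla/5.0 hits relative to Googlebot: {mozilla_vs_googlebot:.1%}")
# prints "Mozilla/5.0 hits relative to Googlebot: 12.6%"
```

That is well short of the 1/3 to 1/2 range, though it says nothing about
how often a disguised agent would actually need to visit.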

Can you give a reference to where you discovered this "fact"?

- -chris

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFcEZ/9CaO5/Lv0PARAuvqAJ41d+SbmskQIDH1xW5obI2f2xQWTwCfavcf
ed8ZaktgYzFpjfk2lli4vns=
=HZmV
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org



