hadoop-common-dev mailing list archives

From Dawid Weiss <dawid.we...@cs.put.poznan.pl>
Subject Re: Gigablast.com search engine- 10BILLION PAGES!
Date Thu, 19 Jun 2008 13:26:03 GMT

> They claim to have less than 500 servers that contain 10billion pages

Such claims are not always supported by evidence. As a side effect of
another experiment, we compared document-count estimates from Google, Yahoo,
Live and Gigablast -- the estimates seem to reflect the actual proportions
between the index sizes of these search engines.

It's an internal tech report, so it may be rough around the edges, but the
illustrations alone should be pretty self-explanatory:


Here is a direct PDF link:


