hadoop-common-user mailing list archives

From "Peter W." <pe...@marketingbrokers.com>
Subject Re: Yahoo's production webmap is now on Hadoop
Date Tue, 19 Feb 2008 22:25:38 GMT
Amazing milestone,

Looks like Y! had approximately 100B documents in the WebMap, assuming an average of 10 links per page:

one trillion links / 10 links per page = 100,000 million pages = 100 billion.

If Google has 10B docs (indexed w/25 MR jobs) then Hadoop is already running at about ten times that scale?
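
As a rough check of the estimate above, here is a minimal Python sketch. It takes the "roughly 1 trillion links" figure from the quoted message and adds two guesses that are not published numbers: an average of 10 links per page and 10B documents for Google. The names are purely illustrative.

```python
# Back-of-the-envelope WebMap size estimate.
# TOTAL_LINKS comes from the quoted message ("roughly 1 trillion links").
# LINKS_PER_PAGE and GOOGLE_DOCS are assumptions, not published figures.
TOTAL_LINKS = 10**12
LINKS_PER_PAGE = 10
GOOGLE_DOCS = 10 * 10**9


def estimate_pages(total_links: int, links_per_page: int) -> int:
    """Estimate the page count as total links divided by average links per page."""
    return total_links // links_per_page


pages = estimate_pages(TOTAL_LINKS, LINKS_PER_PAGE)
ratio = pages / GOOGLE_DOCS

print(f"estimated pages: {pages:,}")
print(f"relative to assumed Google size: {ratio:g}x")
```

The result scales inversely with the assumed links-per-page average, so a denser link graph would mean proportionally fewer pages.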

Good stuff,

Peter W.




On Feb 19, 2008, at 9:58 AM, Owen O'Malley wrote:

> The link inversion and ranking algorithms for Yahoo Search are now  
> being generated on Hadoop:
>
> http://developer.yahoo.com/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html
>
> Some Webmap size data:
>
>     * Number of links between pages in the index: roughly 1 trillion links
>     * Size of output: over 300 TB, compressed!
>     * Number of cores used to run a single Map-Reduce job: over 10,000
>     * Raw disk used in the production cluster: over 5 Petabytes
>

