hadoop-common-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Yahoo's production webmap is now on Hadoop
Date Tue, 19 Feb 2008 21:21:31 GMT
Owen O'Malley wrote:
> The link inversion and ranking algorithms for Yahoo Search are now being 
> generated on Hadoop:
> 
> http://developer.yahoo.com/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html
> 
> Some Webmap size data:
> 
>     * Number of links between pages in the index: roughly 1 trillion links
>     * Size of output: over 300 TB, compressed!
>     * Number of cores used to run a single Map-Reduce job: over 10,000
>     * Raw disk used in the production cluster: over 5 Petabytes
> 
> 

Truly impressive. IMHO this is something the project should boast about, 
i.e. by including this data point in the scalability / performance section.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

