hadoop-common-user mailing list archives

From Tim Wintle <tim.win...@teamrubber.com>
Subject Re: Yahoo's production webmap is now on Hadoop
Date Wed, 20 Feb 2008 10:09:26 GMT
How do you handle running multiple jobs? Whenever I run multiple jobs,
they run sequentially (if they have the same priority).

Tim
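
[Editor's note: the stock JobTracker of this era schedules jobs FIFO within a priority level, so same-priority jobs run in submission order. One way to influence the order is to raise a job's priority. A minimal sketch, assuming the standard `mapred.job.priority` property is set in the per-job configuration:]

```xml
<!-- Hypothetical per-job configuration fragment (e.g. passed to a job via
     its JobConf). mapred.job.priority accepts VERY_HIGH, HIGH, NORMAL,
     LOW, or VERY_LOW; higher-priority jobs are scheduled first. -->
<property>
  <name>mapred.job.priority</name>
  <value>HIGH</value>
</property>
```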

On Tue, 2008-02-19 at 09:58 -0800, Owen O'Malley wrote:
> The link inversion and ranking algorithms for Yahoo Search are now  
> being generated on Hadoop:
> 
> http://developer.yahoo.com/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html
> 
> Some Webmap size data:
> 
>      * Number of links between pages in the index: roughly 1 trillion links
>      * Size of output: over 300 TB, compressed!
>      * Number of cores used to run a single Map-Reduce job: over 10,000
>      * Raw disk used in the production cluster: over 5 Petabytes
> 

