hadoop-common-user mailing list archives

From "Lukas Vlcek" <lukas.vl...@gmail.com>
Subject Re: Yahoo's production webmap is now on Hadoop
Date Tue, 19 Feb 2008 20:53:43 GMT
Impressive! Considering that Hadoop is open source software, written in Java
and still in an early stage of development, could this be the *REAL* reason
why Microsoft wants to buy Yahoo!? :-)

Lukas

On Feb 19, 2008 8:55 PM, Eric Zhang <ezhang@yahoo-inc.com> wrote:

> This is very impressive. Congrats!
>
> Which version of Hadoop is this running on and what's the input data size?
>
> Eric
>
> Owen O'Malley wrote:
> > The link inversion and ranking algorithms for Yahoo Search are now
> > being generated on Hadoop:
> >
> >
> > http://developer.yahoo.com/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html
> >
> >
> > Some Webmap size data:
> >
> >     * Number of links between pages in the index: roughly 1 trillion links
> >     * Size of output: over 300 TB, compressed!
> >     * Number of cores used to run a single Map-Reduce job: over 10,000
> >     * Raw disk used in the production cluster: over 5 Petabytes
> >
> >
>
>
