hadoop-common-user mailing list archives

From "Barry, Sean F" <sean.f.ba...@intel.com>
Subject RE: Shuffle/sort
Date Wed, 06 Jun 2012 16:24:39 GMT
Thanks Harsh!
And is this the right source code for the shuffling that is done in the reduce task?



-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Tuesday, June 05, 2012 7:43 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle/sort

Hey Sean,

Check out http://www.slideshare.net/jhammerb/hadoop-map-reduce-arch-106883,
a slightly dated, MR1-oriented presentation from Owen O'Malley that goes deep enough
to give a good overview of how things work (including how reduces pull data).

After that, check out Chris Douglas'
that goes in-depth into the evolution of the implementations of that layer. This is pretty
much the state of 0.20/1.0 today too. In 2.0, Netty has replaced Jetty, among other
improvements, but I don't have a public document link to share on this yet. Others may share the
2.0 change docs if they have a link to one (or I'll respond as soon as I have one).

I hope this helps!

On Wed, Jun 6, 2012 at 4:16 AM, Barry, Sean F <sean.f.barry@intel.com> wrote:
> "I was always wondering, after mapping, how each reduce task gets its
> input. It is said in Google's paper and Hadoop's documentation that a
> sort is done to aggregate the same key of the map output. But there is
> no detailed explanation of how it is implemented, and my intuition is
> that perhaps a global hashing will work better than sorting. So I
> really want to know the details and see whether my intuition is right.
> If I can find that out in the source code, where should I start?"
> I saw this question online and no one replied to it. Does anyone know where I should go
> to study the source code for the shuffle and sort?
> -sean

Harsh J
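
On the sorting-versus-hashing question quoted above, it may help to see that the usual MapReduce model uses both: a hash of the key picks which reduce task receives a record, and sorting is what brings equal keys together inside each reduce task. The following is a minimal Python sketch of that idea, not Hadoop's actual code. The function names (partition, map_side, reduce_side, word_count) and the use of CRC32 as the hash are assumptions made for illustration only.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    # Hash partitioning: the same key always maps to the same reducer.
    # CRC32 is used here only because it is stable across runs (an assumption
    # for this sketch; Python's built-in hash() of str is randomized per process).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_side(records, num_reducers):
    # Split one map task's output into one run per reducer, each sorted by key.
    runs = [[] for _ in range(num_reducers)]
    for key, value in records:
        runs[partition(key, num_reducers)].append((key, value))
    for run in runs:
        run.sort(key=itemgetter(0))
    return runs


def reduce_side(sorted_runs):
    # Merge the already-sorted runs pulled from every map task, then group
    # adjacent records that share a key.
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, [value for _, value in group]


def word_count(map_outputs, num_reducers):
    per_map = [map_side(output, num_reducers) for output in map_outputs]
    totals = {}
    for r in range(num_reducers):
        # The "shuffle" step of this sketch: reducer r collects its run from each map.
        fetched = [runs[r] for runs in per_map]
        for key, values in reduce_side(fetched):
            totals[key] = sum(values)
    return totals


if __name__ == "__main__":
    outputs = [[("b", 1), ("a", 1), ("b", 1)], [("a", 1), ("c", 1)]]
    print(word_count(outputs, num_reducers=3))
```

Because each run arrives already sorted, the reduce side only needs a streaming merge and never has to hold a hash table of all keys in memory, which is one commonly cited reason for preferring sort-based grouping when the data does not fit in RAM.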
