hadoop-general mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: How does Hadoop do sort(shuffle) after map exactly?
Date Tue, 23 Nov 2010 06:27:55 GMT

On Tue, Nov 23, 2010 at 8:59 AM, Ren Zuocheng <awakenrz@gmail.com> wrote:
> Hi,
> I'm new to Hadoop and I want to understand its implementation better. I have always
> wondered how each reduce task gets its input after mapping. Google's paper and Hadoop's
> documentation say that a sort is done to bring together map output with the same key, but
> there is no detailed explanation of how it is implemented, and my intuition is that a
> global hash might work better than sorting. So I really want to know the details and see
> whether my intuition is right. If I can find that out from the source code, where should
> I start?

Read the MapTask and ReduceTask classes, and you'll find how the
Shuffle is implemented within.

The whole process is like this, if I am right:
MapTask                                          || ReduceTask
Map Emit -> Partition -> (Combine) Sort -> Spill || Shuffle (Combine) -> Merge -> Reduce
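
To make the flow above concrete, here is a toy sketch in Python. It is not Hadoop's code: every function name in it is made up for illustration, the partitioner is a simple stand-in, the optional combine step is left out, and everything runs in memory instead of spilling to disk. It only shows the shape of the idea: keys are assigned to reducers by a hash-style partition function, each partition is sorted on the map side, and each reducer merges the sorted runs it fetches so that equal keys arrive next to each other.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    # Toy stand-in for a hash partitioner (an assumption for this sketch).
    # A byte sum is used so the result is the same on every run.
    return sum(key.encode("utf-8")) % num_reducers


def map_side(records, map_fn, num_reducers):
    """Map emit -> partition -> sort: one sorted run per reducer."""
    runs = [[] for _ in range(num_reducers)]
    for record in records:
        for key, value in map_fn(record):
            runs[partition(key, num_reducers)].append((key, value))
    for run in runs:
        run.sort(key=itemgetter(0))  # sort by key within each partition
    return runs


def reduce_side(map_outputs, reducer_id, reduce_fn):
    """Shuffle -> merge -> reduce for a single reducer."""
    # "Shuffle": collect this reducer's sorted run from every map output.
    fetched = [output[reducer_id] for output in map_outputs]
    # Merge the already-sorted runs; equal keys end up adjacent.
    merged = heapq.merge(*fetched, key=itemgetter(0))
    return [
        (key, reduce_fn(key, [value for _, value in group]))
        for key, group in groupby(merged, key=itemgetter(0))
    ]


def word_count_map(line):
    return [(word, 1) for word in line.split()]


def word_count_reduce(key, values):
    return sum(values)


def run_job(splits, num_reducers=2):
    # One simulated map task per input split.
    map_outputs = [map_side(s, word_count_map, num_reducers) for s in splits]
    result = {}
    for reducer_id in range(num_reducers):
        result.update(reduce_side(map_outputs, reducer_id, word_count_reduce))
    return result
```

Calling run_job([["a b a", "c"], ["b a"]]) simulates two map tasks feeding two reducers. Note that the partition step already is a hash on the key; the sort only orders keys inside each partition, which lets the reduce side group values with a streaming merge instead of holding a hash table of all keys in memory.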

Harsh J
