hadoop-general mailing list archives

From Ren Zuocheng <awake...@gmail.com>
Subject How does Hadoop do sort(shuffle) after map exactly?
Date Tue, 23 Nov 2010 03:29:59 GMT
I'm new to Hadoop and want to understand its implementation better. I have always wondered how,
after the map phase, each reduce task gets its input. Google's paper and Hadoop's documentation
say that a sort is done to bring together map output records that share the same key. But neither
gives a detailed explanation of how this is implemented, and my intuition is that a global hash
might work better than sorting. So I would really like to know the details and see whether my
intuition is right. If I can find this out from the source code, where should I start?
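
To make the question concrete, here is a minimal Python sketch of the two grouping strategies I am comparing. It is not taken from Hadoop's source: the function names, the toy partitioner, and the sample data are all my own assumptions, and everything runs in memory on a single machine.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    """Hypothetical partitioner: choose a reduce task for a key.

    Uses the sum of the key's UTF-8 bytes so the result is the same on
    every run (Python's built-in hash() of a str varies per process).
    """
    return sum(key.encode("utf-8")) % num_reducers


def group_by_sort(pairs):
    """Sort-based grouping: sort by key, then collect runs of equal keys."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort on the key only
    return [(k, [v for _, v in run])
            for k, run in groupby(ordered, key=itemgetter(0))]


def group_by_hash(pairs):
    """Hash-based grouping: one hash-table bucket per distinct key."""
    table = defaultdict(list)
    for k, v in pairs:
        table[k].append(v)
    return dict(table)


# Made-up map output for illustration.
map_output = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]

by_sort = group_by_sort(map_output)
by_hash = group_by_hash(map_output)
```

Both functions produce the same groups. The sort-based one also returns the keys in order and only needs to look at one run of equal keys at a time, while the hash-based one keeps a bucket for every distinct key until the whole input has been read.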
