hadoop-mapreduce-user mailing list archives

From Allen <an.ronal...@gmail.com>
Subject sort and merge in map/reduce
Date Wed, 02 Nov 2011 03:07:11 GMT

I am curious about what happens after the map function puts a key/value pair
into the collector. I know that spill, sort, and merge steps take place, but I
don't have a clear picture of them. My understanding is that a partitioner
divides the key/value pairs (the map output) into several "groups", each of
which will be sent to a particular reducer. For each "group", the MapTask
sorts the key/value pairs by key (why???) and materializes them on local
disk. I don't know where the merge step comes in or why we need it.
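
To make my mental model concrete, here is a toy sketch in Python. It is
not Hadoop's actual code, only my guess at the shape of the process. The
names partition, spill, and map_side, the CRC-based partitioning, and the
fixed-size buffer are all assumptions made up for illustration.

```python
import heapq
import zlib


def partition(key, num_reducers):
    # Hypothetical stand-in for a hash partitioner: picks the "group"
    # (reducer index) for a key in a deterministic way.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def spill(buffer, num_reducers):
    # Model one spill: sort the buffered records by (partition, key).
    # The returned list stands in for one spill file on local disk.
    return sorted((partition(k, num_reducers), k, v) for k, v in buffer)


def map_side(records, num_reducers, buffer_size):
    spills, buffer = [], []
    for record in records:
        buffer.append(record)
        if len(buffer) == buffer_size:
            # Buffer is full: write out a sorted spill and start over.
            spills.append(spill(buffer, num_reducers))
            buffer = []
    if buffer:
        spills.append(spill(buffer, num_reducers))
    # Each spill is sorted on its own, so a k-way merge combines them
    # into a single run ordered by (partition, key).
    return list(heapq.merge(*spills))


records = [("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", 1)]
for part, key, value in map_side(records, num_reducers=2, buffer_size=2):
    print(part, key, value)
```

In this sketch the merge is only there because a buffer smaller than the
map output produces more than one sorted spill. Whether that matches what
the MapTask really does is part of my question.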

On the reduce side, there is also a sort and merge step. Why is that necessary?

Thanks for helping me.

