hadoop-common-user mailing list archives

From Gang Luo <lgpub...@yahoo.com.cn>
Subject sort at reduce side
Date Wed, 03 Feb 2010 05:10:04 GMT
Hi all,
I would like to know more about how sorting works on the reduce side.

The intermediate result generated on the map side is stored as a map file, which actually consists
of two sub-files: an index file and a data file. The index file stores the keys, each pointing
to the corresponding record in the data file. My assumption is that when the intermediate
result (even only part of it from each mapper) is shuffled to the reducer, it is still kept as a
map file. If so, to sort the data efficiently, the reducer would only need to read the index
part of each spill (which is a map file) and sort the keys, instead of reading whole records
from disk and sorting them.
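To make the idea concrete, here is a minimal in-memory sketch of the index-based sort I have in mind: sort only the small (key, pointer) entries taken from each spill's index, then fetch the records by offset. It is only an illustration under assumptions. `Spill`, `write_spill`, and `sort_via_index` are hypothetical names, a spill is modeled as a bytes blob plus a list of (key, offset, length) entries, and none of this is Hadoop's actual API or on-disk format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Spill:
    """Hypothetical spill: a data 'file' (bytes) plus an index of (key, offset, length)."""
    data: bytes
    index: List[Tuple[str, int, int]]


def write_spill(records: List[Tuple[str, bytes]]) -> Spill:
    """Append each value to the data blob and record where it landed in the index."""
    data = bytearray()
    index = []
    for key, value in records:
        index.append((key, len(data), len(value)))
        data.extend(value)
    return Spill(bytes(data), index)


def sort_via_index(spills: List[Spill]) -> List[Tuple[str, bytes]]:
    """Sort only index entries (key + pointer), then read each value by its offset."""
    entries = [
        (key, spill_no, offset, length)
        for spill_no, spill in enumerate(spills)
        for key, offset, length in spill.index
    ]
    # Only keys and small pointers are compared and moved; the stable sort keeps
    # equal keys in spill order.
    entries.sort(key=lambda e: e[0])
    result = []
    for key, spill_no, offset, length in entries:
        # Random-access read into the (hypothetical) data file of that spill.
        value = spills[spill_no].data[offset:offset + length]
        result.append((key, value))
    return result


spills = [
    write_spill([("b", b"v2"), ("a", b"v1")]),
    write_spill([("c", b"v3"), ("a", b"v0")]),
]
print(sort_via_index(spills))
# -> [('a', b'v1'), ('a', b'v0'), ('b', b'v2'), ('c', b'v3')]
```

The trade-off in this sketch is that sorting pointers avoids moving whole records during the sort, but the final pass turns into random reads across the data files, which can be expensive when they live on disk.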

Does the reducer actually work the way I expect?