hadoop-mapreduce-user mailing list archives

From Bai Shen <baishen.li...@gmail.com>
Subject Hadoop map reduce merge algorithm
Date Thu, 12 Jan 2012 16:27:20 GMT
Can someone explain how the MapReduce merge is done?  As far as I can
tell, it merges all of the spill files into one giant file to send to
the reducer.  Is this correct?  Even if you configure smaller spill files
and a lower sort factor, the end result of the merge is still the same.  It
just takes more passes to get there.
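
To make the question concrete, here is a minimal sketch of the behaviour described above: a generic multi-pass k-way merge with a cap on how many sorted runs are combined at once. This is not Hadoop's actual merge code. The names multi_pass_merge, runs, and factor are made up for illustration, with factor standing in for the sort factor and each run standing in for a spill file.

```python
import heapq


def multi_pass_merge(runs, factor):
    """Merge sorted runs, combining at most `factor` runs per merge step.

    Illustrative only: `runs` stands in for spill files and `factor`
    for the sort factor. Returns (merged_list, number_of_passes).
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    runs = [list(r) for r in runs]
    passes = 0
    while len(runs) > 1:
        # One pass: merge each group of up to `factor` runs into a larger run.
        runs = [
            list(heapq.merge(*runs[i:i + factor]))
            for i in range(0, len(runs), factor)
        ]
        passes += 1
    merged = runs[0] if runs else []
    return merged, passes
```

Under these assumptions, four runs merged with factor=4 finish in one pass and with factor=2 in two passes, but both calls return the same single sorted output, which is the behaviour the question is asking about.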

