hadoop-common-user mailing list archives

From Le Zhao <lez...@cs.cmu.edu>
Subject Re: Will already sorted Mapper output improve speed of Sort in reducer?
Date Fri, 08 Jan 2010 19:23:05 GMT
Thanks Yongqiang.  That answered my question.

Interesting.  I didn't know that the mapper output is sorted.  Is each
map task's output sorted as a single piece, or can there be multiple
sorted pieces if the map task produces too much output?


Yongqiang He wrote:
> The mapper output is sorted with quicksort on the mapper side
> (the sort algorithm is actually pluggable). The reducer only needs to
> use a merge sort in order to reduce the number of files.
> Right now Hadoop always runs a sorter on the mapper side to sort the map output.
> One interesting question is how much time could be saved if the mapper's
> input is sorted and its output is naturally sorted as well (this is true in
> most situations where the mapper only does selection, filtering, or
> projection). In this case, the mapper-side sort is actually unnecessary.
> Thanks
> yongqiang
> On 1/7/10 7:21 PM, "Le Zhao" <lezhao@cs.cmu.edu> wrote:
>> Hi,
>> Does anybody know whether sorted Mapper output will reduce the cost of
>> the Sort in the reduce phase?
>> I'm teaching a class, and am curious to know how much of a difference
>> sorted vs. unsorted mapper output makes.  If the merge sort is
>> implemented to deal with already sorted input, then I guess it will be
>> fast.  Am I right?
>> Thanks,
>> Le
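
For readers following the thread, here is a minimal Python sketch of the idea discussed above: each map task sorts its own output, so the reduce side only has to merge already-sorted runs rather than sort from scratch. This is a toy illustration and not Hadoop code; the function names map_side_sort and reduce_side_merge and the sample records are made up for the example.

```python
import heapq

def map_side_sort(records):
    """Sort one map task's (key, value) pairs by key (stand-in for the map-side sorter)."""
    return sorted(records, key=lambda kv: kv[0])

def reduce_side_merge(sorted_runs):
    """K-way merge of runs that are each already sorted by key; no full re-sort."""
    return list(heapq.merge(*sorted_runs, key=lambda kv: kv[0]))

# Hypothetical output of two map tasks, unsorted as emitted.
map_outputs = [
    [("b", 1), ("a", 1), ("c", 1)],
    [("c", 1), ("a", 1)],
]

runs = [map_side_sort(out) for out in map_outputs]  # one sorted run per map task
merged = reduce_side_merge(runs)                    # what the reducer iterates over
print(merged)
```

The merge step only ever compares the heads of the runs, which is why it relies on every run being sorted beforehand. If a map task's output were already in key order, the map-side sort in this sketch would be the redundant step, which matches the point raised in the quoted reply.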
