hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Le Zhao <lez...@cs.cmu.edu>
Subject Re: Will already sorted Mapper output improve speed of Sort in reducer?
Date Tue, 12 Jan 2010 19:04:22 GMT
Looks like either or both of us are misunderstanding,

Edward Capriolo wrote:
> Depends on the sorting algorithm, right? (thinking back to CS classes)
> BIG O on bubble sort for already sorted data is great, big-0 on merge
> sort for already sorted is close to worse case scenario ? (yes/no)

It would be faster if the merge sort algorithm knows that each chunk of 
input is already sorted, only linear merging is needed.

Is it the case that in the current Sort, no assumption about the 
ordering of the input records is assumed?  If so, why sort the output of 
Map at the mapper side? (except for creating input for the combiner.)

> On Fri, Jan 8, 2010 at 2:23 PM, Le Zhao <lezhao@cs.cmu.edu> wrote:
>> Thanks Yongqiang.  That answered my question.
>> Interesting.  I didn't know that the mapper output is sorted.  Is it the
>> case that each map task's output is sorted? or that there can be multiple
>> pieces if the map task has too much output?

Let me rephrase the question above: I know that it will be sorted during 
reduce, but will it be sorted immediately after map before shuffle?

If it is, then to what extent will the map output be sorted by the 
mapper?  Will each map output several lists of internally sorted 
records? or just one consistently sorted list of records?


View raw message