Looks like either or both of us are misunderstanding,
Edward Capriolo wrote:
> Depends on the sorting algorithm, right? (thinking back to CS classes)
> BIG O on bubble sort for already sorted data is great, big0 on merge
> sort for already sorted is close to worse case scenario ? (yes/no)
>
It would be faster if the merge sort algorithm knows that each chunk of
input is already sorted, only linear merging is needed.
Is it the case that in the current Sort, no assumption about the
ordering of the input records is assumed? If so, why sort the output of
Map at the mapper side? (except for creating input for the combiner.)
> On Fri, Jan 8, 2010 at 2:23 PM, Le Zhao <lezhao@cs.cmu.edu> wrote:
>> Thanks Yongqiang. That answered my question.
>>
>> Interesting. I didn't know that the mapper output is sorted. Is it the
>> case that each map task's output is sorted? or that there can be multiple
>> pieces if the map task has too much output?
Let me rephrase the question above: I know that it will be sorted during
reduce, but will it be sorted immediately after map before shuffle?
If it is, then to what extent will the map output be sorted by the
mapper? Will each map output several lists of internally sorted
records? or just one consistently sorted list of records?
Thanks,
Le
