hadoop-mapreduce-user mailing list archives

From Ken Goodhope <kengoodh...@gmail.com>
Subject Re: Why does the MR framework sorts the mapper output?
Date Tue, 27 Jul 2010 00:00:23 GMT
The combiner needs sorted input.
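
A minimal sketch of that point, in plain Python rather than Hadoop's Java API. The names combine_sorted and map_output are hypothetical, and the combiner is assumed to be a streaming one that only merges adjacent records with equal keys:

```python
from itertools import groupby
from operator import itemgetter

def combine_sorted(pairs):
    """Streaming combiner: sums the values of *adjacent* records with equal keys.

    It holds only one key's running group at a time, so it is correct
    only if all records for a key arrive contiguously (i.e. sorted input).
    """
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)

# Hypothetical mapper output, in arrival order.
map_output = [("b", 1), ("a", 1), ("b", 1), ("a", 1)]

# Without a sort, equal keys are not adjacent and nothing gets combined.
unsorted_result = list(combine_sorted(map_output))

# With a sort on the key, each key collapses to a single record.
sorted_result = list(combine_sorted(sorted(map_output, key=itemgetter(0))))
```

On the unsorted input the combiner emits four records, one per input record; on the sorted input it emits one record per key.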

On Mon, Jul 26, 2010 at 1:46 PM, Alex Kozlov <alexvk@cloudera.com> wrote:

> Hi Ravi,
>
> Whether a sort is required is still debated: its primary purpose is to
> bring entries with the same key together, but one can also implement
> MapReduce with hash-based grouping.  The performance trade-offs between
> the two approaches are not settled.
>
> If you don't need sorting, you can always implement map-side aggregation
> and potentially set the number of reducers to 0.  There is no risk in
> doing that, but if you want to aggregate results across different mappers
> you'll be back to the original problem.
>
> Alex K
>
> On Mon, Jul 26, 2010 at 1:32 PM, Chinni, Ravi <rchinni@syncsort.com> wrote:
>
>> I have an MR application that is running fine except for its
>> performance. Increasing the number of data nodes is not an option for me.
>>
>> Looking at the source code of the MR framework, I noticed that the partitioned
>> output of each mapper is sorted (MapTask.java), and on the reduce side
>> partitions from various mappers are merged (ReduceTask.java) before running
>> the reduce step. Functionally, the reducers in my application do not require
>> data to be in sorted order, and getting rid of the sort and merge steps in
>> the framework would help my application.
>>
>> Does anyone know why the framework sorts and merges the intermediate
>> data? Is there anything (MR functional concepts, framework design, etc.)
>> that needs the sort and merge of intermediate data? I want to take a shot
>> at getting rid of the sort and merge steps in the framework and want to
>> know of any potential risks involved.
>>
>> Any input is appreciated.
>>
>> Thanks,
>>
>> Ravi
>>
>
>
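
The two ideas in Alex's reply, hash-based grouping and map-side aggregation, can be sketched roughly as follows. This is plain Python, not Hadoop code: map_side_aggregate, partition and shuffle are hypothetical names, the word-count workload is an assumed example, and the byte-sum partitioner is a toy stand-in for a real hash partitioner.

```python
from collections import Counter

def map_side_aggregate(records):
    """Aggregate inside one mapper with a hash table; no sort is involved."""
    counts = Counter()
    for word in records:
        counts[word] += 1
    return counts

def partition(key, num_reducers):
    # Toy deterministic partitioner (an assumption for this sketch).
    # Python's built-in hash() is salted per process for str keys.
    return sum(key.encode("utf-8")) % num_reducers

def shuffle(mapper_outputs, num_reducers):
    """Route each mapper's partial counts to a reducer and merge them by hashing."""
    reducers = [Counter() for _ in range(num_reducers)]
    for partial in mapper_outputs:
        for key, count in partial.items():
            reducers[partition(key, num_reducers)][key] += count
    return reducers

# Two hypothetical input splits, one per mapper.
split_1 = ["a", "b", "a"]
split_2 = ["b", "c", "b"]

# With 0 reducers this is the final output: "b" shows up once per mapper.
partials = [map_side_aggregate(split_1), map_side_aggregate(split_2)]

# Aggregating across mappers still needs equal keys brought together.
reducers = shuffle(partials, num_reducers=2)
totals = Counter()
for reducer in reducers:
    totals.update(reducer)
```

With zero reducers each mapper's partial counts are the job output, so a key seen by several mappers appears several times. Merging across mappers needs some step that co-locates equal keys; here that is done with hash tables instead of a sort and merge, and which approach is faster is the open question raised in the thread.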
