hadoop-common-user mailing list archives

From Devaraj Das <d...@yahoo-inc.com>
Subject Re: slow shuffle
Date Sat, 06 Dec 2008 17:48:35 GMT



On 12/6/08 11:11 PM, "Songting Chen" <ken_cst1998@yahoo.com> wrote:

> That's cool.
> 
> Update on Issue 2:
> 
> I accidentally changed the number of reducers to 1 (from 3). The problem is
> gone! That one reducer overlaps well with the Map phase and copies the 300
> small map outputs pretty fast.
> 
> So when there is a large number of small Map outputs, I'd use only 1 reducer.
> (That doesn't really make sense - there may be some internal code issues.)
> 
By the way, the approach of minimizing the number of reducers does make
sense. If your map outputs are small, you should consider running fewer
reducers: you cut certain per-reducer costs in the framework and thereby
improve the runtime of your job.
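
For anyone trying this at home: a minimal sketch of dialing the reducer count
down with the old org.apache.hadoop.mapred API. SingleReducerJob, MyMapper,
and MyReducer are placeholder names, and the input/output paths come from the
command line; the line that matters here is setNumReduceTasks().

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SingleReducerJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SingleReducerJob.class);
        conf.setJobName("slow-shuffle-test");
        conf.setMapperClass(MyMapper.class);    // placeholder mapper
        conf.setReducerClass(MyReducer.class);  // placeholder reducer
        // One reducer: all 300 small map outputs go to a single fetcher,
        // avoiding the per-reducer shuffle setup cost.
        conf.setNumReduceTasks(1);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }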

> Thanks,
> -Songting 
> 
> 
> --- On Sat, 12/6/08, Arun C Murthy <acm@yahoo-inc.com> wrote:
> 
>> From: Arun C Murthy <acm@yahoo-inc.com>
>> Subject: Re: slow shuffle
>> To: core-user@hadoop.apache.org
>> Date: Saturday, December 6, 2008, 1:20 AM
>> On Dec 5, 2008, at 2:43 PM, Songting Chen wrote:
>> 
>>> To summarize the slow shuffle issue:
>>> 
>>> 1. I think one problem is that the Reducer starts very late in the
>>> process, slowing the entire job significantly.
>>> 
>>>   Is there a way to let the reducer start earlier?
>>> 
>> 
>> http://issues.apache.org/jira/browse/HADOOP-3136 should help you there;
>> it's pretty close to getting into 0.20.
>> 
>> Arun
>> 
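On the "start the reducer earlier" point: separate from that JIRA, later
releases expose a slow-start knob; check the mapred-default.xml of your
release before counting on it. A sketch, reusing the JobConf from the
earlier example:

    // Launch reducers once this fraction of maps has completed
    // (assumption: the property is available in your release).
    conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.05f);
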
>>> 2. Copying 300 files of 30K each took 3 minutes in total (after all maps
>>> finished). What's going on behind the scenes really puzzles me. (Note
>>> that sorting takes < 1 sec.)
>>> 
>>> Thanks,
>>> -Songting
>>> 
>>>> 
>>>> 
>>>> --- On Fri, 12/5/08, Songting Chen
>>>> <ken_cst1998@yahoo.com> wrote:
>>>> 
>>>>> From: Songting Chen <ken_cst1998@yahoo.com>
>>>>> Subject: Re: slow shuffle
>>>>> To: core-user@hadoop.apache.org
>>>>> Date: Friday, December 5, 2008, 1:27 PM
>>>>> We have 4 testing data nodes with 3 reduce tasks. The parallel.copies
>>>>> parameter has been increased to 20, 30, even 50. But it doesn't really
>>>>> help...
>>>>> 
>>>>> 
>>>>> --- On Fri, 12/5/08, Aaron Kimball
>>>>> <aaron@cloudera.com> wrote:
>>>>> 
>>>>>> From: Aaron Kimball <aaron@cloudera.com>
>>>>>> Subject: Re: slow shuffle
>>>>>> To: core-user@hadoop.apache.org
>>>>>> Date: Friday, December 5, 2008, 12:28 PM
>>>>>> How many reduce tasks do you have? Look into increasing
>>>>>> mapred.reduce.parallel.copies from the default of 5 to something
>>>>>> more like 20 or 30.
>>>>>> 
>>>>>> - Aaron
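
Aaron's knob can also be set per job rather than cluster-wide; a sketch with
the same JobConf as in the earlier example:

    // Each reducer runs this many parallel fetch threads during the
    // copy phase; the default is 5.
    conf.setInt("mapred.reduce.parallel.copies", 20);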
>>>>>> 
>>>>>> On Fri, Dec 5, 2008 at 10:00 PM, Songting Chen
>>>>>> <ken_cst1998@yahoo.com> wrote:
>>>>>> 
>>>>>>> A little more information:
>>>>>>> 
>>>>>>> We optimized our Map process quite a bit, so the Shuffle is now
>>>>>>> the bottleneck.
>>>>>>> 
>>>>>>> 1. There are 300 Map tasks (128M block size); each takes about
>>>>>>> 13 sec.
>>>>>>> 2. The Reducer starts running at a very late stage (80% of maps
>>>>>>> are done).
>>>>>>> 3. Copying the 300 map outputs (the shuffle) takes as long as the
>>>>>>> entire map process, although each map output is only about
>>>>>>> 50 Kbytes.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --- On Fri, 12/5/08, Alex Loddengaard <alex@cloudera.com> wrote:
>>>>>>> 
>>>>>>>> From: Alex Loddengaard <alex@cloudera.com>
>>>>>>>> Subject: Re: slow shuffle
>>>>>>>> To: core-user@hadoop.apache.org
>>>>>>>> Date: Friday, December 5, 2008, 11:43 AM
>>>>>>>> These configuration options will be useful:
>>>>>>>> 
>>>>>>>>> <property>
>>>>>>>>>   <name>mapred.job.shuffle.merge.percent</name>
>>>>>>>>>   <value>0.66</value>
>>>>>>>>>   <description>The usage threshold at which an in-memory merge
>>>>>>>>>   will be initiated, expressed as a percentage of the total
>>>>>>>>>   memory allocated to storing in-memory map outputs, as defined
>>>>>>>>>   by mapred.job.shuffle.input.buffer.percent.
>>>>>>>>>   </description>
>>>>>>>>> </property>
>>>>>>>>> 
>>>>>>>>> <property>
>>>>>>>>>   <name>mapred.job.shuffle.input.buffer.percent</name>
>>>>>>>>>   <value>0.70</value>
>>>>>>>>>   <description>The percentage of memory to be allocated from the
>>>>>>>>>   maximum heap size to storing map outputs during the shuffle.
>>>>>>>>>   </description>
>>>>>>>>> </property>
>>>>>>>>> 
>>>>>>>>> <property>
>>>>>>>>>   <name>mapred.job.reduce.input.buffer.percent</name>
>>>>>>>>>   <value>0.0</value>
>>>>>>>>>   <description>The percentage of memory - relative to the maximum
>>>>>>>>>   heap size - to retain map outputs during the reduce. When the
>>>>>>>>>   shuffle is concluded, any remaining map outputs in memory must
>>>>>>>>>   consume less than this threshold before the reduce can begin.
>>>>>>>>>   </description>
>>>>>>>>> </property>
>>>>>>>> 
>>>>>>>> How long did the shuffle take relative to the rest of the job?
>>>>>>>> 
>>>>>>>> Alex
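
The three properties Alex quotes can likewise be overridden per job; a
sketch, with values mirroring his example. Note that raising
mapred.job.reduce.input.buffer.percent above 0.0 keeps map outputs in memory
into the reduce phase, trading reducer heap for less disk I/O:

    // Per-job overrides of the shuffle buffer settings.
    conf.setFloat("mapred.job.shuffle.merge.percent", 0.66f);
    conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
    conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.0f);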
>>>>>>>> 
>>>>>>>> On Fri, Dec 5, 2008 at 11:17 AM, Songting Chen
>>>>>>>> <ken_cst1998@yahoo.com> wrote:
>>>>>>>> 
>>>>>>>>> We encountered a bottleneck during the shuffle phase. However,
>>>>>>>>> there is not much data to be shuffled across the network at
>>>>>>>>> all - less than 10 MBytes in total (the combiner aggregated
>>>>>>>>> most of the data).
>>>>>>>>> 
>>>>>>>>> Are there any parameters or anything we can tune to improve the
>>>>>>>>> shuffle performance?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> -Songting
>>>>>>>>> 
>>>>>>> 
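
One last note: the combiner is what kept Songting's shuffle under 10 MBytes,
so it is worth showing how it is wired in. A sketch, where MyReducer is the
same placeholder class as above; a combiner must be commutative and
associative, since the framework may run it zero or more times per map:

    // A combiner is a Reducer run on the map side to pre-aggregate
    // records before they are shuffled.
    conf.setCombinerClass(MyReducer.class);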


