hadoop-common-user mailing list archives

From Buyung Bahari <buyung.bah...@detik.com>
Subject Re: Calculating the slot
Date Thu, 20 May 2010 02:41:02 GMT
Ferdinand Neman wrote:
> On Thu, May 20, 2010 at 2:35 AM, Allen Wittenauer
> <awittenauer@linkedin.com> wrote:
>> On May 18, 2010, at 11:06 PM, Ferdinand Neman wrote:
>>> I don't understand the "product of the number of nodes in cluster"
>>> part. Can someone help me with this?
>> A simple Google search (define:product) would have led you to this definition:
>> a quantity obtained by multiplication; "the product of 2 and 3 is 6"
> Pardon my English, I'm Indonesian.
> However, I've read this in the Map/Reduce tutorial:
> "How Many Reduces?
> The right number of reduces seems to be 0.95 or 1.75 multiplied by
> (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum).
> With 0.95 all of the reduces can launch immediately and start
> transferring map outputs as the maps finish. With 1.75 the faster nodes
> will finish their first round of reduces and launch a second wave of
> reduces doing a much better job of load balancing."
> I have a small cluster: 4 task trackers, with
> mapred.tasktracker.reduce.tasks.maximum set to 7. So the maximum total
> number of reducer slots would be 28. I tried setting the number of
> reducers to 20. Before setting the number of reducers (the default is 1),
> a job that I run finished in 40 minutes. Now, with 20 reducers, it takes
> longer: 44 minutes. Is this normal?
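
As a quick check of the tutorial's rule of thumb against the numbers above, here is a minimal Python sketch. The function name `suggested_reduces` is hypothetical, and rounding down to a whole number of reducers is my assumption; the tutorial does not say how to round.

```python
import math

def suggested_reduces(nodes, reduce_slots_per_node):
    """Return (total_slots, one_wave, two_waves) for the tutorial's rule of thumb.

    Rounding down is an assumption made for this sketch.
    """
    total_slots = nodes * reduce_slots_per_node
    one_wave = math.floor(0.95 * total_slots)   # all reduces launch at once
    two_waves = math.floor(1.75 * total_slots)  # faster nodes run a second wave
    return total_slots, one_wave, two_waves

# 4 task trackers, mapred.tasktracker.reduce.tasks.maximum = 7
total, low, high = suggested_reduces(nodes=4, reduce_slots_per_node=7)
print(total, low, high)  # prints: 28 26 49
```

So for this cluster the tutorial's two suggestions work out to about 26 or 49 reducers; the 20 that was tried is below both.
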
In my experience, more reducers mean the map output is split into more
partitions, and the partitions are not equal in size: one reducer gets
less data and another gets more. The reducers with less data finish early,
but the job still has to wait for the reducers with more data. As a
result, the whole process takes longer.
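
To illustrate the uneven split, here is a small Python sketch, not taken from Hadoop itself. The key names and record counts are made up, and `zlib.crc32` only stands in for the partitioner's hash so that the result is the same on every run.

```python
import zlib

def reducer_loads(key_counts, num_reducers):
    """Sum the record counts that land on each reducer (hash of key mod N)."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        # crc32 is a stand-in for the partitioner's hash function (assumption)
        index = zlib.crc32(key.encode("utf-8")) % num_reducers
        loads[index] += count
    return loads

# Hypothetical data: one "hot" key with many records, plus 100 small keys
key_counts = {"hot": 1000}
key_counts.update({f"key{i}": 10 for i in range(100)})

for n in (1, 20):
    loads = reducer_loads(key_counts, n)
    print(n, "reducers: max load", max(loads), "mean load", sum(loads) / n)
```

With 20 reducers the mean load is 100 records, but whichever reducer receives the hot key gets at least 1000, and the reduce phase ends only when that reducer finishes.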

So in my configuration, I size the task slots per server to match the
number of CPU cores. If you have 8 cores, you can set the map slots to 4
and the reduce slots to 4, because in practice map and reduce tasks can
run in parallel.
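
The per-server sizing above can be sketched in Python as follows. The helper name `split_slots` is hypothetical, and giving the extra core to maps when the core count is odd is my assumption; only the even 8-core case comes from the text.

```python
def split_slots(cpu_cores):
    """Split the cores of one server between map and reduce slots.

    For an odd core count the extra core goes to maps (assumption).
    """
    reduce_slots = cpu_cores // 2
    map_slots = cpu_cores - reduce_slots
    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }

print(split_slots(8))
```

For 8 cores this gives 4 map slots and 4 reduce slots per task tracker, which would go into that server's mapred-site.xml.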

Sorry, I'm Indonesian too :). So my English is not good enough to explain.
