hadoop-mapreduce-user mailing list archives

From ch huang <justlo...@gmail.com>
Subject Re: issue about Shuffled Maps in MR job summary
Date Thu, 12 Dec 2013 01:03:29 GMT
hi,
    Suppose I have a cluster of 5 worker nodes, and each worker node can
allocate 40G of memory. I do not care about the map tasks, because the map
tasks in my job finish within half a minute. From what I observe, the really
slow part is the reduce. I allocate 12G to each reduce task, so each worker
node can run 3 reducers in parallel and the whole cluster can run 15
reducers, and I run the job with all 15 reducers. What I do not know is
whether increasing the number of reducers from 15 to 30, with 6G of memory
allocated to each reducer, will speed up the job or not. The job runs in my
production environment; it has been running for nearly a week and still has
not finished.
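
As a quick check of the arithmetic above, here is a minimal Python sketch. The name `reducer_capacity` is made up for illustration, and the sketch assumes memory is the only thing limiting how many reducers a worker node can run at once, which may not hold on a real cluster.

```python
# Figures from the message above: 5 worker nodes, 40G allocatable on each.
NODES = 5
MEM_PER_NODE_GB = 40


def reducer_capacity(reducer_mem_gb):
    """Return (reducers per node, reducers in the whole cluster).

    Assumption: memory is the only constraint on concurrent reducers.
    """
    per_node = MEM_PER_NODE_GB // reducer_mem_gb
    return per_node, per_node * NODES


for mem in (12, 6):
    per_node, total = reducer_capacity(mem)
    print(f"{mem}G per reducer: {per_node} per node, {total} in the cluster")
```

Halving the memory per reducer doubles the number of reducers that fit, but that alone does not say whether the job gets faster; that still depends on whether the extra reducers receive data to process.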

On Wed, Dec 11, 2013 at 9:50 PM, java8964 <java8964@hotmail.com> wrote:

>  The total job completion time depends on a lot of factors. Are you sure
> the reducer part is the bottleneck?
>
> It also depends on how many reducer input groups your MR job has. If you
> only have 20 reducer input groups, then even if you raise your reducer
> count to 40, the elapsed time of the reducer part won't change much, as
> the additional 20 reducer tasks won't get any data to process.
>
> If you have a lot of reducer input groups, your cluster has spare
> capacity at this time, and you also have a lot of idle reducer slots, then
> increasing your reducer count should decrease your total job completion
> time.
>
> Make sense?
>
> Yong
>
>  ------------------------------
> Date: Wed, 11 Dec 2013 14:20:24 +0800
> Subject: Re: issue about Shuffled Maps in MR job summary
> From: justlooks@gmail.com
> To: user@hadoop.apache.org
>
>
> I read the docs and found that if I have 8 reducers, each map task outputs
> 8 partitions, and each partition is sent to a different reducer. So if I
> increase the number of reducers, the number of partitions increases, but
> the volume of network traffic stays the same. Why, then, does increasing
> the number of reducers sometimes not decrease the job completion time?
>
> On Wed, Dec 11, 2013 at 1:48 PM, Vinayakumar B <vinayakumar.b@huawei.com> wrote:
>
>  It looks simple :)
>
>
>
> Shuffled Maps = Number of Map Tasks * Number of Reducers
>
>
>
> Thanks and Regards,
>
> Vinayakumar B
>
>
>
> *From:* ch huang [mailto:justlooks@gmail.com]
> *Sent:* 11 December 2013 10:56
> *To:* user@hadoop.apache.org
> *Subject:* issue about Shuffled Maps in MR job summary
>
>
>
> hi,maillist:
>
>            I ran terasort with 8 reducers and with 16 reducers. When I
> double the number of reducers, Shuffled Maps also doubles. My question is:
> the job only runs 20 map tasks (the input is 10 files of 100M each and my
> block size is 64M, so there are 20 splits), so why do I need to shuffle
> 160 maps in the 8-reducer run and 320 maps in the 16-reducer run? How is
> the Shuffled Maps number calculated?
>
>
>
> 16 reducer summary output:
>
> Shuffled Maps = 320
>
> 8 reducer summary output:
>
> Shuffled Maps = 160
>
>
>
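
The numbers in the original question can be reproduced from the formula quoted in the thread (Shuffled Maps = Number of Map Tasks * Number of Reducers). The following is a minimal Python sketch, not Hadoop code: the helper names `num_splits` and `shuffled_maps` are hypothetical, and the split count is a simplification that assumes one split per block-sized chunk of each file.

```python
import math


def num_splits(file_sizes_mb, block_size_mb):
    """Simplified split count: each file is cut into block-sized chunks."""
    return sum(math.ceil(size / block_size_mb) for size in file_sizes_mb)


def shuffled_maps(map_tasks, reducers):
    """Formula from the thread: every reducer fetches output from every map."""
    return map_tasks * reducers


# 10 input files of 100M each with a 64M block size, as in the question.
maps = num_splits([100] * 10, 64)
print("map tasks:", maps)

for reducers in (8, 16):
    print(f"{reducers} reducers: Shuffled Maps = {shuffled_maps(maps, reducers)}")
```

Under these assumptions each 100M file yields 2 splits, giving 20 map tasks, and the counter comes out at 160 for 8 reducers and 320 for 16 reducers, matching the two job summaries.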
