hadoop-common-user mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: Cluster config: Mapper:Reducer Task Capacity
Date Mon, 30 Sep 2013 19:52:23 GMT
Hi Himanshu,

Changing the ratio is definitely a reasonable thing to do.  The capacities
come from the mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum tasktracker configurations.
These are per-tasktracker slot limits, so the cluster-wide capacities are
their sums across all nodes.  You can tweak them on your nodes to get your
desired ratio.
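
As a rough illustration of that tweak, here is a minimal Python sketch that
splits a node's task slots to approximate a target map:reduce ratio and prints
the two properties in the form they usually take in mapred-site.xml. The
helper names (split_slots, mapred_site_properties) and the figure of 15 slots
per node are assumptions made for the example, not anything taken from your
cluster.

```python
from typing import Tuple


def split_slots(total_slots_per_node: int, target_ratio: float) -> Tuple[int, int]:
    """Split one node's task slots into (map_slots, reduce_slots).

    Hypothetical helper: keeps the node's total slot count fixed and picks
    the integer split whose map:reduce ratio is close to target_ratio.
    """
    if total_slots_per_node < 2:
        raise ValueError("need at least one map slot and one reduce slot")
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    # With map = ratio * reduce and map + reduce = total,
    # reduce = total / (ratio + 1). Round, then keep both sides >= 1.
    reduce_slots = round(total_slots_per_node / (target_ratio + 1))
    reduce_slots = max(1, min(reduce_slots, total_slots_per_node - 1))
    map_slots = total_slots_per_node - reduce_slots
    return map_slots, reduce_slots


def mapred_site_properties(map_slots: int, reduce_slots: int) -> str:
    """Render the two tasktracker properties as <property> entries."""
    template = (
        "<property>\n"
        "  <name>{name}</name>\n"
        "  <value>{value}</value>\n"
        "</property>"
    )
    return "\n".join([
        template.format(name="mapred.tasktracker.map.tasks.maximum",
                        value=map_slots),
        template.format(name="mapred.tasktracker.reduce.tasks.maximum",
                        value=reduce_slots),
    ])


if __name__ == "__main__":
    # Assumed example: 15 slots per node, aiming for a ratio of about 4.
    maps, reduces = split_slots(total_slots_per_node=15, target_ratio=4.0)
    print(mapred_site_properties(maps, reduces))
```

Because slots are whole numbers per node, the achieved ratio will only
approximate the target, and the tasktrackers would normally need a restart
before new values take effect.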

-Sandy


On Mon, Sep 30, 2013 at 12:39 PM, Himanshu Vijay <himanshuvj@gmail.com> wrote:

> Hi,
>
> Our Hadoop cluster is running 0.20.203. The cluster currently has a 'Map
> Task Capacity' of 8900+ and a 'Reduce Task Capacity' of 3300+, resulting
> in a ratio of 2.7. We have a wide variety of jobs running and we want to
> increase the throughput.
>
> My manual observation was that we hit the Mapper capacity and hence many
> jobs have to wait even though there is a lot of room left in Reduce
> capacity. I mined the jobtracker logs for the jobs that completed and saw
> that on an hourly as well as a daily basis the mapper:reducer ratio was 4-5.
>
> To increase the throughput I was thinking of experimenting with changing
> the Map and Reduce Task Capacity so that the ratio increases from 2.7 to
> ~4.
>
> Does this sound like a correct approach? Is this something that I can
> control, or is it determined automatically by Hadoop?
>
> Have any of you done this kind of exercise? If so, could you please point
> me to how to go about changing this ratio? I am not finding much
> literature on it.
>
> Note: Map and Reduce Task Capacity is the maximum total number of
> mappers/reducers that can run on the cluster at any point.
>
> Regards,
> -Himanshu Vijay
>
