hadoop-user mailing list archives

From Himanshu Vijay <himansh...@gmail.com>
Subject Re: Cluster config: Mapper:Reducer Task Capacity
Date Tue, 01 Oct 2013 07:06:44 GMT
What is the downside of increasing both mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum to the same value?

I read on this page (http://developer.yahoo.com/hadoop/tutorial/module7.html) that:

  mapred.tasktracker.map.tasks.maximum     1/2 * (cores/node) to 2 * (cores/node)   Number of map tasks to deploy on each machine.
  mapred.tasktracker.reduce.tasks.maximum  1/2 * (cores/node) to 2 * (cores/node)   Number of reduce tasks to deploy on each machine.

Each node has 8 cores, so according to the above guidance I could set both
configs anywhere from 4 to 16. The mapper:reducer ratio doesn't really matter
as far as these two properties are concerned.
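The 4-to-16 figure follows directly from the tutorial's rule of thumb. A minimal Python sketch of that arithmetic, assuming integer slot counts rounded down and at least one slot per node; `slot_range` is a hypothetical helper for illustration, not something Hadoop provides:

```python
def slot_range(cores_per_node: int) -> tuple[int, int]:
    """Return (low, high) slots per node from the rule of thumb
    1/2 * (cores/node) to 2 * (cores/node).

    Hypothetical helper: Hadoop does not compute this for you.
    """
    low = max(1, cores_per_node // 2)  # round down, but keep at least one slot
    high = 2 * cores_per_node
    return low, high


if __name__ == "__main__":
    low, high = slot_range(8)
    # The same range applies to both mapred.tasktracker.map.tasks.maximum
    # and mapred.tasktracker.reduce.tasks.maximum.
    print(f"slots per node: {low} to {high}")  # slots per node: 4 to 16
```

Because both properties get the identical range, the rule of thumb by itself says nothing about how to split slots between map and reduce.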

On Mon, Sep 30, 2013 at 12:52 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:

> Hi Himanshu,
> Changing the ratio is definitely a reasonable thing to do.  The capacities
> come from the mapred.tasktracker.map.tasks.maximum
> and mapred.tasktracker.reduce.tasks.maximum tasktracker configurations.
>  You can tweak these on your nodes to get your desired ratio.
> -Sandy
> On Mon, Sep 30, 2013 at 12:39 PM, Himanshu Vijay <himanshuvj@gmail.com> wrote:
>> Hi,
>> Our Hadoop cluster is running 0.20.203. The cluster currently has a 'Map
>> Task Capacity' of 8900+ and a 'Reduce Task Capacity' of 3300+, a ratio
>> of about 2.7. We run a wide variety of jobs and we want to increase
>> throughput.
>> From manual observation, we hit the Mapper capacity, so many jobs have to
>> wait even though a lot of Reduce capacity is left. I mined the JobTracker
>> logs for completed jobs and saw that, on both an hourly and a daily
>> basis, the mapper:reducer ratio was 4-5.
>> To increase throughput, I was thinking of experimenting with changing the
>> Map and Reduce Task Capacity so that the ratio increases from 2.7 to
>> ~4.
>> Does this sound like the right approach? Is this something I can
>> control, or is it determined automatically by Hadoop?
>> Have any of you done this kind of exercise? If so, can you please explain
>> how to go about changing this ratio? I am not finding much literature on
>> it.
>> Note: Map and Reduce Task Capacity are the maximum total number of
>> mappers/reducers that can run on the cluster at any one time.
>> Regards,
>> -Himanshu Vijay
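Following Sandy's reply, each TaskTracker contributes its own mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, and the cluster's Map and Reduce Task Capacity are the sums of those per-node values. The ratio therefore moves by changing how each node splits its slots. The sketch below is a rough illustration under stated assumptions: a homogeneous cluster, a fixed slot budget, and the approximate 8900/3300 figures from the thread. The helpers `cluster_capacity` and `split_slots`, the 20-slot node budget, and the 100-node cluster are all hypothetical.

```python
def cluster_capacity(nodes: list[tuple[int, int]]) -> tuple[int, int]:
    """Sum per-TaskTracker (map_max, reduce_max) settings into the
    cluster-wide Map and Reduce Task Capacity."""
    map_cap = sum(m for m, _ in nodes)
    reduce_cap = sum(r for _, r in nodes)
    return map_cap, reduce_cap


def split_slots(total_slots: int, target_ratio: float) -> tuple[int, int]:
    """Hypothetical helper: split a slot budget into (map, reduce) so that
    map/reduce is close to target_ratio, keeping at least one reduce slot."""
    reduce_slots = max(1, round(total_slots / (target_ratio + 1)))
    return total_slots - reduce_slots, reduce_slots


if __name__ == "__main__":
    # Approximate figures from the thread: 8900+ map, 3300+ reduce slots.
    current = (8900, 3300)
    print("current ratio:", round(current[0] / current[1], 1))  # 2.7

    # Same total number of slots, rebalanced toward a ~4:1 ratio.
    print("cluster split at 4:1:", split_slots(sum(current), 4))  # (9760, 2440)

    # Hypothetical 8-core node with a 20-slot budget: 16 map + 4 reduce,
    # both inside the 4-16 per-property range from the Yahoo! tutorial.
    per_node = split_slots(20, 4)
    print("per-node split:", per_node)  # (16, 4)

    # Hypothetical homogeneous cluster of 100 such nodes.
    print("cluster capacity:", cluster_capacity([per_node] * 100))  # (1600, 400)
```

Holding the slot total fixed is only one choice. Since both properties share the same 4-16 range, one could instead raise only the map maximum, which increases the number of concurrent tasks each node must run.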

-Himanshu Vijay
