Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of sandy.ryza@cloudera.com
 designates 209.85.220.52 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAKG3H40M0v3WnPVz7yHCYHzytoyVC_bSqUzCBjcpGT36GX40PA@mail.gmail.com>
References: 
 <CAKG3H40M0v3WnPVz7yHCYHzytoyVC_bSqUzCBjcpGT36GX40PA@mail.gmail.com>
Date: Mon, 30 Sep 2013 12:52:23 -0700
Message-ID: 
 <CACBYxK+tD2chEH83pwzUT07dt5DVKXxN8dD-2N14cwwsQiMMAg@mail.gmail.com>
Subject: Re: Cluster config: Mapper:Reducer Task Capacity
From: Sandy Ryza <sandy.ryza@cloudera.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=047d7b6d8818c4731004e79f2c78

--047d7b6d8818c4731004e79f2c78
Content-Type: text/plain; charset=ISO-8859-1

Hi Himanshu,

Changing the ratio is definitely a reasonable thing to do. The capacities
come from the mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum tasktracker configurations: each
tasktracker advertises that many map and reduce slots, and the cluster-wide
capacity is the sum across tasktrackers. You can tweak these on your nodes
to get your desired ratio.
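
As a rough sketch of the arithmetic, here is one way to pick the two values
for a node. It assumes you keep the node's total slot count fixed and only
shift the split. The 15-slot node and the helper names split_slots and
property_xml are made up for illustration and are not part of Hadoop. The
printed fragment is the usual <property> shape you would put in
mapred-site.xml on each tasktracker.

```python
def split_slots(total_slots, map_reduce_ratio):
    """Split a fixed slot count into (map, reduce) at roughly the given ratio."""
    # reduce share is 1 part out of (ratio + 1); keep at least one reduce slot
    reduce_slots = max(1, round(total_slots / (map_reduce_ratio + 1)))
    map_slots = total_slots - reduce_slots
    return map_slots, reduce_slots


def property_xml(name, value):
    """Render one Hadoop configuration <property> element as text."""
    return (
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


# Hypothetical node with 15 task slots in total, target map:reduce ratio of 4
map_slots, reduce_slots = split_slots(15, 4)
print(property_xml("mapred.tasktracker.map.tasks.maximum", map_slots))
print(property_xml("mapred.tasktracker.reduce.tasks.maximum", reduce_slots))
```

For the hypothetical 15-slot node this gives 12 map and 3 reduce slots.
Applied to a cluster-wide total of roughly 12200 slots, the same split would
be about 9760 map and 2440 reduce. As far as I know the tasktracker reads
these values at startup, so plan on restarting tasktrackers after the change.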

-Sandy


On Mon, Sep 30, 2013 at 12:39 PM, Himanshu Vijay <himanshuvj@gmail.com> wrote:

> Hi,
>
> Our Hadoop cluster is running 0.20.203. The cluster currently has a 'Map
> Task Capacity' of 8900+ and a 'Reduce Task Capacity' of 3300+, resulting in
> a ratio of 2.7. We run a wide variety of jobs and we want to increase the
> throughput.
>
> My manual observation was that we hit the Mapper capacity and hence many
> jobs have to wait even though there is a lot of room left in Reduce
> capacity. I mined the jobtracker logs for the jobs that completed and saw
> that on an hourly basis as well as a daily basis the mapper:reducer ratio
> was 4-5.
>
> To increase the throughput I was thinking of experimenting with changing
> the Map and Reduce Task Capacity such that the ratio is increased from 2.7
> to ~4.
>
> Does this sound like a correct approach? Is this something that I can
> control, or is it determined automatically by Hadoop?
>
> Have any of you done this kind of exercise? If yes, can you please direct
> me on how to go about changing this ratio? I am not finding much
> literature on it.
>
> Note: Map and Reduce Task Capacity are the maximum total number of
> mappers/reducers you can run on the cluster at any point.
>
> Regards,
> -Himanshu Vijay
>

--047d7b6d8818c4731004e79f2c78--