From: Nitin Pawar <nitinpawar432@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 30 Apr 2013 16:15:40 +0530
Subject: Re: Set reducer capacity for a specific M/R job

I don't think you can control how many reducers run in parallel through the
framework on a per-job basis.

Another way to do this is to increase the memory given to each individual
reducer, so that the tasktracker is limited by memory and cannot launch as
many reducers at the same time; the remaining reduce tasks will queue up.

You can try setting mapred.job.reduce.memory.mb to a higher value and see if
that works.
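In driver code that could look something like this (a rough sketch with an
illustrative value; it assumes the usual org.apache.hadoop.conf.Configuration
and org.apache.hadoop.mapreduce.Job imports, and it only takes effect on
clusters where memory-based slot scheduling is enabled):

    Configuration conf = new Configuration();
    // Request more memory per reduce task (e.g. 4 GB here) so that a
    // memory-limited tasktracker launches fewer reducers concurrently.
    conf.set("mapred.job.reduce.memory.mb", "4096");
    Job job = new Job(conf, "my-job");  // "my-job" is a placeholder name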
On Tue, Apr 30, 2013 at 4:08 PM, Han JU <ju.han.felix@gmail.com> wrote:

> Yes.. In the conf file of my cluster,
> mapred.tasktracker.reduce.tasks.maximum is 8.
> And for this job, I want it to be 4.
> I set it through conf and build the job with this conf, then submit it.
> But hadoop launches 8 reducers per datanode...
>
>
> 2013/4/30 Nitin Pawar <nitinpawar432@gmail.com>
>
>> So basically, if I understand correctly:
>>
>> you want to limit the number of reducers executing in parallel only for
>> this job?
>>
>>
>> On Tue, Apr 30, 2013 at 4:02 PM, Han JU <ju.han.felix@gmail.com> wrote:
>>
>>> Thanks.
>>>
>>> In fact I don't want to set reducer or mapper numbers, they are fine.
>>> I want to set the reduce slot capacity of my cluster while it executes
>>> my specific job. Say I have 100 reduce tasks for this job: I want my
>>> cluster to execute 4 of them at the same time, not 8, but only for this
>>> specific job.
>>> So I set mapred.tasktracker.reduce.tasks.maximum to 4 and submit the
>>> job. This conf is well received by the job, but ignored by hadoop..
>>>
>>> Any idea why this is?
>>>
>>>
>>> 2013/4/30 Nitin Pawar <nitinpawar432@gmail.com>
>>>
>>>> The mapred.tasktracker.reduce.tasks.maximum parameter sets the maximum
>>>> number of reduce tasks that may be run by an individual TaskTracker
>>>> server at one time. It is not a per-job configuration.
>>>>
>>>> The number of map tasks for a given job is driven by the number of
>>>> input splits, not by the mapred.map.tasks parameter. For each input
>>>> split a map task is spawned, so over the lifetime of a mapreduce job
>>>> the number of map tasks equals the number of input splits.
>>>> mapred.map.tasks is just a hint to the InputFormat for the number of
>>>> maps.
>>>>
>>>> If you want to set the maximum number of maps or reducers per job, you
>>>> can set the hints through the job object you created:
>>>> job.setNumMapTasks()
>>>>
>>>> Note this is just a hint, and again the actual number will be decided
>>>> by the input split size.
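>>>>
>>>> For example, with the old-style JobConf API that would look roughly
>>>> like this (an illustrative sketch; MyDriver and the numbers are
>>>> placeholders):
>>>>
>>>>     JobConf conf = new JobConf(MyDriver.class);
>>>>     conf.setNumMapTasks(10);    // only a hint; the input splits decide
>>>>     conf.setNumReduceTasks(4);  // total reduce tasks for the job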
>>>>
>>>>
>>>> On Tue, Apr 30, 2013 at 3:39 PM, Han JU <ju.han.felix@gmail.com> wrote:
>>>>
>>>>> Thanks Nitin.
>>>>>
>>>>> What I need is to set the slots only for a specific job, not in the
>>>>> whole cluster conf.
>>>>> But what I did does NOT work... Have I done something wrong?
>>>>>
>>>>>
>>>>> 2013/4/30 Nitin Pawar <nitinpawar432@gmail.com>
>>>>>
>>>>>> The config you are setting is for the job only.
>>>>>>
>>>>>> But if you want to reduce the slots on the tasktrackers, then you
>>>>>> will need to edit the tasktracker conf and restart the tasktrackers.
>>>>>>
>>>>>> On Apr 30, 2013 3:30 PM, "Han JU" <ju.han.felix@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I want to change the cluster's capacity of reduce slots on a per-job
>>>>>>> basis. Originally I have 8 reduce slots per tasktracker.
>>>>>>> I did:
>>>>>>>
>>>>>>> conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
>>>>>>> ...
>>>>>>> Job job = new Job(conf, ...)
>>>>>>>
>>>>>>> And in the web UI I can see that for this job the max reduce tasks
>>>>>>> is exactly 4, like I set. However hadoop still launches 8 reducers
>>>>>>> per datanode... why is this?
>>>>>>>
>>>>>>> How could I achieve this?
>>>>>>> --
>>>>>>> JU Han
>>>>>>>
>>>>>>> Software Engineer Intern @ KXEN Inc.
>>>>>>> UTC - Université de Technologie de Compiègne
>>>>>>> GI06 - Fouille de Données et Décisionnel
>>>>>>>
>>>>>>> +33 0619608888

--
Nitin Pawar