Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of dontariq@gmail.com designates
 209.85.216.176 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <BD42F346AE90F544A731516A805D1B8AD87399@SMAIL1.prd.mpac.ca>
References: 
 <CACb0Fn5+HTmYyvKgEoWMOe-Fz8-TeQ7dZiHw-0Oph9oqPHvfoA@mail.gmail.com>
 <1968115515-1353516633-cardhu_decombobulator_blackberry.rim.net-1299944419-@b27.c16.bise7.blackberry>
 <BD42F346AE90F544A731516A805D1B8AD87399@SMAIL1.prd.mpac.ca>
From: Mohammad Tariq <dontariq@gmail.com>
Date: Wed, 21 Nov 2012 23:34:40 +0530
Message-ID: 
 <CAMVC6RNySKzF9fzwwFqJqy6pYdTriZde7sJpQEKrsgf7umsyfQ@mail.gmail.com>
Subject: Re: guessing number of reducers.
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=20cf302ef99ca875e604cf053195

--20cf302ef99ca875e604cf053195
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Hello Jamal,

   I use a different approach based on the number of cores. If you have,
say, a 4-core machine, then you can have (0.75 * number of cores) MR slots.
For example, if you have 4 physical cores OR 8 virtual cores then you can
have 0.75 * 8, i.e. 6 MR slots. You can then set 3M+3R or 4M+2R and so on
as per your requirement.
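
To make the arithmetic concrete, here is a minimal sketch of the rule of
thumb above. The function names, the rounding down, and the map_share
parameter are illustrative assumptions, not anything Hadoop provides. On
MRv1 the two results would typically go into
mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum on each TaskTracker.

```python
import math


def mr_slots(cores, fraction=0.75):
    """Total MR slots for one machine: a fraction of its cores.

    Rounding down is an assumption; the rule of thumb does not say how
    to handle fractional results.
    """
    return math.floor(cores * fraction)


def split_slots(total, map_share=0.5):
    """Split the total into (map_slots, reduce_slots).

    map_share is a hypothetical knob for how much of the machine goes
    to map tasks.
    """
    maps = round(total * map_share)
    return maps, total - maps


slots = mr_slots(8)               # 8 virtual cores -> 6 slots
print(split_slots(slots))         # (3, 3), i.e. 3M+3R
print(split_slots(slots, 2 / 3))  # (4, 2), i.e. 4M+2R
```

Treat the 0.75 factor as a starting point and adjust it after watching CPU
and memory usage on the nodes.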

Regards,
    Mohammad Tariq


On Wed, Nov 21, 2012 at 11:19 PM, Kartashov, Andy <Andy.Kartashov@mpac.ca>
wrote:

>  Bejoy,
>
>
>
> I've read somewhere about keeping the number of mapred.reduce.tasks below
> the reduce task capacity. Here is what I just tested:
>
>
>
> Output: 25 GB. 8-DataNode cluster with a Map and Reduce Task Capacity
> of 16:
>
>  1 reducer   - 22 mins
>  4 reducers  - 11.5 mins
>  8 reducers  - 5 mins
> 10 reducers  - 7 mins
> 12 reducers  - 6:5 mins
> 16 reducers  - 5.5 mins
>
>
>
> 8 reducers won the race, but running reducers at the max capacity (16)
> was very close. :)
>
>
>
> AK47
>
>
>
>
>
> *From:* Bejoy KS [mailto:bejoy.hadoop@gmail.com]
> *Sent:* Wednesday, November 21, 2012 11:51 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: guessing number of reducers.
>
>
>
> Hi Sasha
>
> In general, the number of reduce tasks is chosen mainly based on the data
> volume going into the reduce phase. In tools like Hive and Pig, by default
> there is one reducer for every 1 GB of map output, so 100 GB of map
> output gives 100 reducers.
> If your tasks are more CPU intensive, then you need a smaller volume of
> data per reducer for better performance.
>
> In general it is better to have the number of reduce tasks slightly less
> than the number of available reduce slots in the cluster.
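
The two rules quoted above can be combined in a small sketch. This is only
one possible reading: the 1 GB per reducer figure comes from the quoted
text, while the function name, the GiB interpretation of "1 GB", and
capping at one less than the available reduce slots are my assumptions.

```python
import math

GB = 1024 ** 3  # assumption: "1 GB" is read as 1 GiB


def estimate_reducers(map_output_bytes, bytes_per_reducer=GB,
                      reduce_slots=None):
    """Estimate a reducer count from the map output volume.

    One reducer per bytes_per_reducer of map output, at least 1.
    If reduce_slots is given, stay one below it (an assumed reading of
    "slightly less than the number of available reduce slots").
    """
    by_volume = max(1, math.ceil(map_output_bytes / bytes_per_reducer))
    if reduce_slots is None:
        return by_volume
    return max(1, min(by_volume, reduce_slots - 1))


print(estimate_reducers(100 * GB))                  # 100
print(estimate_reducers(25 * GB, reduce_slots=16))  # 15
```

For CPU-heavy reducers, lowering bytes_per_reducer gives each reducer less
data, in line with the advice above.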
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>  ------------------------------
>
> *From: *jamal sasha <jamalshasha@gmail.com>
>
> *Date: *Wed, 21 Nov 2012 11:38:38 -0500
>
> *To: *user@hadoop.apache.org<user@hadoop.apache.org>
>
> *ReplyTo: *user@hadoop.apache.org
>
> *Subject: *guessing number of reducers.
>
>
>
> By default the number of reducers is set to 1.
> Is there a good way to guess the optimal number of reducers?
> Let's say I have TBs worth of data, and the mappers are of the order of
> 5000 or so. But ultimately I am calculating, let's say, some average over
> the whole data, say the average transaction occurring.
> Now the output will be just one line in one "part" file; the rest of them
> will be empty. So I am guessing I need loads of reducers, but then most of
> them will be empty, while at the same time one reducer won't suffice.
> What's the best way to solve this?
> How do I guess the optimal number of reducers?
> Thanks
>

--20cf302ef99ca875e604cf053195--