From: Mohammad Tariq
Date: Wed, 21 Nov 2012 23:34:40 +0530
Subject: Re: guessing number of reducers.
To: user@hadoop.apache.org

Hello Jamal,

I use a different approach based on the number of cores. If you have, say, a 4-core machine, then you can have (0.75 * number of cores) MR slots.
For example, if you have 4 physical cores, i.e. 8 virtual cores, then you can have 0.75*8 = 6 MR slots. You can then split them as 3 map + 3 reduce (3M+3R), 4M+2R, and so on, as per your requirement.
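A minimal sketch of the per-node slot arithmetic above, assuming the slot count is rounded down to a whole number and that every split keeps at least one map and one reduce slot (both assumptions, not stated in the thread). The function names are hypothetical. In Hadoop 1.x the chosen per-node split would typically be applied through the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum settings.

```python
import math
from typing import List, Tuple


def mr_slots(cores: int, factor: float = 0.75) -> int:
    """Total MapReduce slots per node: factor * cores, rounded down (assumed rounding)."""
    return math.floor(factor * cores)


def slot_splits(total: int, min_each: int = 1) -> List[Tuple[int, int]]:
    """Every (map, reduce) split of `total` slots with at least `min_each` of each kind."""
    return [(m, total - m) for m in range(min_each, total - min_each + 1)]


# 4 physical cores = 8 virtual cores, as in the example above
slots = mr_slots(8)
print(slots)               # 6
print(slot_splits(slots))  # [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
```

Both the 3M+3R and 4M+2R layouts mentioned above appear among the candidate splits; which one fits depends on whether the jobs are map-heavy or reduce-heavy.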

Regards,
Mohammad Tariq



On Wed, Nov 21, 2012 at 11:19 PM, Kartashov, Andy <Andy.Kartashov@mpac.ca> wrote:

Bejoy,

I've read somewhere about keeping the number of mapred.reduce.tasks below the reduce task capacity. Here is what I just tested:

Output: 25 GB, on an 8-DataNode (8DN) cluster with a map and reduce task capacity of 16:

1 reducer: 22 mins
4 reducers: 11.5 mins
8 reducers: 5 mins
10 reducers: 7 mins
12 reducers: 6:5 mins
16 reducers: 5.5 mins

8 reducers won the race, but 16 reducers (the maximum capacity) came very close. :)

AK47

From: Bejoy KS [mailto:bejoy.hadoop@gmail.com]
Sent: Wednesday, November 21, 2012 11:51 AM
To: user@hadoop.apache.org
Subject: Re: guessing number of reducers.

Hi Sasha

In general, the number of reduce tasks is chosen mainly based on the volume of data going to the reduce phase. In tools like Hive and Pig, by default there will be one reducer for every 1 GB of map output, so if you have 100 GB of map output, you get 100 reducers.
If your tasks are more CPU-intensive, you need a smaller volume of data per reducer for better performance.
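The volume-based rule of thumb can be sketched as below. This is a simplified illustration, not Hive's or Pig's actual estimator: it assumes "1 GB" means 1024**3 bytes and that the estimate is just the map output size divided by a per-reducer budget, rounded up. Lowering `bytes_per_reducer` models the CPU-intensive case. `reducers_by_volume` is a hypothetical name; the real tools expose the budget through settings such as hive.exec.reducers.bytes.per.reducer, and their defaults and estimation logic vary by version.

```python
GB = 1024 ** 3  # assumption: "1 GB" taken as 2**30 bytes


def reducers_by_volume(map_output_bytes: int, bytes_per_reducer: int = GB) -> int:
    """One reducer per `bytes_per_reducer` of map output, rounded up, never fewer than 1."""
    if bytes_per_reducer <= 0:
        raise ValueError("bytes_per_reducer must be positive")
    # Integer ceiling division avoids floating-point rounding on large byte counts.
    needed = (map_output_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    return max(1, needed)


print(reducers_by_volume(100 * GB))           # 100
# CPU-heavy reducers: halve the per-reducer budget, double the reducer count.
print(reducers_by_volume(100 * GB, GB // 2))  # 200
print(reducers_by_volume(0))                  # 1
```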

In general, it is better to have the number of reduce tasks slightly less than the number of available reduce slots in the cluster.
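One way to read "slightly less than the available reduce slots" is to clamp any estimate to a fixed fraction of the cluster's total reduce capacity. The 95% figure below is an assumption borrowed from the 0.95 multiplier suggested in the Apache Hadoop MapReduce tutorial, not something Bejoy specifies, and `cap_below_capacity` is a hypothetical name.

```python
def cap_below_capacity(estimate: int, reduce_slots_total: int, percent: int = 95) -> int:
    """Clamp a reducer estimate to roughly `percent`% of the cluster's reduce slots.

    Integer arithmetic is used so that e.g. 16 slots * 95% rounds down to 15 exactly.
    """
    ceiling = max(1, reduce_slots_total * percent // 100)
    return max(1, min(estimate, ceiling))


# A cluster with 16 reduce slots, like the 8DN cluster described above:
print(cap_below_capacity(100, 16))  # 15  (estimate exceeds capacity)
print(cap_below_capacity(8, 16))    # 8   (estimate already fits)
print(cap_below_capacity(100, 1))   # 1   (never drops to zero)
```

Keeping the count under capacity means every reduce task can be scheduled in a single wave, so no reducer waits for a free slot.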

Regards
Bejoy KS

Sent from handheld, please excuse typos.


From: jamal sasha <jamalshasha@gmail.com>

Date: Wed, 21 Nov 2012 11:38:38 -0500

Subject: guessing number of reducers.

By default, the number of reducers is set to 1.
Is there a good way to guess the optimal number of reducers?
Or, let's say I have TBs worth of data, and the mappers are on the order of 5000 or so.
But ultimately I am calculating, say, some average over the whole data set, e.g. the average transaction.
Now the output will be just one line in one "part" file, and the rest of them will be empty. So I am guessing I need loads of reducers, but then most of them will be empty, while at the same time one reducer won't suffice.
What's the best way to solve this?
How do I guess the optimal number of reducers?
Thanks
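For a single global aggregate like this, the reducer count matters less if each mapper (or a combiner) first collapses its split into a partial (sum, count) pair, so a single reducer only sees about one small record per map task (around 5000 here) instead of terabytes. This technique is not proposed in the thread; the sketch below is a plain-Python simulation of the idea with hypothetical function names, not Hadoop API code. Carrying the count is essential, because averaging the per-split averages would weight small splits incorrectly.

```python
import math
from typing import Iterable, List, Tuple

Partial = Tuple[float, int]  # (sum of values, count of values)


def map_partial(values: Iterable[float]) -> Partial:
    """Mapper/combiner side: collapse one input split to a single (sum, count)."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total, count


def reduce_average(partials: Iterable[Partial]) -> float:
    """Single reducer: merge all partials, then divide once at the end."""
    total, count = 0.0, 0
    for s, c in partials:
        total += s
        count += c
    return total / count if count else math.nan


splits: List[List[float]] = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
partials = [map_partial(s) for s in splits]
print(partials)                  # [(6.0, 3), (4.0, 1), (11.0, 2)]
print(reduce_average(partials))  # 3.5
# Averaging the per-split means instead would give (2.0 + 4.0 + 5.5) / 3, about 3.83.
```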
