From: "Kartashov, Andy"
To: user@hadoop.apache.org, bejoy.hadoop@gmail.com
Date: Wed, 21 Nov 2012 17:49:50 +0000
Subject: RE: guessing number of reducers.

Bejoy,

 

I've read somewhere about keeping the number of mapred.reduce.tasks below the reduce task capacity. Here is what I just tested:

 

Output: 25 GB, on an 8-DataNode cluster with 16 map and 16 reduce task slots:

 

1 Reducer  - 22 mins
4 Reducers - 11.5 mins
8 Reducers - 5 mins
10 Reducers - 7 mins
12 Reducers - 6.5 mins
16 Reducers - 5.5 mins

 

8 Reducers won the race, but reducers at the max capacity (16) were very close. :)
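For reference, the commonly cited rule of thumb in the Hadoop documentation is to set the reducer count to roughly 0.95x or 1.75x the total reduce slot capacity (nodes x slots per node). A minimal sketch of that heuristic, assuming a 2-slots-per-node layout for the 8-DataNode cluster above (the slot split is an assumption, not stated in the thread):

```python
import math

def suggested_reducers(nodes, reduce_slots_per_node, factor=0.95):
    """Rule-of-thumb reducer count: slightly below total reduce slot
    capacity so all reduces launch in one wave (factor=0.95), or
    ~1.75x capacity for better load balancing across two waves."""
    return max(1, math.floor(factor * nodes * reduce_slots_per_node))

# 8 DataNodes with 2 reduce slots each (16 total, as in the test above):
print(suggested_reducers(8, 2))        # one-wave setting, just under 16
print(suggested_reducers(8, 2, 1.75))  # two-wave setting
```

With 16 total slots this gives 15 for one wave, which matches the observation that running slightly below max capacity was nearly as fast as the winner.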

 

AK47

 

 

From: Bejoy KS [mailto:bejoy.hadoop@gmail.com]
Sent: Wednesday, November 21, 2012 11:51 AM
To: user@hadoop.apache.org
Subject: Re: guessing number of reducers.

 

Hi Sasha

In general, the number of reduce tasks is chosen mainly based on the data volume reaching the reduce phase. In tools like Hive and Pig, by default there is one reducer for every 1 GB of map output; so with 100 GB of map output you get 100 reducers.
If your tasks are more CPU intensive, you want a smaller volume of data per reducer for better performance.

In general it is better to keep the number of reduce tasks slightly below the number of available reduce slots in the cluster.
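Bejoy's two guidelines combined can be sketched as a small sizing helper: one reducer per ~1 GB of map output (the Hive/Pig-style default), capped just under the cluster's reduce slot count. This is a hedged sketch of the heuristic, not an actual Hadoop API:

```python
import math

def reducers_for_volume(map_output_bytes, bytes_per_reducer=1 << 30,
                        reduce_slots=None):
    """One reducer per ~1 GB of map output (Hive/Pig-style default),
    optionally capped slightly below the cluster's reduce slot count."""
    n = max(1, math.ceil(map_output_bytes / bytes_per_reducer))
    if reduce_slots is not None:
        n = min(n, max(1, reduce_slots - 1))  # stay just under capacity
    return n

# 25 GB of map output (as in Andy's test) on a 16-reduce-slot cluster:
print(reducers_for_volume(25 << 30, reduce_slots=16))
```

For Andy's 25 GB job this suggests 15 reducers, in the same neighborhood as the 8-16 range his measurements favored; CPU-heavy jobs would lower bytes_per_reducer.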

Regards
Bejoy KS

Sent from handheld, please excuse typos.


From: jamal sasha <jamalshasha@gmail.com>

Date: Wed, 21 Nov 2012 11:38:38 -0500

To: user@hadoop.apache.org

Reply-To: user@hadoop.apache.org

Subject: guessing number of reducers.

 

By default the number of reducers is set to 1. Is there a good way to guess the optimal number of reducers?
Say I have TBs worth of data and on the order of 5000 mappers, but ultimately I am calculating, let's say, some average over the whole data, say the average transaction value.
The output will be just one line in one "part" file; the rest will be empty. So I am guessing I need lots of reducers, but then most of them will be empty, while at the same time one reducer won't suffice.
What's the best way to solve this? How do I guess the optimal number of reducers?
Thanks
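For a global average like this, the usual answer is not more reducers but partial aggregation: have each mapper (or a combiner) emit a tiny (sum, count) pair, so a single reducer only has to merge a handful of small records. A toy simulation of that pattern in plain Python (not the Hadoop API; the splits and values are made up):

```python
def combine_partial(values):
    """Combiner step: collapse one mapper's values to a (sum, count) pair."""
    return (sum(values), len(values))

def reduce_average(partials):
    """Single reducer: merge the tiny (sum, count) pairs into the mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# Three "mappers", each holding one split of the transaction values:
splits = [[10, 20], [30], [40, 50, 60]]
partials = [combine_partial(s) for s in splits]
print(reduce_average(partials))  # -> 35.0
```

Because each of the 5000 mappers would contribute only one pair, the lone reducer's input is trivially small, and the "one reducer won't suffice" worry goes away for this kind of job.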

NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not the intended recipient, please delete and contact the sender immediately. Please consider the environment before printing this e-mail.