From: "Kartashov, Andy"
To: "user@hadoop.apache.org"
Subject: RE: guessing number of reducers.
Date: Wed, 21 Nov 2012 16:43:00 +0000

Jamal,

This is what I am using…

After you start your job, open the JobTracker's web UI at <ip-address>:50030 and look for the Cluster Summary. The Reduce Task Capacity value should hint at the optimal number of reducers to set. I could be wrong, but it works for me. :)

Cluster Summary (Heap Size is *** MB/966.69 MB)

Running Map Tasks | Running Reduce Tasks | Total Submissions | Nodes | Occupied Map Slots | Occupied Reduce Slots | Reserved Map Slots | Reserved Reduce Slots | Map Task Capacity | Reduce Task Capacity | Avg. Tasks/Node | Blacklisted Nodes | Excluded Nodes
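
To make that concrete, here is a rough sketch of how the Reduce Task Capacity figure could be turned into a reducer count. It is only an illustration, not a definitive rule: the class and method names are made up, the capacity of 40 is an assumed value you would replace with the number from your own Cluster Summary, and the 95% / 175% factors follow the rule of thumb given in the Hadoop MapReduce tutorial (0.95 or 1.75 times the total number of reduce slots), which may not suit every job.

```java
// Hypothetical helper, not part of Hadoop: scales the cluster's
// Reduce Task Capacity by a percentage to suggest a reducer count.
class ReducerEstimate {

    // reduceTaskCapacity: value read from the JobTracker Cluster Summary.
    // percent: scaling factor as a whole percentage, e.g. 95 or 175.
    static int suggestReducers(int reduceTaskCapacity, int percent) {
        if (reduceTaskCapacity < 1 || percent < 1) {
            throw new IllegalArgumentException("capacity and percent must be positive");
        }
        // Integer arithmetic avoids floating-point rounding surprises.
        long scaled = (long) reduceTaskCapacity * percent / 100;
        // Never suggest fewer than one reducer.
        return (int) Math.max(1L, scaled);
    }

    public static void main(String[] args) {
        int capacity = 40; // assumed value; read yours from the web UI

        // About 95% of capacity: all reducers can start in a single wave.
        System.out.println(suggestReducers(capacity, 95));

        // About 175% of capacity: faster nodes pick up a second wave.
        System.out.println(suggestReducers(capacity, 175));
    }
}
```

The resulting number would then be passed to the job, for example with job.setNumReduceTasks(n) in the driver.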

Rgds,

AK47

From: jamal sasha [mailto:jamalshasha@gmail.com]
Sent: Wednesday, November 21, 2012 11:39 AM
To: user@hadoop.apache.org
Subject: guessing number of reducers.

By default the number of reducers is set to 1.
Is there a good way to guess the optimal number of reducers?
Let's say I have TBs worth of data, with mappers on the order of 5000 or so.
But ultimately I am calculating, let's say, some average over the whole data, say the average transaction occurring.
Now the output will be just one line in one "part"; the rest of them will be empty. So I am guessing I need loads of reducers, but then most of them will be empty, and at the same time one reducer won't suffice.
What's the best way to solve this?
How do I guess the optimal number of reducers?
Thanks
