Message-ID: <25008761.post@talk.nabble.com>
Date: Mon, 17 Aug 2009 08:22:19 -0700 (PDT)
From: tigertail
To: core-user@hadoop.apache.org
Subject: Percentage calculation?

Hi Hadoop/MapReduce experts,

My question might be naive, but I am really stuck and would appreciate any help or advice.

I have an input file like this:

    key1, 2
    key2, 1
    key1, 1
    key3, 1

It is easy to write an M/R job that calculates the count for each key and outputs something like:

    key1, 3
    key2, 1
    key3, 1

But how can I calculate each key's share of the total count over all keys? With the above input, I would expect this output:

    key1, 0.60
    key2, 0.20
    key3, 0.20

One naive method is to calculate the total count (5 for the above input) and save it in a file, which is then read in before the M/R job starts. But that is obviously ugly and slow.

I also tried defining a counter:

    static enum Counters { INPUT_WORDS }

In the mapper I do

    context.getCounter(Counters.INPUT_WORDS).increment(1);

and in the reducer I do

    context.getCounter(Counters.INPUT_WORDS).getCounter();

But it does not work. Is there a more elegant way?

-- 
View this message in context: http://www.nabble.com/Percentage-calculation--tp25008761p25008761.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
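
For reference, the arithmetic the question asks for can be sketched in plain Java, without any Hadoop dependency. This is only an illustration of the expected result under the "naive" two-pass approach described above (sum per key, then divide by the grand total); it is not a MapReduce implementation and not a proposed answer to the counter problem. The class name `PercentageSketch` and the method names `countPerKey` and `percentages` are hypothetical, chosen for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java sketch of the two-pass idea; names here are illustrative only.
class PercentageSketch {

    // Pass 1: sum the values per key, as the count-per-key job would.
    static Map<String, Long> countPerKey(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            String key = parts[0].trim();
            long value = Long.parseLong(parts[1].trim());
            counts.merge(key, value, Long::sum);
        }
        return counts;
    }

    // Pass 2: divide each key's count by the grand total over all keys.
    static Map<String, Double> percentages(Map<String, Long> counts) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        if (total == 0) {
            return shares; // nothing to divide by
        }
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            shares.put(e.getKey(), (double) e.getValue() / total);
        }
        return shares;
    }

    public static void main(String[] args) {
        // The sample input from the question.
        List<String> input = Arrays.asList("key1, 2", "key2, 1", "key1, 1", "key3, 1");
        Map<String, Double> shares = percentages(countPerKey(input));
        for (Map.Entry<String, Double> e : shares.entrySet()) {
            System.out.println(String.format(Locale.ROOT, "%s, %.2f", e.getKey(), e.getValue()));
        }
    }
}
```

With the sample input, the per-key counts are 3, 1 and 1, the total is 5, and the shares come out as 0.60, 0.20 and 0.20, matching the expected output in the question. The second pass cannot start until the total is known, which is the dependency that makes a single-job solution awkward.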