Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of lists@nabble.com designates
 216.139.236.158 as permitted sender)
Message-ID: <25008761.post@talk.nabble.com>
Date: Mon, 17 Aug 2009 08:22:19 -0700 (PDT)
From: tigertail <tyczjs@yahoo.com>
To: core-user@hadoop.apache.org
Subject: Percentage calculation?
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


Hi Hadoop/MapReduce experts,

My question might be naive, but I am really stuck here and would
appreciate any help or advice.

I have an input file like this:
key1, 2
key2, 1
key1, 1
key3, 1

It is easy to write an M/R job that calculates the count for each key and
outputs something like
key1, 3
key2, 1
key3, 1

But how can I calculate the percentage of each key over all keys? With the
above input, I would expect to get the following output:
key1, 0.60
key2, 0.20
key3, 0.20

One naive method is to calculate the total count (5 with the above input)
and save it in a file, then read that file in before the M/R job starts. But
this is obviously ugly and slow.
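
To make the arithmetic of that two-pass idea concrete, here is a minimal
sketch in plain Java with no Hadoop classes involved. The class name
PercentageSketch and the methods countPerKey and fractions are hypothetical
names chosen for illustration; the first method stands in for the counting
job, the second for the step that divides by the saved total.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: plain Java, not a Hadoop job.
final class PercentageSketch {

    // Pass 1: sum the values per key (what the counting M/R job produces).
    static Map<String, Long> countPerKey(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            String key = parts[0].trim();
            long value = Long.parseLong(parts[1].trim());
            counts.merge(key, value, Long::sum);
        }
        return counts;
    }

    // Pass 2: divide each per-key count by the grand total over all keys.
    static Map<String, Double> fractions(Map<String, Long> counts) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            result.put(e.getKey(), (double) e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input =
            Arrays.asList("key1, 2", "key2, 1", "key1, 1", "key3, 1");
        // Print each key with its share of the total, two decimal places.
        fractions(countPerKey(input)).forEach((k, v) ->
            System.out.println(k + ", " + String.format(Locale.ROOT, "%.2f", v)));
    }
}
```

The division needs the grand total over all keys, which is only known once
every input record has been seen; that is why the total has to be computed
first and then made available to the second step.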

I also tried defining a static enum Counters { INPUT_WORDS }.
In the mapper I call context.getCounter(Counters.INPUT_WORDS).increment(1);
In the reducer I call context.getCounter(Counters.INPUT_WORDS).getCounter();
But it does not work.

Is there a more elegant way?