From: Jingkei Ly
Date: Mon, 17 Aug 2009 22:40:18 +0100
Subject: Re: Percentage calculation?
To: common-user@hadoop.apache.org

If the counter method doesn't work, I've used a slightly hacky way to do something like this in the past with the 0.19 API. In the Mapper I kept an instance variable holding the count, and in the close() method I wrote out a file unique to each map task containing the final value of that variable. Then each Reducer read in all of those values and added them together to get the total count across all mappers. This relies on the fact that reduce() is not called until all the map tasks have finished.

i.e. in pseudo-code:

    class Mapper {
      int inputWords = 0;

      map(key, value) {
        inputWords += value;
        // emit (key, value) as usual so the reducer can sum per key
      }

      close() {
        // write out inputWords to a file unique to this map task
      }
    }

    class Reducer {
      int totalInputWords = 0;
      boolean firstTime = true;

      reduce(key, values) {
        if (firstTime) {
          for all inputWordFiles, f {
            int mapperInputWords = f.readInt();
            totalInputWords += mapperInputWords;
          }
          firstTime = false;
        }
        // use totalInputWords to calculate the percentage for this key
      }
    }

Hope that makes sense.

2009/8/17 tigertail

>
> Can somebody help, please? I would expect there to be some easy way to do this.
>
> A correction:
> In the reducer I do context.getCounter(Counters.INPUT_WORDS).getValue();
> but it does not work; it always returns 0.
>
>
> tigertail wrote:
> >
> > Hi Hadoop/MapReduce experts,
> >
> > My question might be naive, but I am really stuck here and I am looking
> > forward to getting help and advice from you.
> >
> > I have an input file like
> > key1, 2
> > key2, 1
> > key1, 1
> > key3, 1
> >
> > It is easy to write M/R code to calculate the count for each key and
> > output something like
> > key1, 3
> > key2, 1
> > key3, 1
> >
> > But how can I calculate the percentage of each key over all keys? With
> > the above input, I would expect to get the output
> > key1, 0.60
> > key2, 0.20
> > key3, 0.20
> >
> > One naive method is to calculate the total count (5 with the above input),
> > save it in a file, and read that file in before the M/R job starts. But
> > that is obviously ugly and slow.
> >
> > I also tried to set a static enum Counters { INPUT_WORDS }
> > In the mapper I do context.getCounter(Counters.INPUT_WORDS).increment(1);
> > In the reducer I do context.getCounter(Counters.INPUT_WORDS).getValue();
> > but it does not work; it always returns 0.
> >
> > Is there a more elegant way?
> >
>
> --
> View this message in context:
> http://www.nabble.com/Percentage-calculation--tp25008761p25013023.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
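For anyone who wants to try the side-file idea from the reply above without a cluster, here is a rough local simulation using tigertail's sample input. It is a sketch under stated assumptions, not Hadoop code: plain Java 11+, hypothetical names (`PercentageSketch`, `mapTask`, `SideFileReducer`, `run`), input lines of the form "key, count", a temporary directory standing in for wherever the per-task files would live, and exactly one side file per map task (a real job would also have to cope with duplicate files from retried or speculative task attempts).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free sketch of the "side file" idea. All names are hypothetical;
// nothing here is Hadoop API.
public class PercentageSketch {

    // Simulated map task: keeps a running total (like the inputWords instance
    // variable), emits (key, count) pairs, and in its "close" step writes the
    // total to a file whose name is unique to this task.
    static List<Map.Entry<String, Long>> mapTask(int taskId, List<String> lines, Path sideDir) {
        List<Map.Entry<String, Long>> emitted = new ArrayList<>();
        long inputWords = 0;
        for (String line : lines) {
            String[] parts = line.split(",");            // assumed format: "key, count"
            String key = parts[0].trim();
            long count = Long.parseLong(parts[1].trim());
            inputWords += count;
            emitted.add(Map.entry(key, count));
        }
        try {                                            // the close() step
            Files.writeString(sideDir.resolve("total-" + taskId), Long.toString(inputWords));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return emitted;
    }

    // Simulated reducer: on its first reduce() call it sums every side file once,
    // then turns each key's total into a fraction of the grand total.
    static final class SideFileReducer {
        private final Path sideDir;
        private boolean firstTime = true;
        private long totalInputWords = 0;

        SideFileReducer(Path sideDir) { this.sideDir = sideDir; }

        double reduce(String key, List<Long> counts) {
            if (firstTime) {
                try (DirectoryStream<Path> files = Files.newDirectoryStream(sideDir, "total-*")) {
                    for (Path f : files) {
                        totalInputWords += Long.parseLong(Files.readString(f).trim());
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                firstTime = false;
            }
            long sum = 0;
            for (long c : counts) sum += c;
            return (double) sum / totalInputWords;
        }
    }

    // Driver: every map task finishes before any reduce() call, which is the
    // ordering the technique relies on. The TreeMap stands in for the shuffle.
    static Map<String, Double> run(List<List<String>> splits) {
        try {
            Path sideDir = Files.createTempDirectory("side-counts");
            Map<String, List<Long>> grouped = new TreeMap<>();
            for (int i = 0; i < splits.size(); i++) {
                for (Map.Entry<String, Long> kv : mapTask(i, splits.get(i), sideDir)) {
                    grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
                }
            }
            SideFileReducer reducer = new SideFileReducer(sideDir);
            Map<String, Double> result = new TreeMap<>();
            grouped.forEach((key, counts) -> result.put(key, reducer.reduce(key, counts)));
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("key1, 2", "key2, 1"),   // input split for map task 0
                List.of("key1, 1", "key3, 1"));  // input split for map task 1
        run(splits).forEach((k, v) -> System.out.printf(Locale.ROOT, "%s, %.2f%n", k, v));
    }
}
```

Running main prints key1, 0.60 / key2, 0.20 / key3, 0.20, the output tigertail expects. As in the pseudo-code, the side files are read once per reducer (guarded by firstTime) rather than once per key, so the extra cost is one small read per map task.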