Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-user@hadoop.apache.org
MIME-Version: 1.0
Sender: jingkei.ly@gmail.com
In-Reply-To: <25013023.post@talk.nabble.com>
References: <25008761.post@talk.nabble.com> <25013023.post@talk.nabble.com>
From: Jingkei Ly <jly.list@googlemail.com>
Date: Mon, 17 Aug 2009 22:40:18 +0100
Message-ID: <c238cbb50908171440j61017f30sdda8dc960c1eed7@mail.gmail.com>
Subject: Re: Percentage calculation?
To: common-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=0015174c3d84b543f104715d4014

--0015174c3d84b543f104715d4014
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

If the counter method doesn't work, I've used a slightly hacky way to do
something like this in the past with the 0.19 API.

In the Mapper I kept the count in an instance variable, and in the close()
method I wrote out a file, unique to each mapper task, containing the final
value of that variable.

Then each Reducer reads in all of those files and sums the values to give
the total count across all mappers. It relies on the fact that no reduce()
call is made until all the Mappers have finished (reducers may start copying
map output earlier, but reduce() itself only runs once all of it is
available).

i.e. in pseudo-code

class Mapper {
    int inputWords = 0;

    map(key, value){
         inputWords += value;
    }

    close() {
         // write out inputWords to a file unique to this mapper task
    }
}

class Reducer {
    int totalInputWords = 0;
    boolean firstTime = true;

    reduce() {
        if (firstTime)  {
            // runs once per reducer task, on the first reduce() call
            for all inputWordFiles, f {
                 int mapperInputWord = f.readInt();
                 totalInputWords += mapperInputWord;
            }
            firstTime = false;
        }
        // use totalInputWords to calculate percentage
    }

}
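
For anyone who wants to try the idea without a cluster, here is a minimal
sketch in plain Java. It is not Hadoop code: the local filesystem stands in
for HDFS, and the class name SideFileTotals, the "count-" file prefix and
the task ids are all made up for illustration. Each method corresponds to
one step of the pseudo-code above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Local-filesystem simulation of the side-file trick; no Hadoop classes are used.
// All names here (SideFileTotals, "count-" prefix) are hypothetical.
class SideFileTotals {

    // Mapper close(): write this task's final count to a file named after the task.
    static void writeTaskCount(Path sideDir, String taskId, long count) {
        try {
            Files.createDirectories(sideDir);
            Files.write(sideDir.resolve("count-" + taskId),
                        Long.toString(count).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reducer, first reduce() call: read every per-task file and sum the values.
    static long readTotal(Path sideDir) {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sideDir, "count-*")) {
            for (Path f : files) {
                String text = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
                total += Long.parseLong(text.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Reducer, every key: divide the per-key sum by the global total.
    static Map<String, Double> percentages(Map<String, Long> perKeyCounts, long total) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : perKeyCounts.entrySet()) {
            result.put(e.getKey(), (double) e.getValue() / total);
        }
        return result;
    }
}
```

With the input from the original question split across two mapper tasks
(one seeing "key1, 2" and "key2, 1", the other "key1, 1" and "key3, 1"),
the tasks would write 3 and 2, readTotal() would return 5, and key1's
per-key sum of 3 would come out as 3/5. In a real job the directory would
have to be somewhere every reducer can read, and the file name would need
to be derived from the task id so that mappers never collide.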

Hope that makes sense.

2009/8/17 tigertail <tyczjs@yahoo.com>

>
> Can somebody help, please? I would expect there to be some easy way to do
> that.
>
> Some corrections,
> In reducer I do context.getCounter(Counters.INPUT_WORDS).getValue();
> But it does not work. It always returns 0.
>
>
> tigertail wrote:
> >
> > Hi Hadoop/MapReduce experts,
> >
> > My question might be naive, but I am really stuck here and I am looking
> > forward to getting help/advice from you.
> >
> > I have an input file like
> > key1, 2
> > key2, 1
> > key1, 1
> > key3, 1
> >
> > It is easy to write M/R code to calculate the count for each key and
> > output something like
> > key1, 3
> > key2, 1
> > key3, 1
> >
> > But how can I calculate the percentage of each key over all keys? With
> > the above input, I would expect to get the output as
> > key1, 0.60
> > key2, 0.20
> > key3, 0.20
> >
> > One naive method is to calculate the total count (5 with the above input)
> > which is saved in a file. Then the file is read in before M/R starts. But
> > it is obviously ugly and slow.
> >
> > I also tried to set a static enum Counters { INPUT_WORDS }
> > In mapper I do context.getCounter(Counters.INPUT_WORDS).increment(1);
> > In reducer I do context.getCounter(Counters.INPUT_WORDS).getValue();
> > But it does not work. It always returns 0.
> >
> > Is there a more elegant way?
> >
>

--0015174c3d84b543f104715d4014--