hadoop-common-user mailing list archives

From Jingkei Ly <jly.l...@googlemail.com>
Subject Re: Percentage calculation?
Date Mon, 17 Aug 2009 21:40:18 GMT
If the counter method doesn't work, I've used a slightly hacky way to do
something like this in the past with the 0.19 API.
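On why the counter always reads 0: as far as I know, counters are kept per task attempt and are only summed into job-wide totals by the framework (the JobTracker). So context.getCounter(...) inside a reducer returns that reduce task's own counter, which nothing has incremented. The toy model below is plain Java, not Hadoop code; TaskCounters and jobTotal are made-up names used only to illustrate the effect.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT the Hadoop API: each task owns its own counter object,
// and only the "framework" (jobTotal) ever sees the sum across tasks.
public class CounterModel {
    // Hypothetical stand-in for one task attempt's counters.
    static class TaskCounters {
        long inputWords = 0;
    }

    // Simulates the framework adding per-task counters into a job total.
    static long jobTotal(List<TaskCounters> tasks) {
        long sum = 0;
        for (TaskCounters t : tasks) {
            sum += t.inputWords;
        }
        return sum;
    }

    public static void main(String[] args) {
        TaskCounters map0 = new TaskCounters();
        TaskCounters map1 = new TaskCounters();
        TaskCounters reduce0 = new TaskCounters();
        List<TaskCounters> all = new ArrayList<>();
        all.add(map0);
        all.add(map1);
        all.add(reduce0);

        map0.inputWords += 3; // e.g. lines "key1, 2" and "key2, 1"
        map1.inputWords += 2; // e.g. lines "key1, 1" and "key3, 1"

        // Inside the reducer you only see your own task's counter...
        System.out.println("reducer sees: " + reduce0.inputWords);
        // ...while the job-level value is the framework's sum.
        System.out.println("job total:    " + jobTotal(all));
    }
}
```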

In the Mapper I kept an instance variable holding the running count, and in
the close() method I wrote out a file, unique to each map task, containing
the final value of that variable.

Then each Reducer reads in all of those files and adds the values together
to get the total count across all mappers. This relies on the fact that the
reduce() calls don't begin until all the Mappers have finished.

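i.e., roughly the following, using the old 0.19 org.apache.hadoop.mapred API
(a sketch: the "wordcounts" side-file directory is a made-up name, and the
"key, count" line parsing assumes input like your example):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class Percentage {
    // Hypothetical side-file directory, relative to the user's HDFS home.
    static final Path COUNTS_DIR = new Path("wordcounts");
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, LongWritable> {
        private JobConf conf;
        private long inputWords = 0;
        public void configure(JobConf job) { conf = job; }
        public void map(LongWritable offset, Text line,
                OutputCollector<Text, LongWritable> out, Reporter r) throws IOException {
            String[] kv = line.toString().split(",");   // e.g. "key1, 2"
            long n = Long.parseLong(kv[1].trim());
            inputWords += n;
            out.collect(new Text(kv[0].trim()), new LongWritable(n));
        }
        public void close() throws IOException {
            // One file per map task, named by task number so a retried attempt overwrites it.
            Path p = new Path(COUNTS_DIR, "map-" + conf.getInt("mapred.task.partition", -1));
            FSDataOutputStream os = FileSystem.get(conf).create(p, true);
            os.writeLong(inputWords);
            os.close();
        }
    }
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, LongWritable, Text, Text> {
        private JobConf conf;
        private long totalInputWords = -1;              // -1 = side files not read yet
        public void configure(JobConf job) { conf = job; }
        public void reduce(Text key, Iterator<LongWritable> values,
                OutputCollector<Text, Text> out, Reporter r) throws IOException {
            if (totalInputWords < 0) {                  // first call: all maps are done
                totalInputWords = 0;
                FileSystem fs = FileSystem.get(conf);
                for (FileStatus f : fs.listStatus(COUNTS_DIR)) {
                    FSDataInputStream in = fs.open(f.getPath());
                    totalInputWords += in.readLong();
                    in.close();
                }
            }
            long sum = 0;
            while (values.hasNext()) sum += values.next().get();
            out.collect(key, new Text(String.format("%.2f", (double) sum / totalInputWords)));
        }
    }
}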
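Per key, the reduce side boils down to sum / total. Here is a plain-Java check of that arithmetic on your example input. No Hadoop is involved: a TreeMap stands in for the grouping the shuffle does, and the total of 5 is what the side files would add up to.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class Percentages {
    // Group "key, count" lines by key and sum the counts (map + shuffle + reduce sum).
    static Map<String, Long> sumByKey(String[] lines) {
        Map<String, Long> sums = new TreeMap<>();
        for (String line : lines) {
            String[] kv = line.split(",");
            sums.merge(kv[0].trim(), Long.parseLong(kv[1].trim()), Long::sum);
        }
        return sums;
    }

    // What each reduce() call would emit: the key's sum divided by the global total.
    static Map<String, String> percentages(Map<String, Long> sums, long total) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, Long> e : sums.entrySet()) {
            out.put(e.getKey(), String.format(Locale.ROOT, "%.2f", (double) e.getValue() / total));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] input = {"key1, 2", "key2, 1", "key1, 1", "key3, 1"};
        Map<String, Long> sums = sumByKey(input);
        long total = 0;
        for (long v : sums.values()) {
            total += v;
        }
        // Prints key1, 0.60 / key2, 0.20 / key3, 0.20
        for (Map.Entry<String, String> e : percentages(sums, total).entrySet()) {
            System.out.println(e.getKey() + ", " + e.getValue());
        }
    }
}
```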

Hope that makes sense.

2009/8/17 tigertail <tyczjs@yahoo.com>

>
> Can somebody help, please? I would expect there to be some easy way to do this.
>
> A correction:
> In the reducer I call context.getCounter(Counters.INPUT_WORDS).getValue();
> but it does not work; it always returns 0.
>
>
> tigertail wrote:
> >
> > Hi Hadoop/MapReduce experts,
> >
> > My question might be naive, but I am really stuck here and I am looking
> > forward to getting help/advice from you.
> >
> > I have an input file like
> > key1, 2
> > key2, 1
> > key1, 1
> > key3, 1
> >
> > It is easy to write M/R code to calculate the count for each key and
> > output something like
> > key1, 3
> > key2, 1
> > key3, 1
> >
> > But how can I calculate the percentage of each key over all keys? With
> > the above input, I would expect the output to be
> > key1, 0.60
> > key2, 0.20
> > key3, 0.20
> >
> > One naive method is to calculate the total count (5 with the above input),
> > save it in a file, and then read that file in before the M/R job starts. But
> > that is obviously ugly and slow.
> >
> > I also tried defining a static enum Counters { INPUT_WORDS }.
> > In the mapper I call context.getCounter(Counters.INPUT_WORDS).increment(1);
> > in the reducer I call context.getCounter(Counters.INPUT_WORDS).getValue();
> > but it does not work; it always returns 0.
> >
> > Is there a more elegant way?
> >
>
