hadoop-common-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: how to transfer the parameter from a reduce to another iteration of mapred
Date Sun, 11 Oct 2009 21:39:31 GMT
1.  Attachments don't work with this mailing list.

2.  Use a combiner to combine your sums and counts into bigger sums and
counts.

3.  The reducer should take a list of sum/count pairs, add them up, divide
and output the results.

Note that your combiner and reducer will not be identical.  This means that
you will need to call JobConf.setMapOutputValueClass to tell Hadoop that you
are sending out pairs, but you will also need to call
JobConf.setOutputValueClass to tell Hadoop that your reducer will be
outputting averages.

If your key is class Key and your sum and count pair class is SumCountPair,
then the signature of your combiner would need to be roughly

      class AverageCombiner extends Reducer<Key, SumCountPair, Key, SumCountPair>

and the signature of your reducer should be roughly

      class AverageReducer extends Reducer<Key, SumCountPair, Key,

Be warned that I didn't bother to get the details right here.   You will
probably have some kind of nasty surprise if you take what I say too
literally.  The basic ideas are correct, though.
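
To make the idea concrete, here is a minimal sketch in plain Java, with no Hadoop classes at all, of the arithmetic the combiner and reducer would do. Everything in it is an assumption made for illustration: the class name AverageSketch, the fields of SumCountPair, and the map/combine/reduce helper methods are hypothetical and are not Hadoop API. The point is only that the combiner takes pairs and emits a pair, while the reducer takes pairs and emits an average.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone illustration; none of these names are Hadoop API.
public class AverageSketch {

    /** Partial aggregate: a running sum plus the number of values it covers. */
    static final class SumCountPair {
        final double sum;
        final long count;

        SumCountPair(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Map step: collapse one fragment of numbers into a single (sum, count) pair. */
    static SumCountPair map(double[] fragment) {
        double sum = 0.0;
        for (double v : fragment) {
            sum += v;
        }
        return new SumCountPair(sum, fragment.length);
    }

    /** Combiner step: pairs in, one bigger pair out (output type equals input type). */
    static SumCountPair combine(List<SumCountPair> pairs) {
        double sum = 0.0;
        long count = 0;
        for (SumCountPair p : pairs) {
            sum += p.sum;
            count += p.count;
        }
        return new SumCountPair(sum, count);
    }

    /** Reducer step: pairs in, a single average out (output type differs from input). */
    static double reduce(List<SumCountPair> pairs) {
        SumCountPair total = combine(pairs);
        if (total.count == 0) {
            throw new IllegalArgumentException("no values to average");
        }
        return total.sum / total.count;
    }

    public static void main(String[] args) {
        SumCountPair a = map(new double[] {1.0, 2.0, 3.0});
        SumCountPair b = map(new double[] {4.0});
        SumCountPair c = map(new double[] {5.0, 6.0});

        // The combiner may or may not run on any given subset of pairs;
        // the reducer gets the same answer either way.
        SumCountPair ab = combine(Arrays.asList(a, b));
        System.out.println(reduce(Arrays.asList(ab, c))); // prints 3.5
    }
}
```

Because sums and counts both add up cleanly, it does not matter how many times the combiner runs, or whether it runs at all. That would not be true if the intermediate values were averages.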

On Fri, Oct 9, 2009 at 11:53 PM, congchong liu <pardtz@gmail.com> wrote:

> Hi all, I am fairly new to Hadoop, so please bear with me if you think my
> question is too straightforward. I am trying to compute the average of the
> values in a file F. F contains a large number of floating point numbers. F is
> split into different fragments. In the mapper, for each fragment, I am using
> two keywords, SUM and COUNT, and then passing two (key, value) pairs to the
> framework: (SUM, sum of floating point numbers in the fragment) and (COUNT,
> total number of floating point numbers in the fragment). In reduce, I can
> calculate the SUM and COUNT separately. But the problem is, I find no place
> in the code where I can compute SUM/COUNT to get the average. I've attached
> my code here. Can someone shed some light on this? Thanks very much!
> Congcong

Ted Dunning, CTO
