hadoop-common-user mailing list archives

From Hien Luu <h...@yahoo.com>
Subject Re: I am attempting to use setOutputValueGroupingComparator as a secondary sort on the values
Date Tue, 28 Oct 2008 18:16:11 GMT
This is a nice feature for sorting keys and values.  Is there more documentation on it
somewhere, or a MapReduce example that uses this feature?



From: Owen O'Malley <omalley@apache.org>
To: core-user@hadoop.apache.org
Sent: Tuesday, October 28, 2008 8:38:36 AM
Subject: Re: I am attempting to use setOutputValueGroupingComparator as a secondary sort on the values

On Oct 28, 2008, at 7:53 AM, David M. Coe wrote:

> My mapper is Mapper<LongWritable, Text, IntWritable, IntWritable> and my
> reducer is the identity.  I configure the program using:
> conf.setOutputKeyClass(IntWritable.class);
> conf.setOutputValueClass(IntWritable.class);
> conf.setMapperClass(MapClass.class);
> conf.setReducerClass(IdentityReducer.class);
> conf.setOutputKeyComparatorClass(IntWritable.Comparator.class);
> conf.setOutputValueGroupingComparator(IntWritable.Comparator.class);

The problem is that your map output key needs to be a pair class that looks like:

class IntPair implements Writable {
  private int left;
  private int right;
  public void set(int left, int right) { ... }
  public int getLeft() {...}
  public int getRight() {...}
  // plus the write(DataOutput) and readFields(DataInput) methods that Writable requires
}
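
For illustration, here is a minimal sketch of how the elided method bodies might be filled in. It is written against plain java.io so that it runs without Hadoop on the classpath. The class name IntPairSketch, the roundTrip helper, and the choice to write the two ints back to back are assumptions, not part of the original message. In a real job the class would declare "implements Writable"; the write and readFields signatures below match that interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for IntPair; a real key would add "implements Writable".
class IntPairSketch {
  private int left;
  private int right;

  public void set(int left, int right) {
    this.left = left;
    this.right = right;
  }

  public int getLeft() { return left; }

  public int getRight() { return right; }

  // Same signature as Writable.write: serialize both fields in a fixed order.
  public void write(DataOutput out) throws IOException {
    out.writeInt(left);
    out.writeInt(right);
  }

  // Same signature as Writable.readFields: read the fields back in that order.
  public void readFields(DataInput in) throws IOException {
    left = in.readInt();
    right = in.readInt();
  }

  // Illustrative helper: round-trips a pair through a byte array, roughly as
  // a key is serialized on the map side and deserialized on the reduce side.
  static IntPairSketch roundTrip(int left, int right) {
    try {
      IntPairSketch original = new IntPairSketch();
      original.set(left, right);
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      original.write(new DataOutputStream(bytes));
      IntPairSketch copy = new IntPairSketch();
      copy.readFields(
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The only real constraint is that readFields reads the fields in the same order that write emits them.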

Your Mapper should be Mapper<LongWritable, Text, IntPair, IntWritable> and should emit

IntPair key = new IntPair();
IntWritable value = new IntWritable();
key.set(keyValue, valueValue);
value.set(valueValue);
output.collect(key, value);

Your sort comparator should compare both left and right in the pair.
The grouping comparator should only look at left in the pair.
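
The difference between the two comparators can be sketched in plain Java, without Hadoop. This is an in-memory illustration of the idea only, not how the framework implements its shuffle. The class SecondarySortSketch, the use of int[] {left, right} in place of IntPair, and the sortAndGroup helper are all assumptions for the example. In a real job the two comparators would be RawComparator classes passed to setOutputKeyComparatorClass and setOutputValueGroupingComparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; each int[] is {left, right}.
class SecondarySortSketch {
  // Sort order: left first, then right, so the right values arrive
  // in ascending order within each left.
  static final Comparator<int[]> SORT = (a, b) -> {
    int c = Integer.compare(a[0], b[0]);
    return c != 0 ? c : Integer.compare(a[1], b[1]);
  };

  // Grouping order: left only, so all pairs that share a left
  // are treated as one group (one reduce call).
  static final Comparator<int[]> GROUP = (a, b) -> Integer.compare(a[0], b[0]);

  // Sorts the pairs with SORT, then splits the sorted list into
  // consecutive runs that GROUP considers equal.
  static List<List<int[]>> sortAndGroup(List<int[]> pairs) {
    List<int[]> sorted = new ArrayList<>(pairs);
    sorted.sort(SORT);
    List<List<int[]>> groups = new ArrayList<>();
    for (int[] pair : sorted) {
      if (groups.isEmpty()
          || GROUP.compare(groups.get(groups.size() - 1).get(0), pair) != 0) {
        groups.add(new ArrayList<>());
      }
      groups.get(groups.size() - 1).add(pair);
    }
    return groups;
  }
}
```

Given the pairs (2,9), (1,5), (2,3), (1,1), sortAndGroup returns two groups: (1,1), (1,5) and then (2,3), (2,9). Each group corresponds to one reduce call whose values are already sorted.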

Your Reducer should be Reducer<IntPair, IntWritable, IntWritable, IntWritable> and should emit

output.collect(new IntWritable(key.getLeft()), value);

Is that clearer?

-- Owen
