hadoop-common-user mailing list archives

From "David M. Coe" <david....@chalklabs.net>
Subject Re: I am attempting to use setOutputValueGroupingComparator as a secondary sort on the values
Date Wed, 29 Oct 2008 14:59:53 GMT
Would the input using this method be sorted before it reaches the
reducer?  I have implemented this, but only the key comparator class is
called.  The effect is that if I output the data here, it is sorted.
However, it sorts by comparing both the right and the left, as you
suggest, so the reducer is given each unique right-left pair instead of
being given the rights that happen to be sorted using the left.

What I get:

text file ->
map: -> 0 0 -> reducer
        0 1 -> reducer
        8 0 -> reducer
        8 1 -> reducer

What I'd like:

text file ->
map: *******
     -> 0 0  \
     -> 0 1  | -> reducer
     -> 0 8  /
     *******
     -> 8 0  \ -> reducer
     -> 8 1  /
     *******
     -> 123 3  -> reducer

What is the best way to do this?  The keys must be secondary sorted
before the reduce, but I cannot think of a way to do this.

Thank you.



Owen O'Malley wrote:
> 
> On Oct 28, 2008, at 7:53 AM, David M. Coe wrote:
> 
>> My mapper is Mapper<LongWritable, Text, IntWritable, IntWritable> and my
>> reducer is the identity.  I configure the program using:
>>
>> conf.setOutputKeyClass(IntWritable.class);
>> conf.setOutputValueClass(IntWritable.class);
>>
>> conf.setMapperClass(MapClass.class);
>> conf.setReducerClass(IdentityReducer.class);
>>
>> conf.setOutputKeyComparatorClass(IntWritable.Comparator.class);
>> conf.setOutputValueGroupingComparator(IntWritable.Comparator.class);
> 
> The problem is that your map needs to look like:
> 
> class IntPair implements Writable {
>   private int left;
>   private int right;
>   public void set(int left, int right) { ... }
>   public int getLeft() {...}
>   public int getRight() {...}
> }
> 
> your Mapper should be Mapper<LongWritable, Text, IntPair, IntWritable>
> and should emit
> 
> IntPair key = new IntPair();
> IntWritable value = new IntWritable();
> ...
> key.set(keyValue, valueValue);
> value.set(valueValue);
> output.collect(key, value);
> 
> Your sort comparator should compare both left and right in the pair.
> The grouping comparator should only look at left in the pair.
> 
> Your Reducer should be Reducer<IntPair, IntWritable, IntWritable,
> IntWritable>
> 
> output.collect(new IntWritable(key.getLeft()), value);
> 
> Is that clearer?
> 
> -- Owen
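
To make the difference between the two comparators concrete, here is a
minimal sketch in plain Java that imitates the sort-then-group step
without using any Hadoop classes.  The class name SecondarySortSketch,
the shuffle() helper, and the SORT and GROUP constants are hypothetical
names made up for this illustration.  In a real job the two comparators
would be the classes passed to setOutputKeyComparatorClass and
setOutputValueGroupingComparator, and the framework would do the sorting
and grouping itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    /** Composite key: left is the grouping key, right is the value to sort by. */
    static final class IntPair {
        final int left;
        final int right;
        IntPair(int left, int right) { this.left = left; this.right = right; }
    }

    /** Plays the role of the sort comparator: orders by left, then by right. */
    static final Comparator<IntPair> SORT =
        Comparator.<IntPair>comparingInt(p -> p.left).thenComparingInt(p -> p.right);

    /** Plays the role of the grouping comparator: looks at left only. */
    static final Comparator<IntPair> GROUP =
        Comparator.<IntPair>comparingInt(p -> p.left);

    /**
     * Sorts the map output with SORT, then starts a new group whenever
     * GROUP says the current key differs from the previous one.
     * Each entry of the result stands for one reduce call.
     */
    static Map<Integer, List<Integer>> shuffle(List<IntPair> mapOutput) {
        List<IntPair> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);

        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        IntPair previous = null;
        List<Integer> current = null;
        for (IntPair key : sorted) {
            if (previous == null || GROUP.compare(previous, key) != 0) {
                current = new ArrayList<>();   // new group, i.e. new reduce call
                groups.put(key.left, current);
            }
            current.add(key.right);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<IntPair> mapOutput = Arrays.asList(
            new IntPair(8, 1), new IntPair(0, 8), new IntPair(123, 3),
            new IntPair(0, 1), new IntPair(8, 0), new IntPair(0, 0));
        shuffle(mapOutput).forEach((left, rights) ->
            System.out.println(left + " -> " + rights));
    }
}
```

Run on the pairs from the diagram above, this prints one line per group:
0 -> [0, 1, 8], then 8 -> [0, 1], then 123 -> [3].  If GROUP compared
both fields, as SORT does, every pair would start its own group, which
matches the "What I get" behaviour described above.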

