hadoop-hdfs-user mailing list archives

From Sofia Georgiakaki <geosofie_...@yahoo.com>
Subject Re: Sorting Values sent to reducer NOT based on KEY (Depending on part of VALUE)
Date Tue, 23 Apr 2013 07:45:19 GMT

Sorting is done by the sort comparator, which by default orders records by key, using the
key's compareTo().
A possible solution would be the following:
You could write a custom key class that implements WritableComparable (let's
call it MyCompositeFieldWritableComparable), which will store your current key and the part
of the value that you want your sorting to be based on. As I understand from your description,
this writable class will have 2 IntWritable fields, e.g.
(fieldA, fieldB)

Implement the methods equals(), compareTo(), hashCode(), etc. in your custom writable to override the
defaults. Sorting before the reduce phase will be performed based on the compareTo() implementation
of your custom writable, so you can write it in a way that compares only fieldB.
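
To make this concrete, here is a minimal sketch of such a key class. It is only an illustration under a few assumptions: to keep it compilable without the Hadoop jars it implements plain java.lang.Comparable and uses int fields, whereas in a real job you would declare it as implementing WritableComparable<MyCompositeFieldWritableComparable> (the write/readFields signatures below match the Writable ones). The accessor names and hashCodeBasedOnFieldB() are made up for the example.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical composite key: fieldA is the original key, fieldB is the part
// of the value to sort on. In a real job this would implement Hadoop's
// WritableComparable; Comparable is used here so the sketch has no Hadoop
// dependency.
class MyCompositeFieldWritableComparable
        implements Comparable<MyCompositeFieldWritableComparable> {
    private int fieldA;
    private int fieldB;

    // A no-arg constructor is needed so the key can be created and then
    // filled in by readFields().
    MyCompositeFieldWritableComparable() { }

    MyCompositeFieldWritableComparable(int fieldA, int fieldB) {
        this.fieldA = fieldA;
        this.fieldB = fieldB;
    }

    int getFieldA() { return fieldA; }
    int getFieldB() { return fieldB; }

    // Serialize both fields, in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(fieldA);
        out.writeInt(fieldB);
    }

    // Deserialize in the same order as write().
    public void readFields(DataInput in) throws IOException {
        fieldA = in.readInt();
        fieldB = in.readInt();
    }

    // Sort order depends on fieldB only, as described above.
    @Override
    public int compareTo(MyCompositeFieldWritableComparable other) {
        return Integer.compare(fieldB, other.fieldB);
    }

    // Hash of fieldB alone, intended for a custom partitioner.
    public int hashCodeBasedOnFieldB() {
        return Integer.hashCode(fieldB);
    }

    @Override
    public int hashCode() {
        return 31 * fieldA + fieldB;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MyCompositeFieldWritableComparable)) {
            return false;
        }
        MyCompositeFieldWritableComparable other = (MyCompositeFieldWritableComparable) o;
        return fieldA == other.fieldA && fieldB == other.fieldB;
    }
}
```

Note that with a compareTo() that looks only at fieldB, two keys with the same fieldB but different fieldA compare as equal, so with the default settings they would end up in the same reduce call.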

Be careful in the way you implement MyCompositeFieldWritableComparable.compareTo()
(unless you set a separate grouping comparator, it will also be used to group <key, list(values)>
in the reducer), MyCompositeFieldWritableComparable.hashCode() (used by the default HashPartitioner
to choose the reducer) and MyCompositeFieldWritableComparable.equals() (keep it consistent with hashCode()).
So your new KEY class will be MyCompositeFieldWritableComparable.
As an alternative and cleaner implementation, write the MyCompositeFieldWritableComparable
class and also a HashOnOneFieldPartitioner class (which extends Partitioner) that will do
something like this:


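public int getPartition(K key, V value,
                        int numReduceTasks) {
    if (key instanceof MyCompositeFieldWritableComparable) {
        // Partition on fieldB only; the mask clears the sign bit so the result is never negative.
        return (((MyCompositeFieldWritableComparable) key).hashCodeBasedOnFieldB()
                & Integer.MAX_VALUE) % numReduceTasks;
    }
    // Any other key type: fall back to the full hashCode().
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}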
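
In case the "& Integer.MAX_VALUE" part looks odd: hashCode() can be negative, and in Java the % of a negative number is negative too, which would not be a valid partition index. The small sketch below isolates just that arithmetic; PartitionMath and its method names are hypothetical helpers for illustration and are not part of Hadoop.

```java
// Hypothetical helper that isolates the arithmetic used in getPartition().
final class PartitionMath {
    private PartitionMath() { }

    // Clears the sign bit first, so the result is always in [0, numReduceTasks).
    static int partitionFor(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Without the mask, a negative hash gives a negative (invalid) partition.
    static int naivePartitionFor(int hash, int numReduceTasks) {
        return hash % numReduceTasks;
    }
}
```

For example, with 4 reducers a hash of -7 gives partition 1 with the mask and -3 without it.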

You can also find related articles on the web, e.g. http://riccomini.name/posts/hadoop/2009-11-13-sort-reducer-input-value-hadoop/.

Have a nice day,

> From: Vikas Jadhav <vikascjadhav87@gmail.com>
> To: user@hadoop.apache.org
> Sent: Tuesday, April 23, 2013 8:44 AM
> Subject: Sorting Values sent to reducer NOT based on KEY (Depending on part of VALUE)
>
> How can I sort values in Hadoop using Hadoop's standard sorting algorithm (i.e. the sorting
> facility provided by Hadoop)?
> 1) Values should be sorted depending on some part of the value.
> For example (KEY,VALUE):
>  (0,"BC,4,XY")
>  (1,"DC,1,PQ")
>  (2,"EF,0,MN")
> The sorted sequence reaching the reducer should be
> Here sorted depending on the second attribute position in the value.
> Regards,
> Vikas