hadoop-general mailing list archives

From: goutham patnaik <goutham.patn...@gmail.com>
Subject: Re: Map-reduce sorting on multiple keys
Date: Mon, 16 Nov 2009 19:08:09 GMT
Rajiv,

You could write your own class that implements the WritableComparable
interface and use it as your key class - all you need to do is implement
the write, readFields and compareTo methods, and the framework will then
sort your keys using that ordering:

public class TupleKey implements WritableComparable {
 IntWritable k1;
 IntWritable k2;
.......
}
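
Filling in that stub, something along these lines should do the trick (the
field and accessor names are just illustrative, and you may want to tweak
the hashCode) - the compareTo is what gives you the sort on k1 first and k2
second:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;

// Illustrative composite key: two IntWritables, ordered on k1 then k2.
public class TupleKey implements WritableComparable<TupleKey> {

  private final IntWritable k1 = new IntWritable();
  private final IntWritable k2 = new IntWritable();

  public TupleKey() {}

  public void set(int key1, int key2) {
    k1.set(key1);
    k2.set(key2);
  }

  public IntWritable getK1() { return k1; }
  public IntWritable getK2() { return k2; }

  // Serialize both fields in a fixed order ...
  public void write(DataOutput out) throws IOException {
    k1.write(out);
    k2.write(out);
  }

  // ... and read them back in the same order.
  public void readFields(DataInput in) throws IOException {
    k1.readFields(in);
    k2.readFields(in);
  }

  // Primary sort on k1, secondary sort on k2.
  public int compareTo(TupleKey other) {
    int cmp = k1.compareTo(other.k1);
    return (cmp != 0) ? cmp : k2.compareTo(other.k2);
  }

  // Keep hashCode/equals consistent with compareTo so the default
  // partitioner and grouping behave sensibly.
  public int hashCode() {
    return k1.hashCode() * 163 + k2.hashCode();
  }

  public boolean equals(Object o) {
    if (!(o instanceof TupleKey)) return false;
    TupleKey other = (TupleKey) o;
    return k1.equals(other.k1) && k2.equals(other.k2);
  }
}

You'd then register it with conf.setMapOutputKeyClass(TupleKey.class). If you
also want every record with the same key1 to land on the same reducer, you'd
still plug in a partitioner that hashes only k1, much like the one you
sketched below.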

On Mon, Nov 16, 2009 at 9:13 AM, Rajiv Maheshwari <rajivm01@yahoo.com> wrote:

> Hi everyone,
>
> I need to sort the map output on two keys (key1, key2) - first on
> key1, then on key2.
>
> Example:
> key1  key2   values
> -------------------------
> 0001  0001  values...
> 0001  0002  values...
> 0002  0001  values...
> 0002  0005  values...
>
>
> I am thinking of the following solution approach:
>
> Define KEY = key1, key2   /* concatenate keys */. Override the default
> HashPartitioner and use only key1 in the hashCode computation.
>
>
> public class HashPartitioner<K2, V2> implements Partitioner<K2, V2> {
>
>     public void configure(JobConf job) {}
>
>     public int getPartition(K2 key, V2 value, int numPartitions) {
>         return (key.getKey1().hashCode() & Integer.MAX_VALUE) % numPartitions;
>     }
> }
>
> Would this work?
>
> Does anyone have any better ideas?
>
> Thanks much,
> Rajiv
>
