hadoop-general mailing list archives

From Rajiv Maheshwari <rajiv...@yahoo.com>
Subject Map-reduce sorting on multiple keys
Date Mon, 16 Nov 2009 17:13:06 GMT
Hi everyone,

I need to sort the map output on two keys (key1, key2): first on key1, then on key2.

Example:
key1  key2  values
------------------------
0001  0001  values...
0001  0002  values...
0002  0001  values...
0002  0005  values...
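
The ordering above can be sketched in plain Java, outside Hadoop, as a comparator that compares key1 first and falls back to key2 on ties. This is only an illustration of the desired order; the class name TwoKeySortSketch, the method sortByBothKeys, and the String[] row layout {key1, key2, value} are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TwoKeySortSketch {

    // Returns a sorted copy of the rows; each row is {key1, key2, value}.
    static List<String[]> sortByBothKeys(List<String[]> rows) {
        Comparator<String[]> byKey1 = Comparator.comparing(r -> r[0]);
        // key2 is consulted only when two rows have the same key1.
        Comparator<String[]> byBoth = byKey1.thenComparing(r -> r[1]);
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort(byBoth);
        return sorted;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"0002", "0005", "values..."},
                new String[] {"0001", "0002", "values..."},
                new String[] {"0002", "0001", "values..."},
                new String[] {"0001", "0001", "values..."});
        for (String[] row : sortByBothKeys(rows)) {
            System.out.println(String.join("  ", row));
        }
    }
}
```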


I am thinking of the following solution approach:

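Define a composite KEY = (key1, key2), i.e. concatenate the two keys, so that sorting
on KEY orders records by key1 and then by key2. Then override the default HashPartitioner
so that only key1 is used in the hashCode computation: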


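import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// CompositeKey is a placeholder name for the custom (key1, key2) key class;
// it must expose getKey1().
public class Key1Partitioner<V2> implements Partitioner<CompositeKey, V2> {

    public void configure(JobConf job) {}

    // Partition on key1 only, so every record with the same key1
    // goes to the same reducer regardless of key2.
    public int getPartition(CompositeKey key, V2 value, int numPartitions) {
        return (key.getKey1().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}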
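
To check the partitioning idea without a cluster, here is a minimal standalone sketch with no Hadoop dependency. The names Key1PartitionSketch, CompositeKey, and partitionFor are hypothetical; the arithmetic is the same as in getPartition above. It shows that two keys sharing key1 but differing in key2 map to the same partition.

```java
class Key1PartitionSketch {

    // Hypothetical composite key holding both sort keys.
    static final class CompositeKey {
        private final String key1;
        private final String key2;

        CompositeKey(String key1, String key2) {
            this.key1 = key1;
            this.key2 = key2;
        }

        String getKey1() { return key1; }
        String getKey2() { return key2; }
    }

    // Hash key1 only, clear the sign bit, then take the modulo.
    static int partitionFor(CompositeKey key, int numPartitions) {
        return (key.getKey1().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        CompositeKey a = new CompositeKey("0001", "0001");
        CompositeKey b = new CompositeKey("0001", "0002");
        // Same key1, different key2: both land in the same partition.
        System.out.println(partitionFor(a, numPartitions) == partitionFor(b, numPartitions)); // prints true
    }
}
```

In an actual job the composite key would also have to implement WritableComparable, with compareTo ordering by key1 and then key2, for the framework to serialize and sort it.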

Would this work?

Does anyone have any better ideas?

Thanks much,
Rajiv




      