hadoop-general mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Re: Map-reduce sorting on multiple keys
Date Mon, 16 Nov 2009 19:25:48 GMT
Goutham,

Can you please take a look at my email titled "Custom Writable not
working"?  (I just sent it a few minutes ago.)  It's similar to this one.
The only difference is that I am using Text instead of IntWritable, but the
sort in the map phase is not working as expected.  Can you tell me why?  Thanks.


On Mon, Nov 16, 2009 at 11:16 AM, Rajiv Maheshwari <rajivm01@yahoo.com> wrote:

> Thanks, appreciate it.
>
> Rajiv
>
> --- On Mon, 11/16/09, goutham patnaik <goutham.patnaik@gmail.com> wrote:
>
> From: goutham patnaik <goutham.patnaik@gmail.com>
> Subject: Re: Map-reduce sorting on multiple keys
> To: general@hadoop.apache.org
> Date: Monday, November 16, 2009, 11:08 AM
>
> Rajiv,
>
> You could write your own class that implements the WritableComparable
> interface and use it as your key class. All you need to do is implement
> the write, readFields and compareTo methods; the framework will then sort
> the map output keys using compareTo:
>
> public class TupleKey implements WritableComparable {
>  IntWritable k1;
>  IntWritable k2;
> .......
> }
>
> On Mon, Nov 16, 2009 at 9:13 AM, Rajiv Maheshwari <rajivm01@yahoo.com> wrote:
>
> > Hi everyone,
> >
> > I have a need to sort the output of map on 2 keys (key1, key2) - first on
> > key1, then on key2.
> >
> > Example:
> > key1  key2   values
> > -------------------------
> > 0001  0001  values...
> > 0001  0002  values...
> > 0002  0001  values...
> > 0002  0005  values...
> >
> >
> > I am thinking of the following solution approach:
> >
> > Define KEY = key1, key2   /* concatenate keys */. Override default
> > HashPartitioner and use only key1 in hashCode computation.
> >
> >
> > public class HashPartitioner<K2, V2> implements Partitioner<K2, V2> {
> >
> > public void configure(JobConf job) {}
> >
> > public int getPartition(K2 key, V2 value, int numPartitions) {
> >
> >     return (key.getKey1().hashCode() & Integer.MAX_VALUE) % numPartitions;
> >     }
> > }
> >
> > Would this work?
> >
> > Does anyone have any better ideas?
> >
> > Thanks much,
> > Rajiv

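For reference, the composite-key idea discussed in this thread could look roughly like the sketch below. It is not code from any of the posters. To keep it compilable without Hadoop on the classpath, the class only implements Comparable and mirrors the write/readFields method shapes; in a real job it would presumably be declared as "implements WritableComparable<TupleKey>". The plain int fields (instead of IntWritable), the getters, and the partitionFor helper are all assumptions made for illustration; partitionFor simply repeats the getPartition arithmetic from the original question.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Sketch of a two-part key. Assumption: in a real Hadoop job this class would
// implement org.apache.hadoop.io.WritableComparable<TupleKey>; here it only
// implements Comparable so the example compiles with the JDK alone.
public class TupleKey implements Comparable<TupleKey> {
    private int k1;
    private int k2;

    // No-arg constructor, needed when a framework creates keys reflectively.
    public TupleKey() {}

    public TupleKey(int k1, int k2) {
        this.k1 = k1;
        this.k2 = k2;
    }

    public int getKey1() { return k1; }
    public int getKey2() { return k2; }

    // Same shape as Writable.write: serialize both parts in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(k1);
        out.writeInt(k2);
    }

    // Same shape as Writable.readFields: read the parts back in the same order.
    public void readFields(DataInput in) throws IOException {
        k1 = in.readInt();
        k2 = in.readInt();
    }

    // Order by key1 first, then by key2 when key1 is equal.
    @Override
    public int compareTo(TupleKey other) {
        int byKey1 = Integer.compare(k1, other.k1);
        return byKey1 != 0 ? byKey1 : Integer.compare(k2, other.k2);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TupleKey)) return false;
        TupleKey t = (TupleKey) o;
        return k1 == t.k1 && k2 == t.k2;
    }

    @Override
    public int hashCode() {
        return 31 * k1 + k2;
    }

    // Hypothetical helper: the partition arithmetic from the question,
    // using only key1 so equal key1 values share a partition.
    public static int partitionFor(TupleKey key, int numPartitions) {
        return (Integer.hashCode(key.getKey1()) & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public String toString() {
        return "(" + k1 + ", " + k2 + ")";
    }
}
```

With this ordering, keys sort on key1 and then on key2, and any two keys with the same key1 map to the same partition regardless of key2.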