hadoop-general mailing list archives

From goutham patnaik <goutham.patn...@gmail.com>
Subject Re: Map-reduce sorting on multiple keys
Date Tue, 17 Nov 2009 11:28:52 GMT
You want tuple keys (key1, key2) that have the same value for key1 to go to
the same reducer? You can do this by writing your own partitioner, as you
mentioned in your previous email - the partition function can simply use
key1 to decide how to partition the data.
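
For example, something along these lines - just an untested sketch against the
old mapred API, and it assumes a key class like the TupleKey sketched below
with a (hypothetical) getKey1() accessor for the first key:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Partitions on key1 only, so every record with the same key1
// goes to the same reducer regardless of key2.
public class Key1Partitioner<V> implements Partitioner<TupleKey, V> {

    public void configure(JobConf job) {}

    public int getPartition(TupleKey key, V value, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (key.getKey1().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}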

To make sure that the data that reaches a reducer is sorted the way you want
it (first by key1 and then by key2), you can write your own key class that
implements WritableComparable.
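
Roughly like this - again untested, just a sketch that assumes both keys are
IntWritables (the class and field names are only placeholders, borrowed from
the TupleKey example further down the thread):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;

// Composite key that sorts first on k1, then on k2.
public class TupleKey implements WritableComparable<TupleKey> {

    private IntWritable k1 = new IntWritable();
    private IntWritable k2 = new IntWritable();

    public IntWritable getKey1() {
        return k1;
    }

    public void write(DataOutput out) throws IOException {
        k1.write(out);
        k2.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        k1.readFields(in);
        k2.readFields(in);
    }

    public int compareTo(TupleKey other) {
        int cmp = k1.compareTo(other.k1);
        return (cmp != 0) ? cmp : k2.compareTo(other.k2);
    }

    @Override
    public int hashCode() {
        // Keeps the default HashPartitioner usable if you ever drop the custom one.
        return 163 * k1.hashCode() + k2.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return (o instanceof TupleKey) && compareTo((TupleKey) o) == 0;
    }
}

Then wire both up in the JobConf, e.g. conf.setMapOutputKeyClass(TupleKey.class)
and conf.setPartitionerClass(Key1Partitioner.class).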

Does that answer your question?

Goutham

On Mon, Nov 16, 2009 at 12:17 PM, Rajiv Maheshwari <rajivm01@yahoo.com> wrote:

> On second thought, implementing WritableComparable alone will not help in my
> case. I forgot to mention that I want to use multiple reducers. Records with
> the same key1 splitting across multiple reducers will break the logic.
>
> If only 1 reducer is being used, I guess sorting on multiple keys can be
> accomplished just by concatenating the keys, unless one needs to change the
> compare method.
>
> Thanks,
> Rajiv
>
> --- On Mon, 11/16/09, goutham patnaik <goutham.patnaik@gmail.com> wrote:
>
> From: goutham patnaik <goutham.patnaik@gmail.com>
> Subject: Re: Map-reduce sorting on multiple keys
> To: general@hadoop.apache.org
> Date: Monday, November 16, 2009, 11:08 AM
>
> Rajiv,
>
> You could write your own class which implements the WritableComparable
> interface and use it as your key class - all you need to do is implement
> the write, readFields and compareTo methods - the map output keys will then
> be sorted using compareTo:
>
> public class TupleKey implements WritableComparable {
>  IntWritable k1;
>  IntWritable k2;
> .......
> }
>
> On Mon, Nov 16, 2009 at 9:13 AM, Rajiv Maheshwari <rajivm01@yahoo.com> wrote:
>
> > Hi everyone,
> >
> > I need to sort the output of the map on 2 keys (key1, key2) - first on
> > key1, then on key2.
> >
> > Example:
> > key1  key2   values
> > -------------------------
> > 0001  0001  values...
> > 0001  0002  values...
> > 0002  0001  values...
> > 0002  0005  values...
> >
> >
> > I am thinking of the following solution approach:
> >
> > Define KEY = key1, key2   /* concatenate keys */. Override the default
> > HashPartitioner so that it uses only key1 in the hash computation.
> >
> >
> > public class HashPartitioner<K2, V2> implements Partitioner<K2, V2> {
> >
> >     public void configure(JobConf job) {}
> >
> >     public int getPartition(K2 key, V2 value, int numPartitions) {
> >         return (key.getKey1().hashCode() & Integer.MAX_VALUE) % numPartitions;
> >     }
> > }
> >
> > Would this work?
> >
> > Does anyone have any better ideas?
> >
> > Thanks much,
> > Rajiv
