hadoop-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: Programming Question / Joining Dataset
Date Wed, 26 Sep 2012 13:42:56 GMT
1) One really easy but clumsy way is to encode the ADT/BDT as MapWritable
objects when you write them (in your mappers), and then decode them when
you read them in the reducers.

2) The idiomatic way is to use a serialization framework like
Avro/Thrift/...  This will take more work to get working, but in the long
run your code will read like standard Java (i.e. you can use Java POJOs
that are read/written by your serialization framework, rather than the
limited Writable framework, which doesn't really support ADTs).

So to exemplify 1:

Your Reducer signature might look like this:

Reducer<Text, MapWritable, Text, IntWritable>

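and in your reduce method:

public void reduce(Text key, Iterable<MapWritable> values, Context context)
    throws IOException, InterruptedException {
  // Assumes exactly two values per key, with A arriving before B.
  // Hadoop itself does not guarantee the order of values for a key.
  Iterator<MapWritable> it = values.iterator();
  ADTA adtA = ADTA.readFromMap(it.next());
  ADTB adtB = ADTB.readFromMap(it.next());
  // Let's say we're summing the ages of A and B and emitting that as the final value.
  context.write(key, new IntWritable(adtA.getAge() + adtB.getAge()));
}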

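public class ADTA {
  private final int age;

  private ADTA(int age) { this.age = age; }

  public int getAge() { return age; }

  // Assumes the mapper stored the age as a Text value under the Text key "age".
  static ADTA readFromMap(MapWritable m) {
    return new ADTA(Integer.parseInt(m.get(new Text("age")).toString()));
  }
}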
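
To make the encode/decode round trip in option 1 concrete, here is a minimal plain-Java sketch. It is not Hadoop code: java.util.Map stands in for MapWritable so it runs without Hadoop on the classpath. The class names (AdtA, AdtB, JoinSketch), the writeToMap method, and the "type" tag field are all hypothetical and not part of the original post. The tag is one possible way for the reducer to tell A records from B records without relying on the order in which values arrive.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// java.util.Map plays the role of MapWritable in this sketch.
class AdtA {
    final int age;

    AdtA(int age) { this.age = age; }

    // Mapper side: flatten the object into a map, tagged with its type.
    Map<String, String> writeToMap() {
        Map<String, String> m = new HashMap<>();
        m.put("type", "A"); // hypothetical tag field (assumption)
        m.put("age", Integer.toString(age));
        return m;
    }

    // Reducer side: rebuild the object from the map.
    static AdtA readFromMap(Map<String, String> m) {
        return new AdtA(Integer.parseInt(m.get("age")));
    }
}

class AdtB {
    final int age;

    AdtB(int age) { this.age = age; }

    Map<String, String> writeToMap() {
        Map<String, String> m = new HashMap<>();
        m.put("type", "B"); // hypothetical tag field (assumption)
        m.put("age", Integer.toString(age));
        return m;
    }

    static AdtB readFromMap(Map<String, String> m) {
        return new AdtB(Integer.parseInt(m.get("age")));
    }
}

class JoinSketch {
    // Mimics the body of reduce(): decode each value by its tag, then sum the ages.
    static int sumAges(List<Map<String, String>> values) {
        Integer ageA = null;
        Integer ageB = null;
        for (Map<String, String> m : values) {
            String type = m.get("type");
            if ("A".equals(type)) {
                ageA = AdtA.readFromMap(m).age;
            } else if ("B".equals(type)) {
                ageB = AdtB.readFromMap(m).age;
            }
        }
        if (ageA == null || ageB == null) {
            throw new IllegalStateException("expected one A and one B record for the key");
        }
        return ageA + ageB;
    }

    public static void main(String[] args) {
        // The B record deliberately comes first to show that order does not matter.
        List<Map<String, String>> values =
            Arrays.asList(new AdtB(25).writeToMap(), new AdtA(30).writeToMap());
        System.out.println(sumAges(values)); // prints 55
    }
}
```

In a real job the same pattern would use MapWritable with Text keys and values, and the mapper for each dataset would set its own tag before writing.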
