hadoop-common-user mailing list archives

From Teppo Kurki <...@iki.fi>
Subject Different Key/Value classes for Map and Reduce?
Date Wed, 29 Mar 2006 15:03:46 GMT
While trying out Hadoop with a proof of concept, I came across the 
following problem.

My input looks conceptually like this:

textid1, number1, number2, number3
textid2, number1, number2, number3
textid1, number2, number5
...

I am interested in getting unique textid counts per number. Numbers are 
Longs.

My Mapper parses the values from input lines and emits <LongWritable, 
UTF8> pairs like this:
number1, textid1
number2, textid1
number3, textid1
number1, textid2
number2, textid2
number3, textid2
number2, textid1
number5, textid1
...

and my Reducer counts unique textids per number and emits <LongWritable, 
IntWritable> pairs.
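
To make the two phases concrete, here is a minimal plain-Java sketch of 
the logic described above. It does not use the Hadoop API at all: the 
class name UniqueIdCount, the helper methods, the in-memory grouping 
that stands in for the shuffle, and the sample numeric values are all 
assumptions made for illustration. The point is only that the 
intermediate pairs are (Long, String) while the final pairs are 
(Long, Integer).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for the job; not Hadoop code.
public class UniqueIdCount {

    // Map step: parse "textid, n1, n2, ..." and emit one
    // (number, textid) pair per number on the line.
    static List<Map.Entry<Long, String>> map(String line) {
        String[] fields = line.split(",");
        String textId = fields[0].trim();
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        for (int i = 1; i < fields.length; i++) {
            long number = Long.parseLong(fields[i].trim());
            out.add(new AbstractMap.SimpleEntry<>(number, textId));
        }
        return out;
    }

    // Reduce step: count the distinct textids seen for one number.
    static int reduce(Iterable<String> textIds) {
        Set<String> unique = new HashSet<>();
        for (String id : textIds) {
            unique.add(id);
        }
        return unique.size();
    }

    // Runs map, groups the intermediate pairs by key (a stand-in for
    // the shuffle), then runs reduce once per key.
    static SortedMap<Long, Integer> run(List<String> lines) {
        SortedMap<Long, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<Long, String> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        SortedMap<Long, Integer> counts = new TreeMap<>();
        for (Map.Entry<Long, List<String>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample values in the shape of the input above.
        List<String> input = Arrays.asList(
            "textid1, 1, 2, 3",
            "textid2, 1, 2, 3",
            "textid1, 2, 5");
        System.out.println(run(input)); // prints {1=2, 2=2, 3=2, 5=1}
    }
}
```

Note that the map output value type (String) and the reduce output 
value type (Integer) differ, which is exactly the situation I am asking 
about.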

Is there a way to define different Key and Value classes separately for 
the Map and Reduce phases? The easy workaround is to emit the counts as 
strings, but surely somebody has come across this kind of usage before. 
I have somewhat more complicated analyses in mind that will call for 
more complex data structures to be handled separately.


