hadoop-common-user mailing list archives

From "Hairong Kuang" <hair...@yahoo-inc.com>
Subject RE: Different Key/Value classes for Map and Reduce?
Date Wed, 29 Mar 2006 18:17:40 GMT
If you have a JobConf called xxJob, you can set the key/value types for the
map phase as follows:
xxJob.setInputFormat(SequenceFileInputFormat.class);
xxJob.setInputKeyClass(UTF8.class);
xxJob.setInputValueClass(ArrayWritable.class);

Then set the key/value types for the reduce phase as follows:
xxJob.setOutputFormat(SequenceFileOutputFormat.class);  // or whichever OutputFormat fits your data
xxJob.setOutputKeyClass(LongWritable.class);
xxJob.setOutputValueClass(UTF8.class);
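
For example, for the job described in the message below (map works with
<LongWritable, UTF8> pairs, reduce emits <LongWritable, IntWritable> pairs),
the whole configuration could look something like the sketch here. The
variable name countJob, the TextInputFormat/SequenceFileOutputFormat choices,
and the Mapper/Reducer class names are only illustrative, not something the
framework requires:

// (classes come from org.apache.hadoop.io and org.apache.hadoop.mapred)
JobConf countJob = new JobConf();

// Map phase: plain text input, key = byte offset of the line,
// value = the line itself ("textid, number1, number2, ...").
countJob.setInputFormat(TextInputFormat.class);
countJob.setInputKeyClass(LongWritable.class);
countJob.setInputValueClass(UTF8.class);

// Reduce phase: final output is <number, count of unique textids>.
countJob.setOutputFormat(SequenceFileOutputFormat.class);
countJob.setOutputKeyClass(LongWritable.class);
countJob.setOutputValueClass(IntWritable.class);

// Mapper/Reducer classes (illustrative names, sketched further below).
countJob.setMapperClass(NumberTextIdMapper.class);
countJob.setReducerClass(UniqueTextIdReducer.class);

JobClient.runJob(countJob);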

Hairong

-----Original Message-----
From: Teppo Kurki [mailto:tjk@iki.fi] 
Sent: Wednesday, March 29, 2006 7:04 AM
To: hadoop-user@lucene.apache.org
Subject: Different Key/Value classes for Map and Reduce?

While trying Hadoop out with a proof of concept, I came across the following
problem.

My input looks conceptually like this:

textid1, number1, number2, number3
textid2, number1, number2, number3
textid1, number2, number5
...

I am interested in getting unique textid counts per number. Numbers are
Longs.

My Mapper parses the values from input lines and emits <LongWritable, 
UTF8> pairs like this:
number1, textid1
number2, textid1
number3, textid1
number1, textid2
number2, textid2
number3, textid2
number2, textid1
number5, textid1
...

and my Reducer counts unique textids per number and emits <LongWritable, 
IntWritable> pairs.
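
Roughly, the Mapper and Reducer I have in mind look like the untested sketch
below, written against the classic org.apache.hadoop.mapred interfaces (class
names and the comma parsing are made up, and the exact interface signatures
have varied across early releases):

import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.UTF8;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// NumberTextIdMapper.java
// Parses "textid, number1, number2, ..." lines and emits <number, textid>.
public class NumberTextIdMapper extends MapReduceBase implements Mapper {
  public void map(WritableComparable key, Writable value,
                  OutputCollector output, Reporter reporter) throws IOException {
    String[] parts = value.toString().split(",\\s*");
    UTF8 textId = new UTF8(parts[0]);
    for (int i = 1; i < parts.length; i++) {
      output.collect(new LongWritable(Long.parseLong(parts[i])), textId);
    }
  }
}

// UniqueTextIdReducer.java
// Counts the distinct textids seen for each number and emits <number, count>.
public class UniqueTextIdReducer extends MapReduceBase implements Reducer {
  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output, Reporter reporter) throws IOException {
    Set uniqueIds = new HashSet();
    while (values.hasNext()) {
      uniqueIds.add(values.next().toString());
    }
    output.collect(key, new IntWritable(uniqueIds.size()));
  }
}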

Is there a way to define different Key and Value classes separately for the
Map and Reduce phases? The easy workaround is to emit the counts as strings,
but surely somebody has come across this kind of usage before.
I have somewhat more complicated analyses in mind that will call for more
complex data structures to be handled separately.



