avro-user mailing list archives

From Markus Weimer <wei...@yahoo-inc.com>
Subject Keys between Mapper and Reducer in AvroJobs
Date Tue, 19 Apr 2011 00:16:40 GMT

Another question about writing Hadoop jobs using Avro. I want to implement a basic shuffle
and file aggregation: mappers emit their input with random keys, and reducers just write it to disk.
The number of reducers determines how many files I get in the result. The mapred documentation
on jobs where both input and output are Avro says:

> Subclass AvroMapper and specify this as your job's mapper with [...]

However, AvroMapper only seems to support input and output values, not keys. Did I miss the
obvious here?
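The shuffle described above (random keys in the map phase, reducers that only write) can be illustrated without Hadoop or Avro as a plain-Java, in-memory simulation. This is a hypothetical sketch: the class and method names are made up, it does not use or show the AvroMapper API, and it assumes partitioning works like Hadoop's default HashPartitioner, i.e. `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. It shows why the reducer count alone fixes how many output files you get.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical in-memory simulation of a random-key shuffle. Each record is
 * tagged with a random int key, a hash partitioner picks one of numReducers
 * buckets, and each bucket stands in for one reducer's output file.
 * No Hadoop or Avro classes are involved.
 */
public class RandomShuffleSketch {

    // Assumed to mirror Hadoop's default HashPartitioner formula.
    static int partition(int key, int numReducers) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numReducers;
    }

    static <T> List<List<T>> shuffle(List<T> records, int numReducers, long seed) {
        Random random = new Random(seed);
        List<List<T>> outputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            outputs.add(new ArrayList<>()); // one "file" per reducer
        }
        for (T record : records) {
            int key = random.nextInt(); // map phase: emit (randomKey, record)
            // Reduce phase: the reducer ignores the key and writes the value as-is.
            outputs.get(partition(key, numReducers)).add(record);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            input.add("record-" + i);
        }
        List<List<String>> files = shuffle(input, 4, 42L);
        int total = 0;
        for (int i = 0; i < files.size(); i++) {
            System.out.println("part-" + i + ": " + files.get(i).size() + " records");
            total += files.get(i).size();
        }
        System.out.println("total: " + total);
    }
}
```

With 4 reducers the 1000 records land in exactly 4 parts, each of roughly similar size because the keys are random. In a real job the records inside each part would also be sorted by their random key, which this sketch skips.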



PS: Ideally, I'd implement the shuffle without ever deserializing the data, which should be
possible. But that is the next step.