avro-user mailing list archives

From Friso van Vollenhoven <fvanvollenho...@xebia.com>
Subject Re: Avro-mapred and new Java MapReduce API (org.apache.hadoop.mapreduce)
Date Sun, 13 Nov 2011 13:04:52 GMT

I use my own set of classes for this. They are mostly copied from, or modeled after, the
Avro mapred support for the old API.

My approach is slightly different, though. The existing MR support fully wraps the Hadoop
MR API and exposes only the Avro one; the only Hadoop type the Avro classes see is the
Configuration object. The problem is that in the new API the Configuration object is kept
inside a context instance, so you would need to wrap the whole context and hand the wrapper
to the Avro mapper and reducer. That felt like overkill, so I chose to make Mapper and
Reducer subclasses that handle the Avro work and then call a protected method to do the
actual mapping or reducing. The downside is that you lose the property of a bare mapper or
reducer being the identity function, but I think you could reintroduce this in a generic
way. I just don't use the identity functions much in practice, so I didn't bother.
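
To illustrate the shape of that approach, here is a minimal, self-contained sketch. It is
not the code from the repository. All names in it (Wrapped, SketchContext, AvroMapperSketch,
doMap, WordLengthMapper) are hypothetical stand-ins so that it compiles without Hadoop or
Avro on the classpath: Wrapped plays the role of an Avro wrapper type and SketchContext
plays the role of the new API's context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical stand-in for an Avro wrapper: the framework passes the wrapper
// around, user code only wants the datum inside it.
final class Wrapped<T> {
    private final T datum;
    Wrapped(T datum) { this.datum = datum; }
    T datum() { return datum; }
}

// Hypothetical stand-in for the new API's context object. It only records
// what was written so the sketch can be run without a cluster.
final class SketchContext<K, V> {
    final List<String> written = new ArrayList<>();
    void write(Wrapped<K> key, Wrapped<V> value) {
        written.add(key.datum() + "=" + value.datum());
    }
}

// Base class that does the Avro-side work (unwrapping input, wrapping output)
// and delegates the actual mapping to a protected method.
abstract class AvroMapperSketch<IN, KO, VO> {
    // Framework-facing entry point, analogous to a mapper's map(...) method.
    final void map(Wrapped<IN> input, SketchContext<KO, VO> context) {
        doMap(input.datum(),
              (k, v) -> context.write(new Wrapped<>(k), new Wrapped<>(v)));
    }

    // Subclasses implement only this; they never see the wrappers.
    // Because it is abstract, a bare mapper is no longer the identity function.
    protected abstract void doMap(IN datum, BiConsumer<KO, VO> collector);
}

// Example subclass: emits each word with its length.
final class WordLengthMapper extends AvroMapperSketch<String, String, Integer> {
    @Override
    protected void doMap(String word, BiConsumer<String, Integer> collector) {
        collector.accept(word, word.length());
    }
}

final class SketchDemo {
    public static void main(String[] args) {
        SketchContext<String, Integer> ctx = new SketchContext<>();
        new WordLengthMapper().map(new Wrapped<>("avro"), ctx);
        System.out.println(ctx.written); // prints [avro=4]
    }
}
```

In real code the base class would presumably extend org.apache.hadoop.mapreduce.Mapper and
call the protected method from its map(key, value, context) override, with the context
passed through rather than wrapped.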

I pushed the code here: https://github.com/friso/avro-mapreduce. There is a unit test with
some usage examples.


On 11 nov. 2011, at 20:43, Doug Cutting wrote:

On 11/10/2011 12:38 AM, Andrew Kenworthy wrote:
Are there plans to extend it to work with org.apache.hadoop.mapreduce as

There's an issue in Jira for this:


I don't know of anyone actively working on this at present.  It would be
a great addition to Avro and I am hopeful someone will resume work on it

