hadoop-mapreduce-user mailing list archives

From Vyacheslav Zholudev <vyacheslav.zholu...@gmail.com>
Subject Multiple avro outputs from a reducer
Date Wed, 27 Jul 2011 07:36:38 GMT

I'm using the Avro format for both input and output, in the mapper and the
reducer. I would like to output multiple Avro records with different schemata.
For SequenceFiles I would use the MultipleOutputs class from the mapreduce
package.

I looked into the same class in the old "mapred" package and realized
that I can pass an AvroOutputFormat.class parameter when adding another
output. However, I couldn't figure out how to provide an Avro schema
for each output. Moreover, when writing to an output, I need to provide a key
and a value, but with Avro we usually pass just a single specific Avro
object. All of the above makes me think that the old MultipleOutputs API
doesn't work with Avro files. Am I right?
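To make the mismatch concrete, here is a minimal, dependency-free Java sketch of the API shape the question asks for: each named output is bound to its own schema, and a write takes a single record rather than a key/value pair. Everything in it (`NamedAvroOutputsSketch`, `Datum`, `registerOutput`, `write`, `recordsIn`) is hypothetical and only models the idea with in-memory collections; it is not the Hadoop `MultipleOutputs` API or any Avro class, and a schema is reduced to its name. It assumes Java 17.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free model of "one schema per named output".
// This is NOT the Hadoop MultipleOutputs API or an Avro class; a schema is
// reduced to its name and each output to an in-memory list.
public class NamedAvroOutputsSketch {

    // Stand-in for an Avro record: the name of its schema plus its field values.
    public record Datum(String schemaName, Map<String, Object> fields) {}

    private final Map<String, String> schemaByOutput = new HashMap<>();
    private final Map<String, List<Datum>> recordsByOutput = new HashMap<>();

    // Bind a named output to the one schema its records must conform to.
    public void registerOutput(String namedOutput, String schemaName) {
        schemaByOutput.put(namedOutput, schemaName);
        recordsByOutput.put(namedOutput, new ArrayList<>());
    }

    // Write a single record (no separate key and value), rejecting unknown
    // outputs and records whose schema does not match the output's schema.
    public void write(String namedOutput, Datum datum) {
        String expected = schemaByOutput.get(namedOutput);
        if (expected == null) {
            throw new IllegalArgumentException("unknown output: " + namedOutput);
        }
        if (!expected.equals(datum.schemaName())) {
            throw new IllegalArgumentException("output " + namedOutput
                    + " expects schema " + expected + " but got " + datum.schemaName());
        }
        recordsByOutput.get(namedOutput).add(datum);
    }

    // Records written so far to the given output (empty if it was never registered).
    public List<Datum> recordsIn(String namedOutput) {
        return recordsByOutput.getOrDefault(namedOutput, List.of());
    }

    public static void main(String[] args) {
        NamedAvroOutputsSketch outputs = new NamedAvroOutputsSketch();
        outputs.registerOutput("users", "User");
        outputs.registerOutput("events", "Event");
        outputs.write("users", new Datum("User", Map.of("name", "alice")));
        outputs.write("events", new Datum("Event", Map.of("type", "login")));
        System.out.println(outputs.recordsIn("users").size() + " user record(s), "
                + outputs.recordsIn("events").size() + " event record(s)");
    }
}
```

The point of the model is that schema checking is attached to each named output, which is exactly the per-output configuration the question could not find a place for.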

Any pointers on how to output multiple Avro records from the same reducer are
welcome.

P.S. Another thought was to create an Avro schema of type union that would
contain all possible output schemata, but I would like to avoid that.
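For comparison, a plain-Java analogy of the union workaround mentioned in the P.S.: a sealed interface plays the role of the union schema and each record type one of its branches, so a single output can hold every record type, but each consumer then has to split that output back apart by type. The names (`UnionOutputSketch`, `OutputRecord`, `User`, `Event`, `branch`) are hypothetical and no Avro classes are involved; it assumes Java 17.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java analogy of the union-schema workaround; no Avro classes are used.
// The sealed interface stands in for the union schema, each record for one branch.
public class UnionOutputSketch {

    public sealed interface OutputRecord permits User, Event {}

    public record User(String name) implements OutputRecord {}

    public record Event(String type, long timestamp) implements OutputRecord {}

    // Recover one branch from the single union-typed output. Every consumer of
    // that output has to repeat this filtering step.
    public static <T extends OutputRecord> List<T> branch(List<OutputRecord> output, Class<T> type) {
        List<T> matches = new ArrayList<>();
        for (OutputRecord r : output) {
            if (type.isInstance(r)) {
                matches.add(type.cast(r));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // The reducer sends every record, whatever its type, to the same output.
        List<OutputRecord> output = new ArrayList<>();
        output.add(new User("alice"));
        output.add(new Event("login", 1000L));
        output.add(new User("bob"));
        System.out.println(branch(output, User.class).size() + " users, "
                + branch(output, Event.class).size() + " events");
    }
}
```

The extra `branch` step on the reading side is the cost that makes the union approach less attractive than separate outputs.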

Thanks in advance!!!

