hadoop-mapreduce-user mailing list archives

From Jason <urg...@gmail.com>
Subject Re: Multiple avro outputs from a reducer
Date Sat, 30 Jul 2011 17:26:07 GMT
You can extend or customize MultipleOutputs and pass schema-related settings via configuration
properties prefixed with the named output's name, just as that class already does for the
output format classes.
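
To make the idea concrete, here is a rough sketch of the per-output property naming. It is not the
actual MultipleOutputs code: a plain Map stands in for the job configuration so the snippet runs
without Hadoop on the classpath, the class NamedOutputSchemas and the ".avro.schema" suffix are
names made up for illustration, and the "mo.namedOutput." prefix is my recollection of what the
old-API class uses for its ".format" keys, so treat it as an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: stores one Avro schema (as JSON text) per named output.
// A Map<String, String> stands in for the Hadoop job configuration.
final class NamedOutputSchemas {
    // Assumed prefix, modelled on the per-output keys of the old MultipleOutputs.
    static final String MO_PREFIX = "mo.namedOutput.";
    // Invented suffix for this sketch; not a Hadoop or Avro setting.
    static final String SCHEMA_SUFFIX = ".avro.schema";

    private NamedOutputSchemas() {}

    // Builds the property key for a named output, e.g. "mo.namedOutput.users.avro.schema".
    static String schemaKey(String namedOutput) {
        // This sketch only accepts letters and digits so the name cannot
        // contain the '.' separator used in the key.
        if (namedOutput == null || !namedOutput.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("Bad named output: " + namedOutput);
        }
        return MO_PREFIX + namedOutput + SCHEMA_SUFFIX;
    }

    // Called at job setup time, once per named output.
    static void setSchema(Map<String, String> conf, String namedOutput, String schemaJson) {
        conf.put(schemaKey(namedOutput), schemaJson);
    }

    // Called when the record writer for a named output is created.
    static String getSchema(Map<String, String> conf, String namedOutput) {
        String json = conf.get(schemaKey(namedOutput));
        if (json == null) {
            throw new IllegalStateException("No schema configured for " + namedOutput);
        }
        return json;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        setSchema(conf, "users", "\"string\"");
        setSchema(conf, "events", "\"long\"");
        System.out.println(conf.keySet());
    }
}
```

In a real job the same two calls would go through the job configuration's set and get methods,
and the customized MultipleOutputs would look up the schema by output name before it builds the
Avro writer for that output.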

Also, to send a dummy key or value, why not just use NullWritable? It is efficient because it
serializes to zero bytes.

Sent from my iPhone

On Jul 26, 2011, at 5:46 AM, Vyacheslav Zholudev <vyacheslav.zholudev@gmail.com> wrote:

> Hi,
> 
> I'm using the Avro format for both input and output, in both the mapper and the reducer. I would
> like to output multiple Avro items with different schemata. For sequence files I would use
> the MultipleOutputs class from the mapreduce package.
> 
> I looked into the same class in the old "mapred" package and realized that I can
> pass AvroOutputFormat.class as a parameter when adding another output. However, I could not
> figure out how to provide an Avro schema for each output. Moreover, when writing to an output,
> I need to provide a key and a value, but with Avro we usually just pass a specific
> Avro object. All of the above makes me think that the old MultipleOutputs API would not work with
> Avro files. Am I right?
> 
> Any pointers on how to output multiple Avro records from the same reducer are appreciated.

> 
> P.S. Another thought was to create an Avro schema of type union that will contain all
> possible output schemata, but I would like to avoid that.
> 
> Thanks in advance!!!
> 
> -- 
> Best,
> Vyacheslav
