hadoop-mapreduce-user mailing list archives

From Vyacheslav Zholudev <vyacheslav.zholu...@gmail.com>
Subject Re: Multiple avro outputs from a reducer
Date Sat, 30 Jul 2011 19:08:58 GMT
Thanks, Jason. I will try that.

Vyacheslav

On 30 July 2011 19:26, Jason <urgisb@gmail.com> wrote:

> You can extend/customize MultipleOutputs and pass schema-related settings
> via properties prefixed with MO name, just like it is done with format
> classes there.
>
> Also, to send a dummy key or value, why not just use NullWritable? It's
> efficient, as it does not consume any space.
>
> Sent from my iPhone
>
> On Jul 26, 2011, at 5:46 AM, Vyacheslav Zholudev <
> vyacheslav.zholudev@gmail.com> wrote:
>
> > Hi,
> >
> > I'm using the avro format both for input and output, for a mapper and a
> reducer. I would like to output multiple avro items with different schemata.
> For sequence files I would use the MultipleOutputs class from the mapreduce
> package.
> >
> > I looked into the same class but from the old package "mapred" and
> realized that I can pass an AvroOutputFormat.class parameter when adding
> another output. However, I didn't manage to figure out how to provide an
> avro schema for each output. Moreover, when writing to an output, I need to
> provide a key and a value, but in case of avro we usually just pass a
> specific avro object. All above makes me think that the old MultipleOutputs
> API wouldn't work with avro files. Am I right?
> >
> > Any pointers on how to output multiple avro records in the same reducer
> are appreciated.
> >
> > P.S. Another thought was to create an avro schema of type union that will
> contain all possible output schemata, but I would like to avoid that.
> >
> > Thanks in advance!!!
> >
> > --
> > Best,
> > Vyacheslav
>



-- 
Best,
Vyacheslav Zholudev
