avro-user mailing list archives

From Jacob R Rideout <apa...@jacobrideout.net>
Subject Re: Multiple input schemas in MapReduce?
Date Wed, 11 May 2011 22:00:34 GMT
We take the union schema approach, but create the union
programmatically in Java:

Something like:

import java.util.ArrayList;
import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;

// Build a union of the two input schemas and set it as the job's input schema.
ArrayList<Schema> schemas = new ArrayList<Schema>();
schemas.add(schema1);
schemas.add(schema2);
Schema unionSchema = Schema.createUnion(schemas);
AvroJob.setInputSchema(job, unionSchema);

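As a minimal sketch of the map side (not from the original reply; the class
and record names are placeholders): with the union input schema set as above
and the org.apache.avro.mapred API, each datum arrives as a GenericRecord,
so the mapper can branch on the record's schema name to tell the two inputs
apart.

import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.mapred.Reporter;

// Placeholder mapper: records from either input file arrive through the
// union schema; branch on the schema name to handle each kind.
public class UnionInputMapper extends AvroMapper<GenericRecord, Utf8> {
  @Override
  public void map(GenericRecord datum, AvroCollector<Utf8> collector, Reporter reporter)
      throws IOException {
    // "Schema1Record" is a placeholder; use the record name from schema1.
    if ("Schema1Record".equals(datum.getSchema().getName())) {
      // handle records of the first schema
    } else {
      // handle records of the second schema
    }
    collector.collect(new Utf8(datum.getSchema().getName()));
  }
}

The string output here is only illustrative (set via AvroJob.setOutputSchema
or setMapOutputSchema for a real job).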

On Wed, May 11, 2011 at 12:44 PM, Markus Weimer <weimer@yahoo-inc.com> wrote:
> Hi,
>
> I'd like to write a MapReduce job that uses Avro throughout, but the map phase would
> need to read files with two different schemas, similar to what the MultipleInputFormat does
> in stock Hadoop. Is this a supported use case?
>
> A work-around would be to create a union schema that has both fields as optional and
> to convert all data into it, but that seems clumsy.
>
> Has anyone done this before?
>
> Thanks for any suggestion you can give,
>
> Markus
>
>
