avro-user mailing list archives

From: Markus Weimer <wei...@yahoo-inc.com>
Subject: Re: Multiple input schemas in MapReduce?
Date: Fri, 20 May 2011 17:43:40 GMT
Hi,

just an update: The solution below does, indeed, work as expected. Thanks!
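
For anyone finding this thread later, here is a rough sketch of how the pieces fit
together with the org.apache.avro.mapred API; the mapper class and the record name
below are illustrative only, not taken from an actual job:

import java.io.IOException;
import java.util.Arrays;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class UnionInputMapper extends AvroMapper<GenericRecord, Utf8> {

  @Override
  public void map(GenericRecord record, AvroCollector<Utf8> collector,
                  Reporter reporter) throws IOException {
    // With a union input schema, each record still carries the record schema it
    // was written with, so branching on the schema name tells the inputs apart.
    if ("Click".equals(record.getSchema().getName())) {  // "Click" is an illustrative name
      // handle records of the first schema
    } else {
      // handle records of the second schema
    }
  }

  // Wiring it up on the JobConf, as in Jacob's snippet below.
  public static void configure(JobConf job, Schema schema1, Schema schema2) {
    Schema unionSchema = Schema.createUnion(Arrays.asList(schema1, schema2));
    AvroJob.setInputSchema(job, unionSchema);
    AvroJob.setMapperClass(job, UnionInputMapper.class);
  }
}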

Markus

On May 11, 2011, at 3:00 PM, Jacob R Rideout wrote:

> We do take the union schema approach, but create the unions
> programmatically in Java:
> 
> Something like:
> 
> ArrayList<Schema> schemas = new ArrayList<Schema>();
> schemas.add(schema1);
> schemas.add(schema2);
> Schema unionSchema = Schema.createUnion(schemas);
> AvroJob.setInputSchema(job, unionSchema);
> 
> 
> On Wed, May 11, 2011 at 12:44 PM, Markus Weimer <weimer@yahoo-inc.com> wrote:
>> Hi,
>> 
>> I'd like to write a MapReduce job that uses Avro throughout, but the map phase would
>> need to read files with two different schemas, similar to what the MultipleInputFormat does
>> in stock Hadoop. Is this a supported use case?
>> 
>> A work-around would be to create a union schema that has both fields as optional
>> and to convert all data into it, but that seems clumsy.
>> 
>> Has anyone done this before?
>> 
>> Thanks for any suggestion you can give,
>> 
>> Markus
>> 
>> 

