avro-user mailing list archives

From Martin Kleppmann <mkleppm...@linkedin.com>
Subject Re: Reading from disjoint schemas in map
Date Wed, 14 May 2014 08:48:26 GMT
Hi James,

If you're using code generation to create Java classes for the Avro schemas, you should be
able to just use Java's instanceof.
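
For example, something like this inside your map() method (an untested sketch; Foo and Bar stand in for whatever classes your schemas generate):

// Inside map(); with a union input schema over generated classes,
// the key's datum will be one of the generated record types.
Object datum = key.datum();
if (datum instanceof Foo) {
    Foo foo = (Foo) datum;
    // handle records written with Foo's schema
} else if (datum instanceof Bar) {
    Bar bar = (Bar) datum;
    // handle records written with Bar's schema
}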

If you're using GenericRecord, you can use GenericRecord.getSchema() to determine the type
of a particular record.
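
For example, you could dispatch on the schema's full name, along these lines (again a sketch; the names com.example.Foo and com.example.Bar are made up):

// Inside map(), assuming AvroKey<GenericRecord> as the input key type.
GenericRecord record = key.datum();
String fullName = record.getSchema().getFullName();
if (fullName.equals("com.example.Foo")) {
    // handle records written with the Foo schema
} else if (fullName.equals("com.example.Bar")) {
    // handle records written with the Bar schema
}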

Hope that helps,
Martin

On 13 May 2014, at 21:03, James Campbell <james@breachintelligence.com> wrote:
I’m trying to read data into a MapReduce job, where the data may have been written with one
of a few different schemas, none of which are evolutions of one another (though they are related).

I have seen several people suggest using a union schema, such that during job setup, one would
set the input schema to be the union:
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;

List<Schema> schemas = new ArrayList<Schema>();
schemas.add(schema1);
// ... add the remaining schemas
Schema unionSchema = Schema.createUnion(schemas);
AvroJob.setInputKeySchema(job, unionSchema);

However, I don’t know how to then extract the correct type inside my mapper (apparently that
part is trivial, but I’m new to Avro, sorry).

I’d guess that the map function signature becomes map(AvroKey<GenericRecord> key, NullWritable
value, …), but how can I then get Avro to read the correctly-typed data from the GenericRecord?
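
For concreteness, here is roughly the mapper skeleton I have in mind (class name invented):

import java.io.IOException;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class UnionMapper
        extends Mapper<AvroKey<GenericRecord>, NullWritable, Text, NullWritable> {
    @Override
    protected void map(AvroKey<GenericRecord> key, NullWritable value, Context context)
            throws IOException, InterruptedException {
        GenericRecord record = key.datum();
        // ... how do I find out which schema this record was written with?
    }
}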

Thanks!

James

