avro-user mailing list archives

From Echo <echo...@gmail.com>
Subject Re: How to deserialize avro file with union/many schemas?
Date Wed, 23 Jul 2014 05:24:13 GMT
Also, does the Avro command-line tool work with a union schema?

> On Jul 22, 2014, at 2:32 PM, Michael Pigott <mpigott.subscriptions@gmail.com> wrote:
> 
> Echo,
>     Just to make sure I understand you correctly - do you have a file with multiple Avro
> datums in it, each one following a separate schema?  And are all of these schemas unioned
> together in a file-level "master schema"?  (As far as I know, Avro file readers and writers
> only support one schema per file, so this is the only way your question makes sense to me.)
>     If that's the case, then you can get the file's "master schema" and determine what
> all of the different types are:
> 
> List<Schema> allTypes = masterSchema.getTypes(); // Assumes masterSchema is of Type.UNION
> 
> Then when you read each Avro datum in the file, you can check which of the schemas it
> conforms to, and write a new file with just that sub-schema and the one datum in it.
> 
> Does that make sense?
> Mike
> 
> 
>> On Tue, Jul 22, 2014 at 3:22 PM, Lewis John Mcgibbney <lewis.mcgibbney@gmail.com> wrote:
>> For the benefit of others on this list, can you please provide an example of your
>> schema?
>> Thanks
>> Lewis
>> 
>> 
>>> On Tue, Jul 22, 2014 at 12:06 PM, Echo Li <echolql@gmail.com> wrote:
>>> Hello,
>>> 
>>> I'm new here and hope I can get some help from you. Basically, I have an Avro file
>>> with a union of many schemas and mixed records. I need to split it into many Avro
>>> files, one schema per file. Everything I've read so far is about serializing and
>>> deserializing an Avro file with a single schema, which is pretty straightforward,
>>> but for my case I have no clue. Any ideas?
>> 
>> 
>> 
>> -- 
>> Lewis 
> 
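
For anyone finding this thread later, here is a minimal sketch of the approach Mike describes above, adapted to write one output file per sub-schema as in the original question. It assumes the file's top-level schema is a union and that the generic (not specific) API is acceptable. The class name UnionFileSplitter, the split method, and the "<schema full name>.avro" output naming are made up for illustration. Using GenericData.resolveUnion to decide which branch a datum conforms to is my own choice, not something confirmed in the thread.

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;

/** Hypothetical helper: splits a union-schema Avro file into one file per branch. */
public class UnionFileSplitter {

    /** Returns a map from branch schema full name to the file written for it. */
    public static Map<String, File> split(File input, File outDir) throws IOException {
        Map<String, DataFileWriter<Object>> writers = new HashMap<>();
        Map<String, File> outputs = new HashMap<>();
        try (DataFileReader<Object> reader =
                 new DataFileReader<>(input, new GenericDatumReader<Object>())) {
            // The "master schema" stored in the file header.
            Schema masterSchema = reader.getSchema();
            if (masterSchema.getType() != Schema.Type.UNION) {
                throw new IllegalArgumentException("Top-level schema is not a union");
            }
            List<Schema> allTypes = masterSchema.getTypes();

            for (Object datum : reader) {
                // Find the union branch this datum conforms to.
                int index = GenericData.get().resolveUnion(masterSchema, datum);
                Schema branch = allTypes.get(index);
                String name = branch.getFullName();

                // Lazily open one writer per branch (assumed naming scheme).
                DataFileWriter<Object> writer = writers.get(name);
                if (writer == null) {
                    File out = new File(outDir, name + ".avro");
                    writer = new DataFileWriter<Object>(new GenericDatumWriter<Object>(branch));
                    writer.create(branch, out);
                    writers.put(name, writer);
                    outputs.put(name, out);
                }
                writer.append(datum);
            }
        } finally {
            for (DataFileWriter<Object> writer : writers.values()) {
                writer.close();
            }
        }
        return outputs;
    }
}
```

Calling UnionFileSplitter.split(new File("mixed.avro"), new File("out")) should leave one file per sub-schema in an existing "out" directory; both paths are placeholders. Branches that never occur in the input get no output file.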
