From Sachin Goyal <sgo...@walmartlabs.com>
Subject Re: How to deserialize avro file with union/many schemas?
Date Wed, 23 Jul 2014 22:42:42 GMT

To see a union schema, do the following:
System.out.println(ReflectData.AllowNull.get().getSchema(YourClass.class));

And then do the following:
System.out.println(ReflectData.get().getSchema(YourClass.class));

Diff the two outputs.
The first one generates a UNION of each and every field with null.
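For a concrete comparison, here is a minimal sketch, assuming Avro 1.7.x on the classpath and a hypothetical `User` class standing in for `YourClass`. It prints each field's schema as derived by `ReflectData.get()` next to the one derived by `ReflectData.AllowNull.get()`, where the reference-typed fields come out as unions that include `"null"`. Only reference-typed fields are used; whether primitive fields such as `int` are also made nullable may depend on the Avro version.

```java
import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;

public class AllowNullDemo {
    // Hypothetical stand-in for YourClass; only reference-typed fields are used.
    static class User {
        String name;
        Long createdAt;
    }

    /** Schema derived by plain reflection: fields are not nullable. */
    public static Schema plainSchema() {
        return ReflectData.get().getSchema(User.class);
    }

    /** Schema derived by AllowNull: reference fields become unions with null. */
    public static Schema nullableSchema() {
        return ReflectData.AllowNull.get().getSchema(User.class);
    }

    public static void main(String[] args) {
        Schema plain = plainSchema();
        Schema nullable = nullableSchema();
        // Print each field's schema side by side: plain type vs. union containing "null".
        for (Schema.Field f : plain.getFields()) {
            System.out.println(f.name() + ": " + f.schema()
                + "  vs  " + nullable.getField(f.name()).schema());
        }
    }
}
```

Diffing the two printed schemas should show the same record, with every listed field wrapped in a union in the `AllowNull` version.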

Hope that helps.

Date: Wednesday, July 23, 2014 at 3:09 PM
Hi Mike,

I read through most of the docs on the Avro site but don't see anything about a "union schema".
Mike, would you mind giving me an example of how the union schema is defined? Also, what package/method
can retrieve the master schema from an Avro file? Should "getSchema()" work? And how
do I read in each Avro datum without knowing its corresponding schema?

I very much appreciate your help!

It's just a regular Union :-) http://avro.apache.org/docs/1.7.6/spec.html#Unions
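As a sketch of what a file-level union looks like, the JSON below is a hypothetical "master schema" (the `Click` and `Purchase` records and the `example` namespace are made up for illustration) whose top-level type is a union of two records. Parsing it with `Schema.Parser` yields a schema of type `UNION`, and `getTypes()` lists its branches. For an existing data file, the same master schema is returned by `DataFileReader.getSchema()`.

```java
import org.apache.avro.Schema;

public class UnionSchemaDemo {
    // Hypothetical file-level schema: a JSON array, i.e. a union of two record types.
    static final String MASTER_SCHEMA_JSON =
        "["
        + "{\"type\": \"record\", \"name\": \"Click\", \"namespace\": \"example\","
        + " \"fields\": [{\"name\": \"url\", \"type\": \"string\"}]},"
        + "{\"type\": \"record\", \"name\": \"Purchase\", \"namespace\": \"example\","
        + " \"fields\": [{\"name\": \"sku\", \"type\": \"string\"},"
        + "              {\"name\": \"amount\", \"type\": \"double\"}]}"
        + "]";

    public static Schema parseMaster() {
        return new Schema.Parser().parse(MASTER_SCHEMA_JSON);
    }

    public static void main(String[] args) {
        Schema master = parseMaster();
        System.out.println(master.getType()); // UNION
        // Each branch of the union is one of the per-record schemas.
        for (Schema branch : master.getTypes()) {
            System.out.println(branch.getFullName()); // example.Click, then example.Purchase
        }
    }
}
```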


Thanks Mike, that makes sense. Is there any doc I can read about union schemas?

    Just to make sure I understand you correctly - do you have a file with multiple Avro datums
in it, each one following a separate schema?  And are all of these schemas unioned together
in a file-level "master schema?"  (As far as I know, Avro file readers and writers only support
one schema per file, so this is the only way your question makes sense to me.)
    If that's the case, then you can get the file's "master schema" and determine what all
of the different types are:

List<Schema> allTypes = masterSchema.getTypes(); // Assumes masterSchema is of Type.UNION

Then when you read each Avro datum in the file, you can check which of the schemas it conforms
to, and write a new file with just that sub-schema and the one datum in it.
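The approach above can be sketched as follows, assuming Avro 1.7.x and a file whose schema is a union. The file is read with a `GenericDatumReader` constructed without a schema, so the writer's master schema from the file header is used; `GenericData.resolveUnion` picks the branch each datum conforms to. Rather than one file per datum, this version groups datums into one output file per branch, which matches the goal of one schema per file. The class name `SplitByUnionBranch` and the `<branch full name>.avro` output naming are hypothetical choices.

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;

public class SplitByUnionBranch {

    /** Splits a UNION-schema data file into one file per branch; returns records written per branch. */
    public static Map<String, Integer> split(File input, File outDir) throws IOException {
        outDir.mkdirs();
        Map<String, DataFileWriter<Object>> writers = new HashMap<>();
        Map<String, Integer> counts = new HashMap<>();
        // No schema given: the reader uses the writer's ("master") schema from the file header.
        DataFileReader<Object> reader =
            new DataFileReader<>(input, new GenericDatumReader<Object>());
        try {
            Schema master = reader.getSchema();
            if (master.getType() != Schema.Type.UNION) {
                throw new IllegalArgumentException("expected a UNION schema, got " + master.getType());
            }
            for (Object datum : reader) {
                // Index of the union branch this datum conforms to.
                int idx = GenericData.get().resolveUnion(master, datum);
                Schema branch = master.getTypes().get(idx);
                String name = branch.getFullName();
                DataFileWriter<Object> w = writers.get(name);
                if (w == null) {
                    // First datum of this branch: open a file whose schema is just that branch.
                    w = new DataFileWriter<>(new GenericDatumWriter<Object>(branch));
                    w.create(branch, new File(outDir, name + ".avro"));
                    writers.put(name, w);
                }
                w.append(datum);
                counts.merge(name, 1, Integer::sum);
            }
        } finally {
            reader.close();
            for (DataFileWriter<Object> w : writers.values()) {
                w.close();
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SplitByUnionBranch <input.avro> <outputDir>
        System.out.println(split(new File(args[0]), new File(args[1])));
    }
}
```

Each output file then has a single, non-union schema, so it can be read like any ordinary one-schema Avro file.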

Does that make sense?

For the purpose of others on this list, can you please provide an example of your schema?

I'm new here and hope I can get help from you guys. Basically, I have an Avro file with a union of many
schemas and mixed records. I need to split it into many Avro files, one schema per file.
Everything I've been reading is about serializing and deserializing an Avro file with one
schema, which is pretty straightforward, but in my case I have no clue. Any ideas?


