avro-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: Combining schemas
Date Tue, 09 Aug 2011 20:45:42 GMT
Thanks Scott and Doug, see follow up below.

On Tue, Aug 9, 2011 at 11:42 AM, Scott Carey <scottcarey@apache.org> wrote:

> On 8/9/11 11:15 AM, "Bill Graham" <billgraham@gmail.com> wrote:
> Hi,
> I'm trying to create a schema that references a type defined in another
> schema and I'm having some trouble. Is there an easy way to do this?
> My test schemas look like this:
> $ cat position.avsc
> {"type":"enum", "name": "Position", "namespace": "avro.examples.baseball",
>  "symbols": ["P", "C", "B1", "B2", "B3", "SS", "LF", "CF", "RF", "DH"]
> }
> $ cat player.avsc
> {"type":"record", "name":"Player", "namespace": "avro.examples.baseball",
>  "fields": [
>   {"name": "number", "type": "int"},
>   {"name": "first_name", "type": "string"},
>   {"name": "last_name", "type": "string"},
>   {"name": "position", "type": {"type": "array", "items": "avro.examples.baseball.Position"} }
>  ]
> }
> I've read this thread (
> http://apache-avro.679487.n3.nabble.com/How-to-reference-previously-defined-enum-in-avsc-file-td2663512.html)
> and tried using IDL like so with no luck:
> $ cat baseball.avdl
> @namespace("avro.examples.baseball")
> protocol Baseball {
>   import schema "position.avsc";
>   import schema "player.avsc";
> }
> $ java -jar avro-tools-1.5.1.jar idl  baseball.avdl baseball.avpr
> Exception in thread "main" org.apache.avro.SchemaParseException: Undefined name: "avro.examples.baseball.Position"
>         at org.apache.avro.Schema.parse(Schema.java:979)
>         at org.apache.avro.Schema.parse(Schema.java:1052)
>         at org.apache.avro.Schema.parse(Schema.java:1021)
>         at org.apache.avro.Schema.parse(Schema.java:884)
>         at org.apache.avro.compiler.idl.Idl.ImportSchema(Idl.java:388)
>         at org.apache.avro.compiler.idl.Idl.ProtocolBody(Idl.java:320)
>         at org.apache.avro.compiler.idl.Idl.ProtocolDeclaration(Idl.java:206)
>         at org.apache.avro.compiler.idl.Idl.CompilationUnit(Idl.java:84)
>         ...
> I agree that the documentation indicates that this should work. I suspect
> that it may not be able to resolve dependencies among imports. That is, if
> Baseball depends on Position and on Player, it works. But since Player
> depends on Position, it does not. The import statement pulls in each item
> individually for use in composite types in Avro IDL, but does not allow
> for interdependencies among the imports.
> This seems worthy of a JIRA enhancement request.  I'm sure the project will
> accept a patch that adds this.
Done:  https://issues.apache.org/jira/browse/AVRO-872

> I also saw this blog post (
> http://www.infoq.com/articles/ApacheAvro#_ftnref6_7758) where the author
> had to write some nasty String.replace(..) code to combine schemas, but
> there's got to be a better way than this.
> We need to improve the ability to import multiple files when parsing.
> Using the lower-level Avro API, you can parse the files yourself in an order
> that will work.
> I have simply put all my types in one file. If you make one .avsc file with
> both Position and Player in a JSON array, it will compile. It would look
> like:
> [
>   <position schema here>,
>   <player schema here>
> ]
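A minimal sketch of the single-file approach, assuming an Avro release that provides `Schema.Parser` (1.7 or later). The top-level JSON array parses as a union; because Position is listed before Player, the reference to `avro.examples.baseball.Position` resolves. `CombinedSchemaExample` and `findType` are hypothetical names for illustration.

```java
import org.apache.avro.Schema;

// Sketch (assumes Avro 1.7+ for Schema.Parser): a top-level JSON array parses
// as a union; Position comes first so Player can refer to it by full name.
public class CombinedSchemaExample {

    static final String COMBINED = "["
        + "{\"type\": \"enum\", \"name\": \"Position\", \"namespace\": \"avro.examples.baseball\","
        + " \"symbols\": [\"P\", \"C\", \"B1\", \"B2\", \"B3\", \"SS\", \"LF\", \"CF\", \"RF\", \"DH\"]},"
        + "{\"type\": \"record\", \"name\": \"Player\", \"namespace\": \"avro.examples.baseball\","
        + " \"fields\": ["
        + "  {\"name\": \"number\", \"type\": \"int\"},"
        + "  {\"name\": \"first_name\", \"type\": \"string\"},"
        + "  {\"name\": \"last_name\", \"type\": \"string\"},"
        + "  {\"name\": \"position\", \"type\": {\"type\": \"array\","
        + "   \"items\": \"avro.examples.baseball.Position\"}}"
        + " ]}"
        + "]";

    // Hypothetical helper: find a member of the union by its full name.
    static Schema findType(Schema union, String fullName) {
        for (Schema member : union.getTypes()) {
            if (member.getFullName().equals(fullName)) {
                return member;
            }
        }
        throw new IllegalArgumentException("No type named " + fullName);
    }

    public static void main(String[] args) {
        Schema union = new Schema.Parser().parse(COMBINED);
        Schema player = findType(union, "avro.examples.baseball.Player");
        // The array items resolve to the Position enum defined earlier in the union.
        Schema items = player.getField("position").schema().getElementType();
        System.out.println(items.getFullName()); // prints avro.examples.baseball.Position
    }
}
```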

Yes, I've used this approach in the past. Initially I was thinking that I
could write something to combine multiple files into a single InputStream
facade that generates a union like you describe, which could then be parsed.
I could then hold a handle to the union schema and provide a method to get a
given schema type (e.g. Player) by name. This is better than the String.replace(..)
approach, but still a bit hacky.
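The combine-and-look-up idea can be sketched roughly as follows. This uses plain string concatenation in place of the InputStream facade described above, and assumes Avro 1.7+ for `Schema.Parser`; `SchemaBundle` and `getType` are hypothetical names, not part of Avro. Files must be listed dependencies-first, and each must contain exactly one schema object (not an array), since unions cannot directly contain other unions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.avro.Schema;

// Hypothetical helper (not part of Avro): wraps several .avsc files in one
// JSON array so they parse as a single union, then exposes types by full name.
public class SchemaBundle {
    private final Schema union;

    public SchemaBundle(List<Path> files) throws IOException {
        List<String> parts = new ArrayList<>();
        for (Path f : files) {
            parts.add(new String(Files.readAllBytes(f), StandardCharsets.UTF_8));
        }
        // Textual concatenation: order matters, dependencies must come first.
        union = new Schema.Parser().parse("[" + String.join(",", parts) + "]");
    }

    // Return the named member of the union, e.g. "avro.examples.baseball.Player".
    public Schema getType(String fullName) {
        for (Schema s : union.getTypes()) {
            if (s.getFullName().equals(fullName)) {
                return s;
            }
        }
        throw new IllegalArgumentException("No type named " + fullName);
    }

    public static void main(String[] args) throws IOException {
        SchemaBundle bundle = new SchemaBundle(
            Arrays.asList(Paths.get("position.avsc"), Paths.get("player.avsc")));
        System.out.println(bundle.getType("avro.examples.baseball.Player").toString(true));
    }
}
```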

> Using the lower-level Avro API, you can parse the files yourself in an order
> that will work.

How exactly would the approach of parsing files in reverse-dependency order
work? This is something I'd like to explore and maybe contribute a helper
for. I've tried a few combinations of this approach to no avail:

        Schema schema1 = Schema.parse(new
        Schema schema2 = schema1.parse(new
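As far as I can tell, the attempt above cannot work because `Schema.parse` is static: `schema1.parse(...)` is just `Schema.parse(...)`, and each static call starts with no previously defined names. A minimal sketch of parsing in dependency order, assuming Avro 1.7+ where `Schema.Parser` (including `getTypes()`) is available: a single parser instance remembers every named type it has parsed, so player.avsc can refer to Position once position.avsc has been parsed. `OrderedParse` and `parseInOrder` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;

// Sketch (assumes Avro 1.7+): one shared Schema.Parser keeps every named type
// it has seen, so files listed dependencies-first can refer to each other.
public class OrderedParse {

    // Hypothetical helper: parse the files in the given order with one parser.
    static Schema.Parser parseInOrder(File... files) throws IOException {
        Schema.Parser parser = new Schema.Parser();
        for (File f : files) {
            parser.parse(f); // names defined here are visible to later files
        }
        return parser;
    }

    public static void main(String[] args) throws IOException {
        Schema.Parser parser =
            parseInOrder(new File("position.avsc"), new File("player.avsc"));
        // getTypes() maps full names to every named schema the parser knows.
        Schema player = parser.getTypes().get("avro.examples.baseball.Player");
        System.out.println(player.toString(true));
    }
}
```

Reversing the file order should reproduce the original "Undefined name" error, since Player would be parsed before Position is known.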

> Also FYI, it seems enum values can't start with numbers (e.g. '1B'). Is
> this a known issue or a feature? I haven't seen it documented anywhere. You
> get an error like this if the value starts with a number:
> org.apache.avro.SchemaParseException: Illegal initial character
> Enums are a named type.  The enum names must start with [A-Za-z_]  and
> subsequently contain only [A-Za-z0-9_].
> http://avro.apache.org/docs/1.5.1/spec.html#Names
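The naming rule quoted above can be expressed as a simple regular expression. This is a minimal sketch that mirrors the rule as stated ([A-Za-z_] first, then only [A-Za-z0-9_]); it is not Avro's own validation code, which may differ in details. `EnumSymbolCheck` is a hypothetical name. It shows why the schema above uses B1 rather than 1B.

```java
import java.util.regex.Pattern;

// Sketch: check enum symbols against the naming rule quoted from the spec:
// first character [A-Za-z_], remaining characters [A-Za-z0-9_].
public class EnumSymbolCheck {
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidSymbol(String symbol) {
        return NAME.matcher(symbol).matches();
    }

    public static void main(String[] args) {
        // Prints: B1 -> true, 1B -> false, SS -> true, _DH -> true (one per line)
        for (String s : new String[] {"B1", "1B", "SS", "_DH"}) {
            System.out.println(s + " -> " + isValidSymbol(s));
        }
    }
}
```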

I hadn't noticed that before, thanks.

> However, the spec does not say that the values must have such restrictions.
> This may be a bug; can you file a JIRA ticket?

Done: https://issues.apache.org/jira/browse/AVRO-871

> Thanks!
> -Scott
> thanks,
> Bill
