avro-user mailing list archives

From Scott Carey <scottca...@apache.org>
Subject Re: Combining schemas
Date Tue, 09 Aug 2011 18:42:57 GMT
On 8/9/11 11:15 AM, "Bill Graham" <billgraham@gmail.com> wrote:

> Hi,
> 
> I'm trying to create a schema that references a type defined in another schema
> and I'm having some troubles. Is there an easy way to do this?
> 
> My test schemas look like this:
> 
> $ cat position.avsc
> {"type":"enum", "name": "Position", "namespace": "avro.examples.baseball",
>  "symbols": ["P", "C", "B1", "B2", "B3", "SS", "LF", "CF", "RF", "DH"]
> }
> 
> $ cat player.avsc
> {"type":"record", "name":"Player", "namespace": "avro.examples.baseball",
>  "fields": [
>   {"name": "number", "type": "int"},
>   {"name": "first_name", "type": "string"},
>   {"name": "last_name", "type": "string"},
>   {"name": "position", "type": {"type": "array", "items": "avro.examples.baseball.Position"}}
>  ]
> }
> 
> I've read this thread
> (http://apache-avro.679487.n3.nabble.com/How-to-reference-previously-defined-enum-in-avsc-file-td2663512.html)
> and tried using IDL like so with no luck:
> 
> $ cat baseball.avdl
> @namespace("avro.examples.baseball")
> protocol Baseball {
>   import schema "position.avsc";
>   import schema "player.avsc";
> }
> 
> $ java -jar avro-tools-1.5.1.jar idl  baseball.avdl baseball.avpr
> Exception in thread "main" org.apache.avro.SchemaParseException: Undefined name: "avro.examples.baseball.Position"
>         at org.apache.avro.Schema.parse(Schema.java:979)
>         at org.apache.avro.Schema.parse(Schema.java:1052)
>         at org.apache.avro.Schema.parse(Schema.java:1021)
>         at org.apache.avro.Schema.parse(Schema.java:884)
>         at org.apache.avro.compiler.idl.Idl.ImportSchema(Idl.java:388)
>         at org.apache.avro.compiler.idl.Idl.ProtocolBody(Idl.java:320)
>         at org.apache.avro.compiler.idl.Idl.ProtocolDeclaration(Idl.java:206)
>         at org.apache.avro.compiler.idl.Idl.CompilationUnit(Idl.java:84)
>         ...

I agree that the documentation indicates that this should work.  I suspect
that the IDL compiler cannot resolve dependencies among imports.  That is,
if the Baseball protocol itself refers to Position and to Player, it works.
But because Player refers to Position, it does not.  Each import statement
pulls in its schema individually for use in composite types declared in the
Avro IDL, but the imported schemas cannot depend on one another.

This seems worthy of a JIRA enhancement request.  I'm sure the project will
accept a patch that adds this.

> 
> 
> I also saw this blog post
> (http://www.infoq.com/articles/ApacheAvro#_ftnref6_7758) where the author had
> to write some nasty String.replace(..) code to combine schemas, but there's
> got to be a better way than this.

We need to improve the ability to import multiple files when parsing.  Using
the lower-level Avro API, you can parse the files yourself in an order that
satisfies the dependencies (here, Position before Player).
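
As a rough illustration of what "an order that will work" means, the sketch
below sorts already-loaded schema documents so that each named type comes
after the types it refers to.  It is plain Python over the JSON and does not
use the Avro library; the helper names (full_name, referenced_names,
parse_order) are hypothetical, the Player schema is trimmed to two fields,
and only top-level named types are tracked.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def full_name(schema):
    """Return the namespace-qualified name of a top-level named schema."""
    name = schema["name"]
    namespace = schema.get("namespace")
    if "." in name or not namespace:
        return name
    return namespace + "." + name


def referenced_names(node, namespace):
    """Collect the type names a schema refers to by string."""
    if isinstance(node, str):
        if node in PRIMITIVES:
            return set()
        # Unqualified references inherit the enclosing namespace.
        return {node if "." in node or not namespace else namespace + "." + node}
    if isinstance(node, list):  # a union: check every branch
        found = set()
        for branch in node:
            found |= referenced_names(branch, namespace)
        return found
    namespace = node.get("namespace", namespace)
    kind = node["type"]
    if kind == "record":
        found = set()
        for field in node["fields"]:
            found |= referenced_names(field["type"], namespace)
        return found
    if kind == "array":
        return referenced_names(node["items"], namespace)
    if kind == "map":
        return referenced_names(node["values"], namespace)
    if kind in ("enum", "fixed"):
        return set()
    return referenced_names(kind, namespace)


def parse_order(schemas):
    """Order schemas so that every reference points at an earlier entry."""
    defined, ordered, pending = set(), [], list(schemas)
    while pending:
        ready = [s for s in pending
                 if referenced_names(s, s.get("namespace")) - {full_name(s)} <= defined]
        if not ready:
            raise ValueError("unresolvable or circular references")
        for schema in ready:
            ordered.append(schema)
            defined.add(full_name(schema))
        pending = [s for s in pending if s not in ready]
    return ordered


position = {"type": "enum", "name": "Position",
            "namespace": "avro.examples.baseball", "symbols": ["P", "C"]}
player = {"type": "record", "name": "Player",
          "namespace": "avro.examples.baseball",
          "fields": [
              {"name": "number", "type": "int"},
              {"name": "position",
               "type": {"type": "array", "items": "avro.examples.baseball.Position"}},
          ]}

order = parse_order([player, position])
print([full_name(s) for s in order])
```

The schemas would then be handed to the parser one at a time in that order.
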
I have simply put all my types in one file.  If you make one avsc file with
both Position and Player in a JSON array, it will compile.  It would look
like:
[
  < position schema here>,
  < player schema here>
]
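
For concreteness, here is a small sketch that builds such a combined file
from the two schemas quoted above.  It uses only Python's json module rather
than the Avro library, and the output file name mentioned afterwards
(baseball.avsc) is an assumption.  Position is listed first so that Player
can refer to it by name.

```python
import json

position = {
    "type": "enum", "name": "Position",
    "namespace": "avro.examples.baseball",
    "symbols": ["P", "C", "B1", "B2", "B3", "SS", "LF", "CF", "RF", "DH"],
}

player = {
    "type": "record", "name": "Player",
    "namespace": "avro.examples.baseball",
    "fields": [
        {"name": "number", "type": "int"},
        {"name": "first_name", "type": "string"},
        {"name": "last_name", "type": "string"},
        {"name": "position",
         "type": {"type": "array", "items": "avro.examples.baseball.Position"}},
    ],
}

# A JSON array of schemas, with the referenced type ahead of its user.
combined = [position, player]
combined_text = json.dumps(combined, indent=2)
print(combined_text)
```

Saving the printed text as, say, baseball.avsc gives a single file that
holds both types.
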

> 
> Also FYI, it seems enum values can't start with numbers (e.g. '1B'). Is this a
> known issue or a feature? I haven't seen it documented anywhere. You get an
> error like this if the value starts with a number:
> 
> org.apache.avro.SchemaParseException: Illegal initial character


Enums are a named type, so the name of an enum must start with [A-Za-z_]
and subsequently contain only [A-Za-z0-9_].
http://avro.apache.org/docs/1.5.1/spec.html#Names

However, the spec does not say that the enum values (symbols) are subject
to the same restriction.  This may be a bug; can you file a JIRA ticket?
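
To make the name rule concrete, the sketch below expresses it as a regular
expression and applies it to a few symbols.  It is an illustration in plain
Python, not the parser's actual code; is_valid_name is a hypothetical helper,
and applying the rule to enum symbols is an assumption made only to mirror
the error reported above.

```python
import re

# First character [A-Za-z_], then any number of [A-Za-z0-9_].
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name):
    """Return True if the whole string satisfies the name rule."""
    return NAME_RE.fullmatch(name) is not None


symbols = ["P", "C", "1B", "B1", "SS"]
rejected = [s for s in symbols if not is_valid_name(s)]
print(rejected)  # prints ['1B']
```

Under this rule '1B' is rejected for its leading digit while 'B1' passes,
which is why the schema above spells the bases as B1, B2 and B3.
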

Thanks!

-Scott

> 
> thanks,
> Bill
> 


