From: Bill Graham
Reply-To: billgraham@gmail.com
Date: Tue, 9 Aug 2011 13:45:42 -0700
Subject: Re: Combining schemas
To: Scott Carey
Cc: "user@avro.apache.org"

Thanks Scott and Doug, see follow up below.

On Tue, Aug 9, 2011 at 11:42 AM, Scott Carey wrote:

> On 8/9/11 11:15 AM, "Bill Graham" wrote:
>
> Hi,
>
> I'm trying to create a schema that references a type defined in another
> schema and I'm having some trouble. Is there an easy way to do this?
>
> My test schemas look like this:
>
> $ cat position.avsc
> {"type":"enum", "name": "Position", "namespace": "avro.examples.baseball",
>  "symbols": ["P", "C", "B1", "B2", "B3", "SS", "LF", "CF", "RF", "DH"]
> }
>
> $ cat player.avsc
> {"type":"record", "name":"Player", "namespace": "avro.examples.baseball",
>  "fields": [
>   {"name": "number", "type": "int"},
>   {"name": "first_name", "type": "string"},
>   {"name": "last_name", "type": "string"},
>   {"name": "position", "type": {"type": "array", "items": "avro.examples.baseball.Position"} }
>  ]
> }
>
> I've read this thread (
> http://apache-avro.679487.n3.nabble.com/How-to-reference-previously-defined-enum-in-avsc-file-td2663512.html)
> and tried using IDL like so with no luck:
>
> $ cat baseball.avdl
> @namespace("avro.examples.baseball")
> protocol Baseball {
>   import schema "position.avsc";
>   import schema "player.avsc";
> }
>
> $ java -jar avro-tools-1.5.1.jar idl baseball.avdl baseball.avpr
> Exception in thread "main" org.apache.avro.SchemaParseException: Undefined name: "avro.examples.baseball.Position"
>         at org.apache.avro.Schema.parse(Schema.java:979)
>         at org.apache.avro.Schema.parse(Schema.java:1052)
>         at org.apache.avro.Schema.parse(Schema.java:1021)
>         at org.apache.avro.Schema.parse(Schema.java:884)
>         at org.apache.avro.compiler.idl.Idl.ImportSchema(Idl.java:388)
>         at org.apache.avro.compiler.idl.Idl.ProtocolBody(Idl.java:320)
>         at org.apache.avro.compiler.idl.Idl.ProtocolDeclaration(Idl.java:206)
>         at org.apache.avro.compiler.idl.Idl.CompilationUnit(Idl.java:84)
>         ...
>
> I agree that the documentation indicates that this should work. I suspect
> that it may not be able to resolve dependencies among imports. That is, if
> Baseball depends on Position and on Player, it works. But since Player
> depends on Position, it does not. The import statement pulls in each item
> individually for use in composite things in the Avro IDL, but does not allow
> for interdependencies among the imports.
> This seems worthy of a JIRA enhancement request. I'm sure the project will
> accept a patch that adds this.
>

Done: https://issues.apache.org/jira/browse/AVRO-872

> I also saw this blog post (
> http://www.infoq.com/articles/ApacheAvro#_ftnref6_7758) where the author
> had to write some nasty String.replace(..) code to combine schemas, but
> there's got to be a better way than this.
>
> We need to improve the ability to import multiple files when parsing.
> Using the lower level Avro API you can parse the files yourself in an order
> that will work.
> I have simply put all my types in one file. If you made one avsc file with
> both Position and Player in a JSON array it will compile. It would look
> like:
> [
>   < position schema here>,
>   < player schema here>
> ]
>

Yes, I've used this approach in the past. Initially I was thinking that I
could write something to combine multiple files into a single InputStream
facade that generates a union like you describe, which could then be parsed.
I could then hold a handle to the union schema and provide a method to get a
given schema type (e.g. the Player) by name. This is better than the
String.replace(..) approach, but still a bit hacky.

> Using the lower level Avro API you can parse the files yourself in an order
> that will work.

How exactly would parsing the files in reverse-dependency order work? This is
something I'd like to explore and maybe contribute a helper for. I've tried a
few combinations of this approach to no avail:

        Schema schema1 = Schema.parse(new File("examples/java/avro/position.avsc"));
        Schema schema2 = schema1.parse(new File("examples/java/avro/player.avsc"));

> Also FYI, it seems enum values can't start with numbers (e.g. '1B'). Is
> this a known issue or a feature? I haven't seen it documented anywhere. You
> get an error like this if the value starts with a number:
>
> org.apache.avro.SchemaParseException: Illegal initial character
>
> Enums are a named type. The enum names must start with [A-Za-z_] and
> subsequently contain only [A-Za-z0-9_].
> http://avro.apache.org/docs/1.5.1/spec.html#Names
>

I hadn't noticed that before, thanks.

> However, the spec does not say that the values must have such restrictions.
> This may be a bug, can you file a JIRA ticket?
>

Done: https://issues.apache.org/jira/browse/AVRO-871

> Thanks!
>
> -Scott
>
> thanks,
> Bill
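
A note on the parse-order question above: the ordering step itself needs nothing from Avro. The sketch below is a minimal, hypothetical helper (SchemaOrder and dependencyOrder are illustrative names, not Avro API). It takes a hand-written map from each schema file to the files it references and returns the files with dependencies first. It assumes the files are then fed, in that order, to a parser that remembers named types between calls; whether a given Avro release does so should be checked against its javadoc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Avro: orders schema files so that each
// file appears after the files that define the types it references.
class SchemaOrder {

    // deps maps a schema file name to the file names it depends on.
    static List<String> dependencyOrder(Map<String, List<String>> deps) {
        List<String> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String file : deps.keySet()) {
            visit(file, deps, done, inProgress, ordered);
        }
        return ordered;
    }

    // Depth-first visit: emit all dependencies of a file before the file itself.
    private static void visit(String file, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress,
                              List<String> ordered) {
        if (done.contains(file)) {
            return;
        }
        if (!inProgress.add(file)) {
            throw new IllegalArgumentException("Circular dependency at " + file);
        }
        for (String dep : deps.getOrDefault(file, Collections.<String>emptyList())) {
            visit(dep, deps, done, inProgress, ordered);
        }
        inProgress.remove(file);
        done.add(file);
        ordered.add(file);
    }

    public static void main(String[] args) {
        // The dependency map is written by hand here (an assumption of this sketch).
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("player.avsc", Arrays.asList("position.avsc"));
        deps.put("position.avsc", Collections.<String>emptyList());
        System.out.println(dependencyOrder(deps)); // prints [position.avsc, player.avsc]
    }
}
```

For the two files in this thread the helper yields position.avsc before player.avsc, which is the order Scott's suggestion calls for.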

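
The "single InputStream facade" idea from the reply can be sketched with java.io alone. The class below is hypothetical (UnionStream, asJsonArray and readAll are illustrative names, not Avro API): it wraps several schema streams in "[", "," and "]" so that they read as one JSON array, the form Scott describes. It assumes the caller passes the streams in dependency order (Position before Player) and hands the result to a schema parser that accepts a stream; that parser call is deliberately not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Avro: presents several schema streams
// as a single stream containing one JSON array.
class UnionStream {

    // Concatenates "[", schema 1, ",", schema 2, ..., "]" without copying the inputs.
    static InputStream asJsonArray(List<InputStream> schemas) {
        List<InputStream> parts = new ArrayList<>();
        parts.add(text("["));
        for (int i = 0; i < schemas.size(); i++) {
            if (i > 0) {
                parts.add(text(","));
            }
            parts.add(schemas.get(i));
        }
        parts.add(text("]"));
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    private static InputStream text(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    // Drains a stream into a String; used here only to inspect the result.
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in JSON snippets; real use would open position.avsc and player.avsc.
        InputStream position = text("{\"name\":\"Position\"}");
        InputStream player = text("{\"name\":\"Player\"}");
        System.out.println(readAll(asJsonArray(Arrays.asList(position, player))));
    }
}
```

Because the pieces are chained rather than copied, the files are read only when the parser consumes the combined stream.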