From: Nick Palmer
To: "user@avro.apache.org"
Subject: Re: Nested schema issue (with "munged" invalid schema)
Date: Wed, 30 May 2012 23:14:28 +0200

You cannot define the same type twice within the same schema, so you need to change your "munge" step to produce the following:

{
  "name": "address2",
  "type": "record",
  "namespace" : "some.domain",
  "fields" :
  [
    {
      "name": "street",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    },
    {
      "name": "position1",
      "type": {"type":"record","name":"location","namespace":"some.domain","fields":[{"name":"latitude","type":"float"},{"name":"longitude","type":"float"}]}
    },
    {
      "name": "position2",
      "type": "some.domain.location"
    }
  ]
}

~ Nick

On May 1, 2012, at 6:55 PM, Peter Cameron wrote:

> I'm having a problem with nesting schemas. A very brief overview of why we're using Avro (successfully so far) is:
>
> o code generation not required
> o small binary format
> o dynamic use of schemas at runtime
>
> We're doing a flavour of RPC, and the reason we're not using Avro's IDL and flavour of RPC is because the endpoint is not necessarily a Java platform (C# and Java for our purposes), and only the Java implementation of Avro has RPC. Hence no Avro RPC for us.
>
> I'm aware that Avro doesn't import nested schemas out of the box. We need that functionality as we're exposed to schemas over which we have no control, and in the interests of maintainability, these schemas are nicely partitioned and are referenced as types from within other schemas. So, for example, an address schema refers to a some.domain.location object by having a field of type "some.domain.location". Note that our runtime has no knowledge of any some.domain package (e.g. address or location objects). Only the endpoints know about some.domain. (A layer at our endpoint runtime serialises any unknown i.e.
> non-primitive objects as bytestreams.)
>
> I implemented a schema cache which intelligently imports schemas on the fly, so adding the address schema to the cache automatically adds the location schema that it refers to. The cache uses Avro's schema parser to parse an added schema, catches parse exceptions, looks at the exception message to see whether or not the error is due to a missing or undefined type, and then goes off to import the needed schema. Brittle, I know, but no other way for us. We need this functionality, and nothing else comes close to Avro.
>
> So far so good, until today when I hit a corner case.
>
> Say I have an address object that has two fields, called position1 and position2. If position1 and position2 are non-primitive types, then the address schema doesn't parse, so presumably it is an invalid Avro schema. The error concerns redefining the location type. Here's the example:
>
> location schema
> ==============
>
> {
>   "name": "location",
>   "type": "record",
>   "namespace" : "some.domain",
>   "fields" :
>   [
>     {
>       "name": "latitude",
>       "type": "float"
>     },
>     {
>       "name": "longitude",
>       "type": "float"
>     }
>   ]
> }
>
> address schema
> ==============
>
> {
>   "name": "address",
>   "type": "record",
>   "namespace" : "some.domain",
>   "fields" :
>   [
>     {
>       "name": "street",
>       "type": "string"
>     },
>     {
>       "name": "city",
>       "type": "string"
>     },
>     {
>       "name": "position1",
>       "type": "some.domain.location"
>     },
>     {
>       "name": "position2",
>       "type": "some.domain.location"
>     }
>   ]
> }
>
>
> Now, an answer of having a list of positions as a field is not an answer for us, as we need to solve the general issue of a schema with more than one instance of the same nested type i.e.
> my problem is not with an address or location schema.
>
> The problematic schema constructed by my schema cache is:
>
> {
>   "name": "address2",
>   "type": "record",
>   "namespace" : "some.domain",
>   "fields" :
>   [
>     {
>       "name": "street",
>       "type": "string"
>     },
>     {
>       "name": "city",
>       "type": "string"
>     },
>     {
>       "name": "position1",
>       "type": {"type":"record","name":"location","namespace":"some.domain","fields":[{"name":"latitude","type":"float"},{"name":"longitude","type":"float"}]}
>     },
>     {
>       "name": "position2",
>       "type": {"type":"record","name":"location","namespace":"some.domain","fields":[{"name":"latitude","type":"float"},{"name":"longitude","type":"float"}]}
>     }
>   ]
> }
>
>
> Can this be done? This is potentially a blocker for us.
>
> cheers,
> Peter
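
For reference, here is a minimal sketch of the "munge" step described in the reply at the top of this message: define a named type in full at its first use, then refer to it by its full name at every later use. It is plain Python using only the json module, not the Avro library. The names resolve, known and PRIMITIVES are hypothetical, and it assumes the schema cache holds each imported schema as parsed JSON keyed by full name.

```python
import json

# Avro primitive type names; any other string is treated as a named-type reference.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def resolve(schema, known, defined=None, ns=None):
    """Return a copy of `schema` in which each named type is defined only once.

    `known` maps full names to parsed schemas (the hypothetical schema cache).
    `defined` collects the full names that have already been emitted inline.
    """
    if defined is None:
        defined = set()
    if isinstance(schema, str):
        if schema in PRIMITIVES:
            return schema
        # An unqualified name is looked up in the enclosing namespace.
        full = schema if "." in schema or not ns else f"{ns}.{schema}"
        if full in defined:
            return full  # later use: keep the reference by name
        # First use: inline the definition (raises KeyError if not cached).
        return resolve(known[full], known, defined, ns)
    if isinstance(schema, list):  # union
        return [resolve(s, known, defined, ns) for s in schema]
    out = dict(schema)
    kind = schema["type"]
    if kind in ("record", "enum", "fixed"):
        name = schema["name"]
        namespace = schema.get("namespace", ns)
        full = name if "." in name or not namespace else f"{namespace}.{name}"
        if full in defined:
            return full  # a repeated inline definition becomes a reference
        defined.add(full)
        if kind == "record":
            inner_ns = full.rpartition(".")[0] or None
            out["fields"] = [
                dict(f, type=resolve(f["type"], known, defined, inner_ns))
                for f in schema["fields"]
            ]
    elif kind == "array":
        out["items"] = resolve(schema["items"], known, defined, ns)
    elif kind == "map":
        out["values"] = resolve(schema["values"], known, defined, ns)
    return out


location = {
    "name": "location", "type": "record", "namespace": "some.domain",
    "fields": [{"name": "latitude", "type": "float"},
               {"name": "longitude", "type": "float"}],
}
address = {
    "name": "address", "type": "record", "namespace": "some.domain",
    "fields": [{"name": "street", "type": "string"},
               {"name": "city", "type": "string"},
               {"name": "position1", "type": "some.domain.location"},
               {"name": "position2", "type": "some.domain.location"}],
}

merged = resolve(address, {"some.domain.location": location})
print(json.dumps(merged, indent=2))
```

For the address example, position1 ends up carrying the inline location record and position2 stays as the string "some.domain.location", which is the shape of the schema in the reply. Because a repeated inline definition is also collapsed to its name, the same function could be run over the problematic address2 schema quoted above.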