avro-user mailing list archives

From Nick Palmer <pal...@cs.vu.nl>
Subject Re: Nested schema issue (with "munged" invalid schema)
Date Wed, 30 May 2012 21:14:28 GMT
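You cannot define the same named type twice within a single schema, so you need to change your
"munge" step to define the type in full only the first time it appears and to refer to it by its
full name after that, producing the following: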

{
    "name": "address2",
    "type": "record",
    "namespace" : "some.domain",
    "fields" : 
    [
        {
            "name": "street", 
            "type": "string"
        },
        {
            "name": "city", 
            "type": "string"
        },
        {
            "name": "position1",
            "type": {"type":"record","name":"location","namespace":"some.domain","fields":[{"name":"latitude","type":"float"},{"name":"longitude","type":"float"}]}
        },
        {
            "name": "position2",
            "type": "some.domain.location"
        }
    ]
}
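
A minimal sketch of such a "munge" step, in Python using only the standard json module. The
names inline_once, registry and defined are hypothetical and not part of any Avro API; the
sketch assumes the registry is keyed by full name, and it does not validate the result with
an Avro parser.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def fullname(schema, enclosing_ns):
    """Return the full name of a named schema (record, enum, fixed)."""
    name = schema["name"]
    if "." in name:
        return name
    ns = schema.get("namespace", enclosing_ns)
    return f"{ns}.{name}" if ns else name


def inline_once(schema, registry, defined=None, enclosing_ns=None):
    """Expand references to named types, defining each type only the first time."""
    if defined is None:
        defined = set()
    if isinstance(schema, str):
        if schema in PRIMITIVES:
            return schema
        full = schema if "." in schema or not enclosing_ns else f"{enclosing_ns}.{schema}"
        if full in defined or full not in registry:
            return full  # already defined (or unknown): keep the reference by name
        ns = full.rpartition(".")[0] or None
        return inline_once(registry[full], registry, defined, ns)
    if isinstance(schema, list):  # union: process each branch
        return [inline_once(s, registry, defined, enclosing_ns) for s in schema]
    kind = schema["type"]
    if kind in ("record", "error"):
        full = fullname(schema, enclosing_ns)
        defined.add(full)  # mark before descending so later references stay by name
        ns = full.rpartition(".")[0] or None
        out = dict(schema)
        out["fields"] = [
            dict(f, type=inline_once(f["type"], registry, defined, ns))
            for f in schema["fields"]
        ]
        return out
    if kind in ("enum", "fixed"):
        defined.add(fullname(schema, enclosing_ns))
        return schema
    if kind == "array":
        return dict(schema, items=inline_once(schema["items"], registry, defined, enclosing_ns))
    if kind == "map":
        return dict(schema, values=inline_once(schema["values"], registry, defined, enclosing_ns))
    return schema


LOCATION = {
    "name": "location", "type": "record", "namespace": "some.domain",
    "fields": [{"name": "latitude", "type": "float"},
               {"name": "longitude", "type": "float"}],
}
ADDRESS = {
    "name": "address", "type": "record", "namespace": "some.domain",
    "fields": [{"name": "street", "type": "string"},
               {"name": "city", "type": "string"},
               {"name": "position1", "type": "some.domain.location"},
               {"name": "position2", "type": "some.domain.location"}],
}

registry = {"some.domain.location": LOCATION}
result = inline_once(ADDRESS, registry)
print(json.dumps(result, indent=4))
```

In this sketch position1 receives the full location definition and position2 keeps the plain
"some.domain.location" reference, which is the shape shown above. The output would still need
to be handed to a real Avro parser to confirm that it is accepted.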

~ Nick

On May 1, 2012, at 6:55 PM, Peter Cameron wrote:

> I'm having a problem with nesting schemas. A very brief overview of why we're using Avro
> (successfully so far) is:
> 
> o code generation not required 
> o small binary format 
> o dynamic use of schemas at runtime 
> 
> We're doing a flavour of RPC, and the reason we're not using Avro's IDL and flavour of
> RPC is because the endpoint is not necessarily a Java platform (C# and Java for our purposes),
> and only the Java implementation of Avro has RPC. Hence no Avro RPC for us.
> 
> I'm aware that Avro doesn't import nested schemas out of the box. We need that functionality
> as we're exposed to schemas over which we have no control, and in the interests of maintainability,
> these schemas are nicely partitioned and are referenced as types from within other schemas.
> So, for example, an address schema refers to a some.domain.location object by having a field
> of type "some.domain.location". Note that our runtime has no knowledge of any some.domain
> package (e.g. address or location objects). Only the endpoints know about some.domain. (A
> layer at our endpoint runtime serialises any unknown, i.e. non-primitive, objects as bytestreams.)
> 
> I implemented a schema cache which intelligently imports schemas on the fly, so adding
> the address schema to the cache, automatically adds the location schema that it refers to.
> The cache uses Avro's schema to parse an added schema, catches parse exceptions, looks at
> the exception message to see whether or not the error is due to a missing or undefined type,
> and thus goes off to import the needed schema. Brittle, I know, but no other way for us. We
> need this functionality, and nothing else comes close to Avro.
> 
> So far so good, until today when I hit a corner case. 
> 
> Say I have an address object that has two fields, called position1 and position2. If
> position1 and position2 are non-primitive types, then the address schema doesn't parse so
> presumably is an invalid Avro schema. The error concerns redefining the location type. Here's
> the example:
> 
> location schema 
> ============== 
> 
> { 
>     "name": "location", 
>     "type": "record", 
>     "namespace" : "some.domain", 
>     "fields" : 
>     [ 
>         { 
>             "name": "latitude", 
>             "type": "float" 
>         }, 
>         { 
>             "name": "longitude", 
>             "type": "float" 
>         } 
>     ] 
> } 
> 
> address schema 
> ============== 
> 
> { 
>     "name": "address", 
>     "type": "record", 
>     "namespace" : "some.domain", 
>     "fields" : 
>     [ 
>         { 
>             "name": "street", 
>             "type": "string" 
>         }, 
>         { 
>             "name": "city", 
>             "type": "string" 
>         }, 
>         { 
>             "name": "position1", 
>             "type": "some.domain.location" 
>         }, 
>         { 
>             "name": "position2", 
>             "type": "some.domain.location" 
>         } 
>     ] 
> } 
> 
> 
> Now, an answer of having a list of positions as a field is not an answer for us, as we
> need to solve the general issue of a schema with more than one instance of the same nested
> type i.e. my problem is not with an address or location schema.
> 
> The problematic schema constructed by my schema cache is:
> 
> {
>     "name": "address2",
>     "type": "record",
>     "namespace" : "some.domain",
>     "fields" : 
>     [
>         {
>             "name": "street", 
>             "type": "string"
>         },
>         {
>             "name": "city", 
>             "type": "string"
>         },
>         {
>             "name": "position1",
>             "type": {"type":"record","name":"location","namespace":"some.domain","fields":[{"name":"latitude","type":"float"},{"name":"longitude","type":"float"}]}
>         },
>         {
>             "name": "position2",
>             "type": {"type":"record","name":"location","namespace":"some.domain","fields":[{"name":"latitude","type":"float"},{"name":"longitude","type":"float"}]}
>         }
>     ]
> }
> 
> 
> Can this be done? This is potentially a blocker for us. 
> 
> cheers, 
> Peter 
> 

