avro-user mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: references to other schemas
Date Mon, 03 May 2010 17:48:56 GMT

On May 3, 2010, at 10:03 AM, Doug Cutting wrote:

> Scott Carey wrote:
>> There has been talk that AvroGen would handle features like this (as well as many
>> others) in time. However, this is one that should probably be addressed at the JSON
>> level regardless of the future direction of AvroGen.
> Note that JSON schemas and protocols need to be standalone, containing 
> the full lexical closure of schemas referenced, when they are included 
> in data files and exchanged in RPC handshakes without reference to 
> external data.  Thus I am reluctant to add a JSON syntax for file 
> inclusion.  Rather, I think a pre-processor is appropriate.  The 
> pre-processor would not be run on schemas included in files or exchanged 
> in RPC handshakes, but would be run for schemas read from files.

Exactly. I don't think we should change the JSON syntax by adding references or includes.

Instead, we should make the SpecificCompiler capable of reading a collection of files and working
out how to compile them when a single .avsc file does not have full lexical closure.
File formats and RPCs have much stricter requirements than the SpecificCompiler does.

> I have experimented with using the m4 pre-processor for this purpose, 
> and found it a bit awkward.  Perhaps someone can develop macros for m4 
> that make it palatable, or perhaps we can develop a custom pre-processor 
> for JSON.
> We might exploit otherwise-illegal JSON syntax, like backquotes, for 
> pre-processor directives.  An include might look something like:
> {"protocol": "org.foo.BarProtocol",
>  "types": [
>    `include org.foo.Bar`,
>     ...
>   ]
> }
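
A rough sketch of that backquote pre-processor in Python, assuming directives look exactly like `include org.foo.Bar` and that includable schemas are supplied as a dict keyed by full name (a hypothetical stand-in for searching files on disk). The function name `preprocess` is made up for illustration:

```python
import json
import re

# Matches the hypothetical directive syntax: `include org.foo.Bar`
INCLUDE = re.compile(r"`include\s+([A-Za-z_][\w.]*)`")


def preprocess(text, library, _active=()):
    """Expand `include name` directives with schema text from `library`,
    a dict mapping full type names to JSON text."""
    def expand(match):
        name = match.group(1)
        if name in _active:
            raise ValueError("circular include: " + name)
        # Included text may contain directives of its own, so recurse.
        return preprocess(library[name], library, _active + (name,))
    return INCLUDE.sub(expand, text)


library = {"org.foo.Bar": '{"type": "fixed", "name": "org.foo.Bar", "size": 4}'}
source = """{"protocol": "org.foo.BarProtocol",
 "types": [
   `include org.foo.Bar`
  ],
 "messages": {}
}"""

expanded = json.loads(preprocess(source, library))
print(expanded["types"][0]["name"])  # org.foo.Bar
```

This naive version would paste a type in twice if two includes pull it in, which Avro rejects as a redefinition, and it knows nothing about namespaces.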

Rather than use a preprocessor, is it possible to have the SpecificCompiler search the other
files in the set for types that can't be found in the current file? The result would still be
SpecificRecord classes whose SCHEMA$ field is populated with a schema that has full lexical closure.

Essentially, if given two files:
IpTypes.avsc --

[{"name": "com.somewhere.avro.IPV4", "type": "fixed", "size":4},
{"name": "com.somewhere.avro.IPV6", "type": "fixed", "size":16}]

MyRecord.avsc --

{"name": "com.somewhere.avro.MyRecord", "type": "record", "fields": [
  {"name": "hostname", "type": "string"},
  {"name": "IP", "type": [ "IPV4", "IPV6" ]}
]}

The SpecificCompiler could compile MyRecord.avsc if it were given IpTypes.avsc at the same time,
using it to resolve the otherwise unknown "IPV4" and "IPV6" references. Perhaps it could also
compile MyRecord.avsc if it were aware of a SpecificRecord Java class with an appropriate schema.
A preprocessor would find this tricky, especially if it has to resolve names in a namespace-appropriate
way, and it could not integrate with SpecificRecord classes that have already been generated.
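
A minimal Python sketch of that cross-file lookup, assuming the inputs are plain JSON schema files like IpTypes.avsc and MyRecord.avsc. The names `collect`, `close` and `resolve_closure` are hypothetical, not part of any Avro API, and the sketch ignores aliases, classpath lookup and protocol messages. It gathers every named type from all files, then copies the target schema and inlines each missing definition at its first reference so the emitted schema is standalone:

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = {"record", "error", "enum", "fixed"}


def full_name(name, namespace):
    # A dotted name is already full; otherwise qualify it with the namespace.
    return name if "." in name or not namespace else namespace + "." + name


def namespace_of(fullname):
    return fullname.rpartition(".")[0]


def collect(schema, namespace, table):
    """Record every named type defined in `schema`, keyed by full name.
    Stored copies carry their full name, so they can be inlined anywhere."""
    if isinstance(schema, list):  # a union, or a file holding several types
        for s in schema:
            collect(s, namespace, table)
    elif isinstance(schema, dict):
        t = schema.get("type")
        if t in NAMED:
            fn = full_name(schema["name"], schema.get("namespace", namespace))
            entry = {k: v for k, v in schema.items() if k != "namespace"}
            entry["name"] = fn
            table[fn] = entry
            for f in schema.get("fields", []):
                collect(f["type"], namespace_of(fn), table)
        elif t == "array":
            collect(schema["items"], namespace, table)
        elif t == "map":
            collect(schema["values"], namespace, table)
        else:
            collect(t, namespace, table)


def close(schema, namespace, table, defined):
    """Copy `schema`, inlining each referenced definition the first time
    it is used so that the result has full lexical closure."""
    if isinstance(schema, str):
        if schema in PRIMITIVES:
            return schema
        fn = full_name(schema, namespace)
        if fn in defined:
            return schema  # defined earlier in this schema: a name is valid
        if fn not in table:
            raise KeyError("unresolved type: " + fn)
        return close(table[fn], namespace, table, defined)
    if isinstance(schema, list):
        return [close(s, namespace, table, defined) for s in schema]
    t = schema.get("type")
    if t in NAMED:
        fn = full_name(schema["name"], schema.get("namespace", namespace))
        defined.add(fn)
        out = {k: v for k, v in schema.items() if k != "namespace"}
        out["name"] = fn
        if "fields" in schema:
            out["fields"] = [dict(f, type=close(f["type"], namespace_of(fn), table, defined))
                             for f in schema["fields"]]
        return out
    if t == "array":
        return dict(schema, items=close(schema["items"], namespace, table, defined))
    if t == "map":
        return dict(schema, values=close(schema["values"], namespace, table, defined))
    return dict(schema, type=close(t, namespace, table, defined))


def resolve_closure(texts, target):
    """Parse every file in `texts`, then return texts[target] made standalone."""
    table = {}
    for text in texts:
        collect(json.loads(text), "", table)
    return close(json.loads(texts[target]), "", table, set())


ip_types = """[{"name": "com.somewhere.avro.IPV4", "type": "fixed", "size": 4},
 {"name": "com.somewhere.avro.IPV6", "type": "fixed", "size": 16}]"""
my_record = """{"name": "com.somewhere.avro.MyRecord", "type": "record", "fields": [
  {"name": "hostname", "type": "string"},
  {"name": "IP", "type": ["IPV4", "IPV6"]}
]}"""

closed = resolve_closure([ip_types, my_record], target=1)
print(json.dumps(closed, indent=1))
```

Inlining a definition only at its first use, and leaving later uses as names, follows Avro's rule that a named type may be defined only once within a schema.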

Suppose IPV4 and IPV6 are already compiled into SpecificRecord classes in a jar, "CommonTypes.jar".
The SpecificCompiler could then run with that jar on its classpath, plus a directive telling it
to look for valid types on the classpath in addition to the given files.

The MyRecord.avsc file above does not contain a fully valid Avro schema, so perhaps we could
denote this with a different file extension.

> Also note that a protocol file (.avpr) need not actually define any 
> messages but can be used to define a set of types that reference one 
> another.  This is a stopgap, but a useful one.
> Doug
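
For comparison, the protocol-file stopgap amounts to concatenating the named types, in dependency order, into one .avpr with an empty "messages" map. A sketch in Python, where `bundle_as_protocol` and the protocol name CommonTypes are made up for illustration:

```python
import json


def bundle_as_protocol(protocol, namespace, *files):
    """Concatenate the named types from several schema files into a
    message-less protocol. File order is kept, so each file must come
    after the files defining the types it references."""
    types = []
    for text in files:
        schema = json.loads(text)
        # A file may hold one type or, like IpTypes.avsc, an array of types.
        types.extend(schema if isinstance(schema, list) else [schema])
    return {"protocol": protocol, "namespace": namespace,
            "types": types, "messages": {}}


ip_types = """[{"name": "com.somewhere.avro.IPV4", "type": "fixed", "size": 4},
 {"name": "com.somewhere.avro.IPV6", "type": "fixed", "size": 16}]"""
my_record = """{"name": "com.somewhere.avro.MyRecord", "type": "record", "fields": [
  {"name": "hostname", "type": "string"},
  {"name": "IP", "type": ["IPV4", "IPV6"]}
]}"""

protocol = bundle_as_protocol("CommonTypes", "com.somewhere.avro", ip_types, my_record)
print(json.dumps(protocol, indent=2))  # e.g. save as CommonTypes.avpr
```

Unlike a compiler-side lookup, this leaves the ordering to whoever assembles the file list.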
