From Scott Carey <sc...@richrelevance.com>
Subject Re: Schema registry
Date Thu, 24 Mar 2011 21:04:10 GMT
There is danger in this.

What is the schema used for in this case?  There are three common reasons
for assembling a schema:
1.  Assembling the schema that represents the format of the data to be
written.
2.  Assembling the schema that represents the way a reader wishes to view
the data. (a.k.a. 'reader' or 'expected' schema).
3.  Assembling the schema that represents the way that some data was
written.

If you are persisting data, you should persist the _entire_ schema used to
write that data as well.  This full schema should either go with the data
(data files) or in a registry (e.g. HAvroBase).  A schema name reference
is not sufficient -- you lose the ability to evolve the referenced schema.

What if the version of the nested schema has changed?  Now you have a data
file that refers to a nested schema by name "com.navteq.avro.FacebookUser"
and finds a schema with that name through some resolution mechanism.  If
that resolution mechanism is not version-aware, you're in trouble.

So for #3, assembling schema fragments by reference is dangerous and
Making the resolution mechanism version-aware is problematic but doable.
You can manually version every schema with a number, and use that, but
then you are manually versioning schemas and storing the version meta-data
in the schemas.
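
As a rough illustration of the manual-versioning option, a registry could
key each schema on both its full name and an explicit version number, so
that a lookup by name alone is never enough. This is only a sketch: the
class and method names below are hypothetical and not part of Avro, and
schemas are held as plain JSON text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not an Avro API: a registry that resolves a schema
// only when both its full name and a manually assigned version are given.
final class VersionedSchemaRegistry {
    // schema full name -> (manual version number -> schema JSON text)
    private final Map<String, Map<Integer, String>> schemas = new HashMap<>();

    // Stores one version of a named schema; earlier versions are kept.
    void register(String fullName, int version, String schemaJson) {
        schemas.computeIfAbsent(fullName, k -> new HashMap<>()).put(version, schemaJson);
    }

    // Returns the schema for exactly this name and version, if registered.
    Optional<String> lookup(String fullName, int version) {
        Map<Integer, String> versions = schemas.get(fullName);
        return versions == null
                ? Optional.empty()
                : Optional.ofNullable(versions.get(version));
    }
}
```

The cost is the one described above: every schema that refers to
"com.navteq.avro.FacebookUser" also has to record which version number it
means, and someone has to maintain those numbers by hand.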

Avro by nature versions schemas by equivalence.  The natural way to encode
a schema version is to write the schema itself.
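
One way to picture versioning by equivalence is to derive the version
identifier from the schema text itself, so that any change to the schema
yields a different identifier. The sketch below hashes the raw schema JSON
with SHA-256. That choice is an assumption made for illustration, and the
class name is hypothetical; because it hashes the raw text, two schemas
that differ only in whitespace would get different identifiers, so a real
implementation would want to normalize the schema first.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not an Avro API: identify a schema version by a
// hash of its text instead of a manually assigned number.
final class SchemaFingerprint {
    private SchemaFingerprint() {}

    // Returns the SHA-256 digest of the schema text as lowercase hex.
    static String of(String schemaJson) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(schemaJson.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Identical schema text always maps to the same identifier, and an edited
schema maps to a new one, with no version numbers to maintain.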

In short: Any such registry would have to be version-aware if it is used
to assemble schemas for use case #3 above, and the schemas that refer to
these versions would also have to be version-aware.  It is much simpler to
just embed the schemas.

Use cases #1 and #2 above are essentially the assembly of the 'current'
schema version, and a registry could work.  Avro does not have many
built-in tools for this.  Generally, avsc, avpr, or avdl files are used as
schema source for 'schema first' design, and 'code first' design persists
the current schema in the code.
avdl files support includes; avsc and avpr are more primitive.

On 3/23/11 10:21 PM, "Ashish Shinde" <ashish@strandls.com> wrote:

>My use case is very similar to the nested schema in
>the test case AvroUtilsTest on http://www.infoq.com/articles/ApacheAvro
>The only difference is I would like to automatically load schemas from
>resources in classpath and also automatically load schemas
>for nested types.
>If you look at the test example mentioned above, if I ask the
>"AvroSchemaRegistry" for a schema named
>com.navteq.avro.FacebookSpecialUser it should also load the nested
>com.navteq.avro.FacebookUser schema using some resolving and loading
>Thanks and regards,
>- Ashish
>On Thu, 24 Mar 2011 10:38:20 +0800
>Felix Xu <ygnhzeus@gmail.com> wrote:
>> Hi, I don't quite understand the question.
>> Can you give an example of your schema?
>> 2011/3/24 <ashish@strandls.com>
>> > Hi,
>> >
>> > Is there a Java implementation of an Avro schema registry? The use
>> > case is to have separate schema data files for a bunch of types and
>> > be able to resolve nested types.
>> >
>> > I tried Avro for the first time and could not get a schema parsed
>> > from one file to use a nested record from a schema described in a
>> > second file.
>> >
>> > I am using a modified version of the AvroUtil class from
>> > http://www.infoq.com/articles/ApacheAvro . The modified file is
>> > attached. It uses the SchemaParseException and loads schema files
>> > from classpath.
>> >
>> > Is there a better alternative? If this is a strong use case I could
>> > work on creating such a schema registry with pluggable resolvers and
>> > loaders.
>> >
>> > Thanks and regards,
>> >  - Ashish
>> >

