avro-user mailing list archives

From Stu Hood <stuh...@gmail.com>
Subject Re: Schema registry
Date Sun, 10 Apr 2011 00:40:20 GMT
> This full schema should either go with the data (data files) or in a
> registry (e.g. HAvroBase).

Isn't the latter what they want? A registry?

Presumably the RPC framework implements such a registry, since it can look
schemas up by their hashcode.
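
A minimal sketch of what a hash-keyed registry could look like, in Python for brevity. This is not how Avro's RPC layer is implemented; the class name SchemaRegistry, the choice of SHA-256 over sorted-key JSON, and the FacebookUser fields used in the usage lines are all assumptions made for illustration.

```python
import hashlib
import json


class SchemaRegistry:
    """Toy in-memory registry keyed by a hash of the full schema text."""

    def __init__(self):
        self._schemas = {}

    def register(self, schema):
        # Serialize deterministically so equal schemas get the same key.
        # Assumption: sorted-key JSON stands in for a real canonical form.
        text = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self._schemas[key] = text
        return key

    def lookup(self, key):
        # Raises KeyError if the hash was never registered.
        return json.loads(self._schemas[key])


# Two hypothetical versions of the same named record.
user_v1 = {
    "type": "record",
    "name": "FacebookUser",
    "namespace": "com.navteq.avro",
    "fields": [{"name": "name", "type": "string"}],
}
user_v2 = dict(
    user_v1,
    fields=user_v1["fields"] + [{"name": "age", "type": "int", "default": 0}],
)

registry = SchemaRegistry()
key_v1 = registry.register(user_v1)
key_v2 = registry.register(user_v2)
```

Because the key is derived from the whole schema rather than from its name, both versions of com.navteq.avro.FacebookUser can live in the registry side by side, and data tagged with key_v1 still resolves to the schema it was written with.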

On Thu, Mar 24, 2011 at 2:04 PM, Scott Carey <scott@richrelevance.com> wrote:

> There is danger in this.
>
> What is the schema used for in this case?  There are three common reasons
> for assembling a schema:
> 1.  Assembling the schema that represents the format of the data to be
> written.
> 2.  Assembling the schema that represents the way a reader wishes to view
> the data. (a.k.a. 'reader' or 'expected' schema).
> 3.  Assembling the schema that represents the way that some data was
> persisted.
>
> If you are persisting data, you should persist the _entire_ schema used to
> write that data as well.  This full schema should either go with the data
> (data files) or in a registry (e.g. HAvroBase).  A schema name reference
> is not sufficient -- you lose the ability to evolve the referenced schema.
>
> What if the version of the nested schema has changed?  Now you have a data
> file that refers to a nested schema by name "com.navteq.avro.FacebookUser"
> and finds a schema with that name through some resolution mechanism.  If
> that resolution mechanism is not version-aware, you're in trouble.
>
> So for #3, assembling schema fragments by reference is dangerous and
> complicated.
> Making the resolution mechanism version-aware is problematic but doable.
> You can manually version every schema with a number, and use that, but
> then you are manually versioning schemas and storing the version meta-data
> in the schemas.
>
> Avro by nature versions schemas by equivalence.  The natural way to encode
> a schema version is to write the schema itself.
>
> In short: Any such registry would have to be version-aware if it is used
> to assemble schemas for use case #3 above, and the schemas that refer to
> these versions would also have to be version-aware.  It is much simpler to
> just embed the schemas.
>
> Use cases #1 and #2 above are essentially the assembly of the 'current'
> schema version, and a registry could work.  Avro does not have many
> built-in tools for this.  Generally, avsc, avpr, or avdl files are used as
> schema source for 'schema first' design, and 'code first' design persists
> the current schema in the code.
> avdl files support includes; avsc and avpr are more primitive.
>
>
> On 3/23/11 10:21 PM, "Ashish Shinde" <ashish@strandls.com> wrote:
>
> >Hi,
> >
> >My use case is very similar to the nested schema in
> >the test case AvroUtilsTest on http://www.infoq.com/articles/ApacheAvro
> >
> >The only difference is I would like to automatically load schemas from
> >resources in the classpath and also automatically load schemas
> >for nested types.
> >
> >In the test example mentioned above, if I ask the
> >"AvroSchemaRegistry" for a schema named
> >com.navteq.avro.FacebookSpecialUser, it should also load the nested
> >com.navteq.avro.FacebookUser schema using some resolving and loading
> >mechanism.
> >
> >Thanks and regards,
> >- Ashish
> >
> >
> >
> >On Thu, 24 Mar 2011 10:38:20 +0800
> >Felix Xu <ygnhzeus@gmail.com> wrote:
> >
> >> Hi, I don't quite understand the question.
> >> Can you give an example of your schema?
> >>
> >> 2011/3/24 <ashish@strandls.com>
> >>
> >> > Hi,
> >> >
> >> > Is there a Java implementation of an Avro schema registry? The use
> >> > case is to have separate schema data files for a bunch of types and
> >> > be able to resolve nested types.
> >> >
> >> > I tried Avro for the first time and could not get a schema parsed
> >> > from one file to include a nested record from a schema described in
> >> > a second file.
> >> >
> >> > I am using a modified version of the AvroUtil class from
> >> > http://www.infoq.com/articles/ApacheAvro . The modified file is
> >> > attached. It uses the SchemaParseException and loads schema files
> >> > from the classpath.
> >> >
> >> > Is there a better alternative? If this is a strong use case I could
> >> > work on creating such a schema registry with pluggable resolvers and
> >> > loaders.
> >> >
> >> > Thanks and regards,
> >> >  - Ashish
> >> >
> >
>
>
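
One way to reconcile the two positions quoted above (load nested schemas by name, but persist the entire schema) is to resolve names once at write time and store the expanded result. The following is only a rough sketch under stated assumptions: it works on parsed JSON rather than on Avro's own Schema objects, the function names embed and full_name are hypothetical, references are assumed to use full names, and the field lists are invented for the example.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def full_name(schema):
    """Full name of a named type: the name as-is if dotted, else namespace.name."""
    name = schema["name"]
    namespace = schema.get("namespace")
    return name if "." in name or not namespace else namespace + "." + name


def embed(schema, sources, seen=None):
    """Return a copy of `schema` with name references replaced by definitions.

    `sources` maps a full name to its parsed schema (hypothetical loader output).
    A name is expanded on first use only; later uses stay as plain references,
    because a named type may be defined only once within a schema.
    """
    if seen is None:
        seen = set()
    if isinstance(schema, str):
        if schema in PRIMITIVES or schema in seen:
            return schema
        if schema not in sources:
            raise KeyError("unresolved schema reference: " + schema)
        return embed(sources[schema], sources, seen)
    if isinstance(schema, list):  # union: resolve each branch in order
        return [embed(branch, sources, seen) for branch in schema]
    out = dict(schema)  # shallow copy so `sources` is never mutated
    kind = out.get("type")
    if kind == "record":
        seen.add(full_name(out))  # mark before descending (self-references)
        out["fields"] = [
            dict(field, type=embed(field["type"], sources, seen))
            for field in out["fields"]
        ]
    elif kind in ("enum", "fixed"):
        seen.add(full_name(out))
    elif kind == "array":
        out["items"] = embed(out["items"], sources, seen)
    elif kind == "map":
        out["values"] = embed(out["values"], sources, seen)
    return out


# Hypothetical schema files, already parsed from JSON.
sources = {
    "com.navteq.avro.FacebookUser": {
        "type": "record",
        "name": "FacebookUser",
        "namespace": "com.navteq.avro",
        "fields": [{"name": "name", "type": "string"}],
    }
}
special_user = {
    "type": "record",
    "name": "FacebookSpecialUser",
    "namespace": "com.navteq.avro",
    "fields": [
        {"name": "user", "type": "com.navteq.avro.FacebookUser"},
        {"name": "friend", "type": ["null", "com.navteq.avro.FacebookUser"]},
    ],
}

expanded = embed(special_user, sources)
```

The expanded schema no longer depends on whatever the resolver returns for com.navteq.avro.FacebookUser later on, so it is the form that would be written alongside the data or into a registry.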
