avro-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: schema by reference
Date Tue, 06 Dec 2011 20:36:17 GMT
Are you talking about RPC?  Earlier you said, "messages would be smaller
in size when we store large numbers of them", which led me to think
you're talking about some sort of data store.

If you're talking about RPC, then a reference is already passed: the
MD5 sum of the protocol text.  The client and/or server could maintain a
persistent database of these so that the text need never be transmitted.
If that's not appropriate, then one could devise a different RPC
mechanism that instead uses, e.g., URLs.  Perhaps these could be
included in the handshake metadata of the existing RPC mechanism, as an
extension.
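
A minimal sketch of the "database of protocols keyed by MD5" idea above,
using only the Java standard library. The class name ProtocolRegistry and
its methods are hypothetical, and the in-memory map merely stands in for a
persistent store. It hashes the UTF-8 bytes of the text it is given, which
may not match byte for byte what a given Avro implementation hashes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry mapping the MD5 of a protocol's text to that text. */
final class ProtocolRegistry {
    // In-memory stand-in for the persistent database mentioned above.
    private final Map<String, String> byFingerprint = new ConcurrentHashMap<>();

    /** Returns the MD5 digest of the text's UTF-8 bytes as lowercase hex. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    /** Stores the protocol text and returns the fingerprint to pass around. */
    String register(String protocolText) {
        String fingerprint = md5Hex(protocolText);
        byFingerprint.putIfAbsent(fingerprint, protocolText);
        return fingerprint;
    }

    /** Looks up previously registered text; empty if the hash is unknown. */
    Optional<String> lookup(String fingerprint) {
        return Optional.ofNullable(byFingerprint.get(fingerprint));
    }
}
```

A peer that receives an unknown fingerprint would get an empty result from
lookup and would then need to obtain the full text some other way.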

If you're talking about file-based storage, then Avro's data file format
already factors out the schema.  If you're talking about some other sort
of storage, then I'm not sure what modifications to Avro would be
required to support this.

Doug

On 12/06/2011 12:10 PM, Neil Davudo wrote:
> It would be nice if Avro had a way for the message to carry the URL of
> the schema, much like it can carry the schema within it. We could pass
> it separately out of band (e.g. in a header), but that weakens the link
> between the message and the URL of the schema.
> 
> Any thoughts on supporting this?
> 
> Neil
> 
> ----- Original Message -----
> From: Doug Cutting <cutting@apache.org>
> To: user@avro.apache.org
> Cc: 
> Sent: Tuesday, December 6, 2011 1:48 PM
> Subject: Re: schema by reference
> 
> On 12/06/2011 11:14 AM, Neil Davudo wrote:
>> Yes, by a URL. Messages would be smaller in size when we store large
>> numbers of them, and we can always get the schema using the reference
>> if necessary. Similar to what we can do with WSDL having a reference
>> to the XSD.
> 
> This is a reasonable thing to do.
> 
> A schema can easily be constructed from a URL with:
> 
> Schema.parse(url.openStream())
> 
> although one would probably want a cache in front of this.
> 
> Note that in Avro one must ensure that the version of the schema at
> the reference does not change, that it is identical to the version used
> to write the datum.  So one should probably not use a logical URL
> for a datatype like http://me.com/schemas/FooRecord but rather a unique
> ID like http://me.com/schemas/9fd73.
> 
> If you're using a database (e.g., HBase) then you can have a table
> of schemas, then, in other tables, store values annotated with the key
> of the entry in the schema table.  https://github.com/spullara/havrobase
> is one example of such an approach.
> 
> Or one might use a URL shortener for this, e.g.:
> 
> http://tinyurl.com/8a4rppd
> 
> redirects to
> 
> avro:///?{"type":"record","name":"foo","fields":[]}
> 
> One could then install a URL handler for "avro" URLs that resolves them
> to their query string.
> 
> Doug
> 
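
The "cache in front of this" suggestion in the quoted message could be
sketched roughly as follows. This is an illustration under assumptions, not
part of Avro: SchemaCache and its methods are hypothetical names, and the
loader is passed in as a function so the sketch needs only the Java standard
library. It caches by URL forever, which is only safe if each URL is an
immutable unique ID, as the quoted message recommends.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache in front of a "fetch and parse schema by URL" function. */
final class SchemaCache<S> {
    private final Map<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> loader;

    /** The loader is whatever fetches and parses the schema at a URL. */
    SchemaCache(Function<String, S> loader) {
        this.loader = loader;
    }

    /** Returns the cached schema, invoking the loader only on a miss. */
    S get(String url) {
        return cache.computeIfAbsent(url, loader);
    }

    /** Number of distinct URLs currently cached. */
    int size() {
        return cache.size();
    }
}
```

With Avro on the classpath, the loader could wrap the
Schema.parse(url.openStream()) call from the quoted message, converting its
IOException into an unchecked exception.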
