avro-user mailing list archives

From Christophe Taton <ta...@wibidata.com>
Subject Re: Record extensions?
Date Wed, 13 Jun 2012 01:09:33 GMT
On Tue, Jun 12, 2012 at 11:13 AM, Doug Cutting <cutting@apache.org> wrote:

> On Tue, Jun 12, 2012 at 10:38 AM, Christophe Taton <taton@wibidata.com>
> wrote:
> > I need my server to handle records with fields that can be "freely"
> > extended by users, without requiring a recompile and restart of the
> > server.
> > The server itself does not need to know how to handle the content of this
> > extensible field.
> >
> > One way to achieve this is to have a bytes field whose content is managed
> > externally, but this is very inefficient in many ways.
> > Is there another way to do this with Avro?
>
> You could use a very generic schema, like:
> {"type":"record", "name":"Value", "fields": [
>  {"name":"value", "type": ["int","float","boolean", ...
>    {"type":"map", "values":"Value"}]}
> ]}
> This is roughly equivalent to a binary encoding of JSON.  But by using
> a map it forces the serialization of a field name with every field
> value.  Not only does that make payloads bigger but it also makes them
> slower to construct and parse.
> Another approach is to include the Avro schema for a value in the record,
> e.g.:
> {"type":"record", "name":"Extensions", "fields": [
>  {"name":"schema", "type": "string"},
>  {"name":"values", "type": {"type":"array", "items":"bytes"}}
> ]}
> This can make things more compact when there are a lot of values.  For
> example, this might be used in a search application where each query
> lists the fields it's interested in retrieving and each response
> contains a list of records that match the query and contain just the
> requested fields.  The field names are not included in each match, but
> instead once for the entire set of matches, making this faster and more
> compact.
> Finally, if you have a stateful connection then you can send a
> schema in the first request then just send bytes encoding instances of
> that schema in subsequent requests over that connection.  This again
> avoids sending field names with each field value.
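
The size trade-off described above can be sketched as follows. This is only an illustration, not the Avro library API: the helpers below hand-roll a small subset of Avro's binary encoding (zigzag varints, length-prefixed strings, single-block maps), and the record name "Match", the field names, and the restriction to long-valued fields are assumptions made for the example. The generic schema above would also write a union branch index per value, which is omitted here.

```python
import json

def encode_long(n: int) -> bytes:
    """Zigzag plus variable-length encoding, as Avro uses for int and long."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

def encode_map_of_longs(fields: dict) -> bytes:
    """Avro map: one block of (key, value) pairs, then a zero block count."""
    if not fields:
        return encode_long(0)
    out = encode_long(len(fields))
    for key, value in fields.items():
        out += encode_string(key) + encode_long(value)
    return out + encode_long(0)

def encode_record_of_longs(field_names: list, row: dict) -> bytes:
    """Avro record: field values in schema order, with no field names."""
    return b"".join(encode_long(row[name]) for name in field_names)

# Hypothetical schema, sent once for the whole set of matches.
FIELD_NAMES = ["id", "score"]
schema = {"type": "record", "name": "Match",
          "fields": [{"name": n, "type": "long"} for n in FIELD_NAMES]}

rows = [{"id": i, "score": 2 * i} for i in range(100)]

# Approach 1: every value travels with its field name.
as_maps = sum(len(encode_map_of_longs(r)) for r in rows)

# Approach 2: the schema travels once, then positional values only.
schema_once = len(encode_string(json.dumps(schema))) + sum(
    len(encode_record_of_longs(FIELD_NAMES, r)) for r in rows)

print("map per record:", as_maps, "bytes; schema once:", schema_once, "bytes")
```

With a single record the schema string dominates, so sending the schema once only pays off as the number of values per schema grows, which matches the "lot of values" caveat above.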

Thanks for the detailed reply!

In practice, I have a bunch of independent records, each of them carrying
at most one "extension field".

I was especially hoping there would be a way to avoid serializing an
"extension" record twice (once from the record object into a bytes field,
and then a second time as a bytes field into the destination output
stream). Ideally, such an extension field should not require its content to
be bytes, but should accept any record object, so that it is encoded only
once.

As I understand it, Avro does not allow me to do this right now. Is this
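
The double serialization in question can be sketched as below. This is a hand-rolled illustration of Avro's binary layout, not the Avro library API; the extension record shape (a long "id" and a string "label"), the outer record shape (a long key followed by the extension), and all function names are assumptions made for the example.

```python
import io

def encode_long(n: int) -> bytes:
    """Zigzag plus variable-length encoding, as Avro uses for int and long."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def write_extension(out, ext: dict) -> None:
    """Write a hypothetical extension record {id: long, label: string}."""
    out.write(encode_long(ext["id"]))
    label = ext["label"].encode("utf-8")
    out.write(encode_long(len(label)) + label)

def write_outer_with_bytes_field(out, key: int, ext: dict) -> None:
    """Outer record whose extension is an opaque bytes field (two passes)."""
    out.write(encode_long(key))
    buf = io.BytesIO()
    write_extension(buf, ext)             # pass 1: record -> temporary buffer
    payload = buf.getvalue()
    out.write(encode_long(len(payload)))  # pass 2: length prefix, then a copy
    out.write(payload)

def write_outer_inline(out, key: int, ext: dict) -> None:
    """Outer record with the extension written directly (one pass)."""
    out.write(encode_long(key))
    write_extension(out, ext)

ext = {"id": 7, "label": "hi"}

nested = io.BytesIO()
write_outer_with_bytes_field(nested, 1, ext)

inline = io.BytesIO()
write_outer_inline(inline, 1, ext)

print(nested.getvalue(), inline.getvalue())
```

On the wire the two forms differ only by the length prefix of the bytes field; the extra cost of the bytes approach is the temporary buffer and the copy. The inline form, however, only decodes if the reader already knows the extension's schema, which is the constraint the server here is trying to avoid.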

