avro-user mailing list archives

From Chris Laws <clawsi...@gmail.com>
Subject Re: Schema evolution and projection
Date Thu, 28 Feb 2013 21:13:54 GMT
Thanks for the informative reply. I look forward to the example code; that
is exactly what I'm after.

I'm really struggling with my schema evolution testing. I posted a
question about schema projection because it seemed simpler, but I guess it
also relies on creating a resolver. I haven't found a clear, simple example
of how to do that with avro-c; I've trawled through the test code, but
nothing there is both clear and simple.

I realise that most Avro usage appears to be in Java; however, I need to
use avro-c for my assessment of Avro because a large portion of our system
is written in C.

Thanks for your help.

On Fri, Mar 1, 2013 at 7:31 AM, Douglas Creager <douglas@creagertino.net> wrote:

> > There doesn't seem to be much information available on how to perform
> > these tasks. The examples on the C API page confusingly mix the old
> > datum API with the new value API.
> Apologies for that — you're absolutely right that we need to clean up
> the C API documentation a bit.
> > Is this how schema projection is supposed to work? Does it just return
> > items of the same type irrespective of the field name specified?
> tl;dr — The schema projection doesn't happen for free; you need to use a
> "resolved writer" to perform the schema resolution.
> In the C API, when you open an Avro file for reading, we expect that the
> avro_value_t that you pass in to avro_file_reader_read_value has the
> *exact same* schema that was used to create the file.  So in your first
> example (gist 5056626), your read_archive_test function works great
> since it's explicitly asking the file for the writer schema, and using
> that to create the value instance to read into.  If you know that you
> want to read exactly what's in the file, not perform any schema
> resolution, and (optionally) dynamically interrogate the writer schema
> to see what fields are available, this is exactly the right approach.
> On the other hand, if you want to use schema resolution to project away
> some of the fields (or to do other interesting data conversions), you
> need to create a resolved writer to perform that schema resolution.  The
> resolved writer is an avro_value_iface_t that wraps up the schema
> resolution rules for a particular writer schema and reader schema.  When
> you create an avro_value_t instance of the resolved writer, it looks
> like it's an instance of the writer schema, and it wraps an instance of
> the reader schema.  Since the resolved writer value is an instance of
> the writer schema, you can read data into it using
> avro_file_reader_read_value.  Under the covers, it will perform the
> schema resolution and fill in the wrapped reader schema instance.  You
> can then read the projected data out of your reader value.
> In English that's probably still too dense an explanation; I'll
> whip together an example program and post it as a gist so that you can
> see it in actual code.
> (As an aside, the original projection_test worked the way that it did
> because a single "record { int, int }" value happens to have the
> same serialization as two consecutive "int" values.
> avro_file_reader_read_value doesn't do any schema resolution; it just
> tries to read a value of the type that you pass in.)
> cheers
> –doug
