avro-user mailing list archives

From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject: Re: Schema not getting saved along with Data
Date: Tue, 01 Apr 2014 10:12:37 GMT

Hi Martin,
Thanks for the reply.

On Mon, Mar 31, 2014 at 4:49 PM, Martin Kleppmann
<mkleppmann@linkedin.com> wrote:

>
> Say you make a change to the schema. Your database now contains some
> records that were written before the schema change (i.e. encoded with
> schema v1) and some records that were written afterwards (encoded with
> schema v2). Ideally, an application should be able to read them all
> transparently and not have to care which schema version is used in the
> underlying store.
>

Absolutely.


> How does Gora handle this? I looked through the website but couldn't find
> a clear answer.
>
>
Right now we maintain only the writer's schema, which as I mentioned is
embedded in the generated Persistent Java bean. In my own experience
(and as you've hinted at :) ) this has caused us problems in the past.
For example, we added a new (pretty innocent) string field 'batchId' to our
WebPage schema [0] over in Nutch, meaning that new records being written
included it and older records already in the data set did not.

    {"name": "batchId", "type": "string"}

This inevitably threw an NPE when certain tools attempted to access
records in which the batchId field and its value were absent.
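
To make the failure mode concrete, here is a minimal sketch of how a default on the new field lets a reader cope with older records. It uses plain Python dicts rather than the Avro or Gora APIs; the `resolve` helper, the `url` field, the empty-string default and the sample values are all assumptions made for illustration, not what Nutch or Gora actually do.

```python
# Hypothetical reader-side field list. Only "batchId" comes from the thread;
# "url" and the "" default are illustrative assumptions.
READER_FIELDS = [
    {"name": "url", "type": "string"},
    {"name": "batchId", "type": "string", "default": ""},
]


def resolve(record, reader_fields):
    """Return a copy of `record` shaped to the reader's fields.

    A field missing from the stored record takes the reader's default.
    A missing field with no declared default is reported immediately,
    instead of surfacing later as a null dereference.
    """
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise KeyError(f"no value and no default for field {name!r}")
    return resolved


# A record written before the schema change, and one written after it.
old_record = {"url": "http://example.org/"}
new_record = {"url": "http://example.org/", "batchId": "batch-0001"}

print(resolve(old_record, READER_FIELDS))
print(resolve(new_record, READER_FIELDS))
```

Avro's own schema resolution follows the same idea: when the reader's schema declares a default for a field that the writer's schema lacks, the reader uses the default. That only works if the writer's schema is known at read time.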
So, taking a bit of advice from a well-recognized voice in this area (uh hum
;)): "If you're storing records in a database one-by-one, you may end up
with different schema versions written at different times, and so you have
to annotate each record with its schema version. If storing the schema
itself is too much overhead, you can use a hash of the schema, or a
sequential schema version number. You then need a schema registry where you
can look up the exact schema definition for a given version number."
Fortunately, in the batchId example above, this particular schema has changed
only once in some 2 or 3 years. However, it HAS changed.
Looks like I am also taking a lesson from this thread and we have a bit
more work to do on Gora to address the above points. This is of course
unless I have missed something!
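
As a rough sketch of the "hash of the schema plus a registry" idea quoted above: the snippet below tags each stored record with a fingerprint of its writer's schema and looks the schema up again at read time. The `SchemaRegistry` class, the `write_record` and `writer_schema_for` helpers, the `url` field and the `schema_fp` key are all hypothetical names for illustration. The fingerprint is SHA-256 over key-sorted JSON, a stand-in for (not an implementation of) the canonical form and fingerprints defined in the Avro specification.

```python
import hashlib
import json


class SchemaRegistry:
    """In-memory lookup from schema fingerprint to schema definition."""

    def __init__(self):
        self._schemas = {}

    @staticmethod
    def fingerprint(schema):
        # Sort object keys and fix separators so that equal schemas hash
        # equally. The order of the "fields" list is kept, since it matters.
        canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def register(self, schema):
        fp = self.fingerprint(schema)
        self._schemas[fp] = schema
        return fp

    def lookup(self, fp):
        return self._schemas[fp]


def write_record(registry, schema, values):
    """Store the values together with the fingerprint of the writer's schema."""
    return {"schema_fp": registry.register(schema), "values": values}


def writer_schema_for(registry, stored):
    """Recover the exact schema a stored record was written with."""
    return registry.lookup(stored["schema_fp"])


# Two versions of an illustrative record schema: v2 adds batchId.
schema_v1 = {"type": "record", "name": "WebPage",
             "fields": [{"name": "url", "type": "string"}]}
schema_v2 = {"type": "record", "name": "WebPage",
             "fields": [{"name": "url", "type": "string"},
                        {"name": "batchId", "type": "string"}]}

registry = SchemaRegistry()
old = write_record(registry, schema_v1, {"url": "http://example.org/"})
new = write_record(registry, schema_v2,
                   {"url": "http://example.org/", "batchId": "batch-0001"})

for stored in (old, new):
    fields = [f["name"] for f in writer_schema_for(registry, stored)["fields"]]
    print(stored["schema_fp"][:12], fields)
```

A sequential version number would serve equally well as the per-record tag. A hash has the advantage that writers can compute it without coordinating with each other.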

[0]
https://svn.apache.org/repos/asf/nutch/branches/2.x/src/gora/webpage.avsc
