avro-user mailing list archives

From Martin Kleppmann <mkleppm...@linkedin.com>
Subject Re: Schema not getting saved along with Data
Date Mon, 31 Mar 2014 15:49:30 GMT
Hi Lewis,

On 26 Mar 2014, at 14:34, Lewis John Mcgibbney <lewis.mcgibbney@gmail.com> wrote:
> What actually happens with the Avro Schema in Gora is that it is permanently included in the
> generated data bean. This way you know the Schema when you read your data. You can see an
> example here
>
> public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"WebPage\",...
> blah blah blah
>
> I would therefore question a justification as to why you _need_ to store the Schema with the

Say you make a change to the schema. Your database now contains some records that were written
before the schema change (i.e. encoded with schema v1) and some records that were written
afterwards (encoded with schema v2). Ideally, an application should be able to read them all
transparently and not have to care which schema version is used in the underlying store.

In Avro, schema evolution takes care of this. However, in order to handle evolution correctly,
the process reading the data from the database needs to know two schemas:

1. the schema that the client is expecting to see, usually the latest version of the schema
(the "reader's schema"),
2. the schema with which the data was originally written, which may be an older version (the
"writer's schema").
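
To make the two roles concrete, here is a toy sketch in plain Java. It is not the Avro API: a "schema" is reduced to an ordered list of field names, and the field names (`url`, `content`, `language`) and the `resolve` helper are made up for illustration. Real Avro resolution also deals with types, aliases, unions and more; in Avro's Java library the same idea is expressed by giving a datum reader both the writer's and the reader's schema.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of schema resolution; NOT the Avro API. A "schema" here is just
// an ordered list of field names, plus default values on the reader's side.
final class ResolutionSketch {

    // Decode positional values using the writer's field order, then project
    // them onto the reader's fields, filling gaps from the reader's defaults.
    static Map<String, Object> resolve(List<String> writerFields,
                                       List<Object> encodedValues,
                                       List<String> readerFields,
                                       Map<String, Object> readerDefaults) {
        if (writerFields.size() != encodedValues.size()) {
            throw new IllegalArgumentException("values do not match writer's schema");
        }
        // Step 1: only the writer's schema can tell us what each value means.
        Map<String, Object> written = new LinkedHashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            written.put(writerFields.get(i), encodedValues.get(i));
        }
        // Step 2: present the record in the shape the reader expects.
        Map<String, Object> result = new LinkedHashMap<>();
        for (String field : readerFields) {
            if (written.containsKey(field)) {
                result.put(field, written.get(field));
            } else if (readerDefaults.containsKey(field)) {
                result.put(field, readerDefaults.get(field));
            } else {
                throw new IllegalStateException("no value or default for field: " + field);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical schema v1 (writer) and v2 (reader, adds "language").
        List<String> v1 = List.of("url", "content");
        List<String> v2 = List.of("url", "content", "language");
        Map<String, Object> v2Defaults = Map.of("language", "unknown");

        // A record that was encoded before the schema change.
        List<Object> oldRecord = List.of("http://example.com/", "hello");
        System.out.println(resolve(v1, oldRecord, v2, v2Defaults));
    }
}
```

Without `v1` the reader could not even tell how many values the old record holds, which is why knowing only the reader's schema is not enough.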

The schema that is included in the generated code covers 1., but in order to have 2. you need
to store either the writer's schema along with the data, or some kind of fingerprint or version
number that identifies the writer's schema.
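
The fingerprint option could look roughly like the sketch below. This is a hypothetical illustration, not how Gora or Avro implements it: the class name, the in-memory map and the two schema strings are all assumptions, and it hashes the raw JSON text, which is sensitive to whitespace and field ordering. The Avro specification defines a canonical form for schemas to avoid exactly that problem.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Gora or Avro code: keep writer schemas in a lookup
// table keyed by fingerprint, and store only the fingerprint with each record.
final class WriterSchemaRegistrySketch {

    private final Map<String, String> schemaByFingerprint = new HashMap<>();

    // SHA-256 over the schema's JSON text, rendered as lowercase hex.
    static String fingerprint(String schemaJson) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(schemaJson.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Write path: remember the schema and return the tag to store with the data.
    String register(String schemaJson) {
        String fp = fingerprint(schemaJson);
        schemaByFingerprint.put(fp, schemaJson);
        return fp;
    }

    // Read path: recover the writer's schema from the tag stored with a record.
    String lookup(String fp) {
        String schemaJson = schemaByFingerprint.get(fp);
        if (schemaJson == null) {
            throw new IllegalArgumentException("unknown schema fingerprint: " + fp);
        }
        return schemaJson;
    }

    public static void main(String[] args) {
        // Made-up schema versions; the field lists are illustrative only.
        String v1 = "{\"type\":\"record\",\"name\":\"WebPage\",\"fields\":["
                + "{\"name\":\"url\",\"type\":\"string\"}]}";
        String v2 = "{\"type\":\"record\",\"name\":\"WebPage\",\"fields\":["
                + "{\"name\":\"url\",\"type\":\"string\"},"
                + "{\"name\":\"language\",\"type\":\"string\",\"default\":\"unknown\"}]}";

        WriterSchemaRegistrySketch registry = new WriterSchemaRegistrySketch();
        String tagV1 = registry.register(v1);
        registry.register(v2);

        // A record written under v1 carries tagV1; a reader can later use it
        // to find the writer's schema, whatever the current version is.
        System.out.println(registry.lookup(tagV1).equals(v1));
    }
}
```

A fixed-size tag per record is much cheaper than embedding the full schema in every row, at the cost of needing a place to keep the lookup table.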

How does Gora handle this? I looked through the website but couldn't find a clear answer.

