avro-dev mailing list archives

From Ryan Blue <rb...@netflix.com.INVALID>
Subject Re: [IDEA] Making schema evolution for enums slightly easier.
Date Tue, 31 Jan 2017 16:57:08 GMT
If you want to solve this problem by using a String to encode the value,
then you can do that by defining a logical type that is an enum-as-string.
But I'm not sure you want to do that. The nice thing about an enum is that
you use what you know about the schema ahead of time to get a much more
compact representation -- usually a byte rather than encoding the entire
string. So I'd much rather find a way of handling this case that keeps the
compact representation, while allowing applications to gracefully handle
unknown symbols.
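
To make the size difference concrete, here is a minimal Python sketch of the two binary encodings as the Avro specification describes them: an enum is written as the zero-based index of its symbol (a zig-zag, variable-length int), while a string is written as a length followed by its UTF-8 bytes. The function names are made up for illustration and are not part of any Avro library.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zig-zag, variable-length int."""
    z = (n << 1) ^ (n >> 63)          # zig-zag: small magnitudes stay small
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_enum(symbols, value):
    """An enum value is just the position of the symbol in the schema."""
    return zigzag_varint(symbols.index(value))


def encode_string(value):
    """A string is its byte length followed by the UTF-8 bytes."""
    data = value.encode("utf-8")
    return zigzag_varint(len(data)) + data


SYMBOLS = ["A", "B", "UNKNOWN"]
enum_bytes = encode_enum(SYMBOLS, "UNKNOWN")
string_bytes = encode_string("UNKNOWN")
print(len(enum_bytes), len(string_bytes))
```

With these three symbols the enum costs one byte whatever the symbol's name, while the string encoding grows with the name itself.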

For generic, enum symbols are translated to GenericEnumSymbol, which can
hold any symbol. Adding an option to return the symbol from the writer's
schema even if it isn't in the reader's schema is one way around the
problem. That wouldn't work for reflect or specific, though.
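
A rough Python sketch of that difference, under the assumption that a generic value is just a wrapper around a symbol name while a specific value must be a member of a fixed, generated type. `GenericSymbol`, `SpecificV1`, `read_generic` and the `keep_writer_symbol` flag are all hypothetical names for illustration, not existing Avro API.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class GenericSymbol:
    """Hypothetical stand-in for a generic enum value: wraps any string."""
    name: str


class SpecificV1(Enum):
    """Hypothetical stand-in for a class generated from the v1 schema."""
    A = "A"
    B = "B"


def read_generic(writer_symbols, reader_symbols, index, keep_writer_symbol=False):
    """Resolve the written index to a symbol for a generic reader."""
    symbol = writer_symbols[index]
    if symbol in reader_symbols or keep_writer_symbol:
        return GenericSymbol(symbol)  # can hold a symbol the reader never declared
    raise ValueError(f"symbol {symbol!r} is not in the reader's schema")


def read_specific(writer_symbols, index):
    """A fixed enum type has no member to return for an unknown symbol."""
    return SpecificV1[writer_symbols[index]]  # KeyError for 'C'


value = read_generic(["A", "B", "C"], ["A", "B"], 2, keep_writer_symbol=True)
print(value)
```

The opt-in flag only helps the generic path; `read_specific` has nothing it could return for `C`, which is the limitation for reflect and specific.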

Another option that was suggested last year is to designate a catch-all
enum symbol. So your enum would be { 'A', 'B', 'UNKNOWN' } and { 'A', 'B',
'C', 'UNKNOWN' }. When a v1 consumer reads v2 records, C gets turned into
UNKNOWN.

I like the designated catch-all symbol because it is a reasonable way to
opt-in for forward-compatibility.
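
The catch-all idea can be sketched in a few lines of Python. This is only an illustration of the resolution rule, assuming the catch-all symbol is passed in explicitly; `resolve_enum` and its `catch_all` parameter are hypothetical and do not describe how Avro designates or resolves such a symbol.

```python
def resolve_enum(writer_symbols, reader_symbols, index, catch_all=None):
    """Map a written enum index to a symbol the reader understands."""
    symbol = writer_symbols[index]
    if symbol in reader_symbols:
        return symbol
    # Opt-in: only fall back when the reader schema designates a catch-all.
    if catch_all is not None and catch_all in reader_symbols:
        return catch_all
    raise ValueError(f"symbol {symbol!r} is not in the reader's schema")


V1 = ["A", "B", "UNKNOWN"]
V2 = ["A", "B", "C", "UNKNOWN"]

# A v1 consumer reading a v2 record that contains C.
forward = resolve_enum(V2, V1, V2.index("C"), catch_all="UNKNOWN")
# A v2 consumer reading an old v1 record that contains B.
backward = resolve_enum(V1, V2, V1.index("B"), catch_all="UNKNOWN")
print(forward, backward)
```

The data on the wire is still the compact index; only the reader-side mapping changes, and schemas without a designated catch-all keep failing as they do today.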


On Tue, Jan 31, 2017 at 2:04 AM, Niels Basjes <Niels@basjes.nl> wrote:

> Hi,
> I'm working on a project where we are putting Avro-serialized records
> into Kafka. The schemas are made available via a schema registry of some
> sort.
> Because Kafka stores the messages for a longer period (weeks) we have two
> common scenarios that occur when a new version of the schema is introduced
> (i.e. from V1 to V2).
> 1) A V2 producer is released and a V1 consumer must be able to read the
> records.
> 2) A 'new' V2 consumer is released a few days after the V2 producer started
> creating records. The V2 consumer starts reading Kafka "from the beginning"
> and as a consequence first has to go through a set of V1 records.
> So in this use case we need schema evolution in two directions.
> To make sure it all works as expected I did some experiments and found that
> these requirements are all doable except when you are in need of an enum.
> This two-directional evolution turns out to have a problem when the
> values of an enum change.
> You cannot write an enum { 'A', 'B', 'C' } and then read it with the schema
> enum { 'A', 'B' }
> So I was thinking about a possible way to make this easier for the
> developer.
> The current idea that I want your opinion on:
> 1) In the IDL we add a way of directing that we want the enum to be stored
> in a different way in the schema. I was thinking about something like
> either defining a new type like 'string enum' or perhaps using an
> annotation of some sort.
> 2) The 'string enum' is mapped into the actual schema as a string (which
> can contain ANY value). So anyone using the json schema can simply read it
> because it is a string.
> 3) The generated code that is used to set/change the value enforces that
> only the allowed values can be set.
> This way a 'reader' can read any value, the schema is compatible in all
> directions.
> What do you guys think?
> Is this an idea worth trying out?
> --
> Best regards / Met vriendelijke groeten,
> Niels Basjes

Ryan Blue
Software Engineer
