avro-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Enum & backward compatibility in distributed services...
Date Mon, 27 Jan 2014 18:03:48 GMT
You'd like the compile-time type-checking of specific, but the
run-time flexibility of generic, right?  Here's a way we might achieve this.

Given the following schemas:

{"type":"enum", "name":"Color", "symbols":["RED", "GREEN", "BLUE"]}

{"type":"record", "name":"Shape", "fields":[
  {"name":"xPosition", "type":"int"},
  {"name":"yPosition", "type":"int"},
  {"name":"color", "type":"Color"}
]}

We might generate Java code like:

public class Shape extends GenericData.Record {
  public Shape(Schema schema) { super(schema); }
  public int getXPosition() { return ((Number)get("xPosition")).intValue(); }
  public int getYPosition() { return ((Number)get("yPosition")).intValue(); }
  public Color getColor() { return (Color)get("color"); }
}

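public class Color extends GenericData.EnumSymbol {
  // Assumed: the generated class embeds its schema, as specific classes do.
  public static final Schema SCHEMA$ = new Schema.Parser().parse(
    "{\"type\":\"enum\", \"name\":\"Color\", \"symbols\":[\"RED\", \"GREEN\", \"BLUE\"]}");
  public Color(Schema schema, String label) {
    super(schema, label);
  }
  public static final Color RED = new Color(SCHEMA$, "RED");
  public static final Color GREEN = new Color(SCHEMA$, "GREEN");
  public static final Color BLUE = new Color(SCHEMA$, "BLUE");
}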

If one reads data using the writer's schema into such classes, then
missing fields and enum symbols would be preserved in the generic
representation.  For example, you might have a filtering mapper that
removes all red shapes:

public void map(Shape shape, ...) {
  if (!shape.getColor().equals(Color.RED)) {
    collect(shape);
  }
}

This would still function correctly without recompilation even if the
schema of the input data is very different, e.g., missing "xPosition"
and "yPosition", containing a new color, PURPLE, or a new field,
"region", etc.

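To make that claim concrete, here is a minimal, self-contained sketch. It
does not use the Avro library: a plain Map stands in for
GenericData.Record, a String stands in for the enum symbol, and the names
FlexViewSketch, ShapeView and removeRed are hypothetical. It only
illustrates why a filter compiled against the old view keeps working when
the data carries an unknown symbol and a different set of fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlexViewSketch {
  /** Hypothetical typed view; the Map stands in for GenericData.Record. */
  static final class ShapeView {
    private final Map<String, Object> fields;
    ShapeView(Map<String, Object> fields) { this.fields = fields; }
    /** Returns the color symbol as written, even if unknown at compile time. */
    String getColor() { return (String) fields.get("color"); }
    /** Exposes every field that was written, including ones the view has no getter for. */
    Map<String, Object> fields() { return fields; }
  }

  static final String RED = "RED";

  /** The filtering mapper: drop red shapes, pass everything else through untouched. */
  static List<ShapeView> removeRed(List<ShapeView> input) {
    List<ShapeView> out = new ArrayList<>();
    for (ShapeView shape : input) {
      if (!RED.equals(shape.getColor())) {
        out.add(shape);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Data written with a newer schema: no x/y position, a new field, a new symbol.
    Map<String, Object> purple = new LinkedHashMap<>();
    purple.put("color", "PURPLE");
    purple.put("region", "north");
    Map<String, Object> red = new LinkedHashMap<>();
    red.put("color", "RED");
    red.put("region", "south");

    List<ShapeView> kept =
        removeRed(Arrays.asList(new ShapeView(purple), new ShapeView(red)));
    for (ShapeView shape : kept) {
      System.out.println(shape.fields()); // prints {color=PURPLE, region=north}
    }
  }
}
```

The purple shape passes through with its "region" field intact, although
the view has no getter for that field and no constant for PURPLE.
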
I think Christophe Taton once requested something like this, to permit
one to preserve fields not in the schema used to generate the code
that's reading.  An interesting variation would read things using a
union of the writer's schema and the schema used for code generation,
so that missing fields are given default values.
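
A rough sketch of that variation, again in plain Java rather than Avro
API: Maps stand in for records, and the default values (0 for the
positions, RED for the color) are assumptions made up for illustration,
since the schemas above declare no defaults. The names UnionReadSketch and
readWithUnion are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UnionReadSketch {
  /**
   * Combines a record as written with defaults from the code-generation schema.
   * Values the writer supplied win, fields only the generated code knows get
   * their defaults, and fields only the writer knows are preserved.
   */
  static Map<String, Object> readWithUnion(Map<String, Object> written,
                                           Map<String, Object> codegenDefaults) {
    Map<String, Object> result = new LinkedHashMap<>(codegenDefaults);
    result.putAll(written);
    return result;
  }

  public static void main(String[] args) {
    // Assumed defaults for the fields the generated code expects.
    Map<String, Object> defaults = new LinkedHashMap<>();
    defaults.put("xPosition", 0);
    defaults.put("yPosition", 0);
    defaults.put("color", "RED");

    // Record written with a different schema: no positions, one extra field.
    Map<String, Object> written = new LinkedHashMap<>();
    written.put("color", "PURPLE");
    written.put("region", "north");

    // prints {xPosition=0, yPosition=0, color=PURPLE, region=north}
    System.out.println(readWithUnion(written, defaults));
  }
}
```
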

The actual implementation should probably generate interfaces that
extend the GenericRecord and GenericEnumSymbol interfaces, with
private concrete implementations like the above, and a builder.  This
would permit greater flexibility and optimizations.  One could, e.g.,
when a builder is created, generate, compile and load optimized record
implementations so that little performance penalty is paid.
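
As a sketch of that shape only: the interface GenericRecordLike below is a
stand-in for Avro's GenericRecord (reduced to get and put by name), and
Shape.newBuilder, Builder and MapBackedShape are hypothetical names, not
anything Avro generates today. The point is that callers see only the
interface and the builder, so the concrete class behind them could later
be replaced by an optimized one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlexInterfaceSketch {
  /** Stand-in for Avro's GenericRecord interface (assumption: get/put by name only). */
  interface GenericRecordLike {
    Object get(String name);
    void put(String name, Object value);
  }

  /** What generated code might expose: a typed view that is still a generic record. */
  interface Shape extends GenericRecordLike {
    int getXPosition();
    String getColor();

    static Builder newBuilder() { return new Builder(); }
  }

  /** Hypothetical builder; callers never name the concrete class it returns. */
  static final class Builder {
    private final Map<String, Object> values = new LinkedHashMap<>();
    Builder set(String name, Object value) { values.put(name, value); return this; }
    Shape build() { return new MapBackedShape(new LinkedHashMap<>(values)); }
  }

  /** Private concrete implementation; could be swapped for an optimized one. */
  private static final class MapBackedShape implements Shape {
    private final Map<String, Object> values;
    MapBackedShape(Map<String, Object> values) { this.values = values; }
    @Override public Object get(String name) { return values.get(name); }
    @Override public void put(String name, Object value) { values.put(name, value); }
    @Override public int getXPosition() { return ((Number) get("xPosition")).intValue(); }
    @Override public String getColor() { return (String) get("color"); }
  }

  public static void main(String[] args) {
    Shape shape = Shape.newBuilder()
        .set("xPosition", 3)
        .set("color", "PURPLE")
        .set("region", "north") // not known to the typed view, still reachable via get
        .build();
    System.out.println(shape.getXPosition() + " " + shape.getColor() + " " + shape.get("region"));
  }
}
```
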

The end result would be that compiled code would reference interfaces
that don't correspond exactly to the runtime data, but rather provide
a view on that data.  We might not alter specific, but instead add a
new FlexData, FlexDatumReader, etc., that builds on generic.



On Sun, Jan 26, 2014 at 2:31 AM, Amihay Zer-Kavod <amihayz@gmail.com> wrote:
> Hi,
> We are using Avro heavily for schema definition of all of the events sent
> through our distributed system.
> The system is a multi-service, Java-based SaaS system, where the services are
> upgraded a lot and in no particular order.
> We are using Enums in some events data and from time to time a new Enum
> value is added.
> In this case we started having problems.
> A producer produces an event with the new enum value. A consumer using the old
> schema that tries to read the event with the Java SpecificDatumReader will
> completely fail to read the event.
> These events will not be handled by the consumer until it is upgraded to use
> the new schema generated code.
> Problem is Avro code generation creates a real java enum, and there is no
> way to initialize or represent an unknown enum value in a java enum.
> However in many cases the consumer could still be doing most of its logic
> with the event with unknown enum value.
> Handling enums in Avro is a powerful tool, specificDatumReader is a powerful
> tool, it looks like I'd have to give up usage of one of them!
> Is there any plan/way to handle enums differently in the code generation?
> Any other ideas I can fix this issue with?
> I believe AVRO-1340 references the same problem; are there any plans to do it?
> I would go a step further and allow dynamic access to the original value,
> not just a default value in case enum value is unknown.
> 10x
> Amihay
