avro-user mailing list archives

From Amihay Zer-Kavod <amih...@gmail.com>
Subject Re: Enum & backward compatibility in distributed services...
Date Tue, 28 Jan 2014 17:43:50 GMT
Fantastic.
I admit, my main concern was the inconsistency between how SpecificRecord
handles Avro types and how it handles Avro enums. This approach fixes that
beautifully and then some. We win:

   - As you said: the compile-time type-checking of specific, with the
   run-time flexibility of generic
   - We gain consistent behavior for types and enums, which is missing
   today (I can add a new field, but not a new enum symbol, without breaking
   backward/forward compatibility in Specific)
   - Backward compatibility for static languages using Specific (or Flex)
   - Decoupling of runtime data from the logical view, which allows
   optimization as well as the other things decoupling gives us :)

All of these are extremely valuable in a multi-service distributed system,
and also good for complex systems in general.

Reading with a hybrid schema and setting default values on fields is a good
idea. However, setting a default value for an enum would return a default
that overrides the original value (which exists, just not in the schema).
I find access to the original value very useful, for example for debugging
and logging, and I would guess there are many other good use cases for it.
So maybe add a "default" to the enum and return it when the enum value is
unknown, but also add a "getOriginalSymbol" method to
GenericData.EnumSymbol to fetch the symbol that was actually written. With
your example it would look like this:

{"type":"enum", "name":"Color", "symbols":["UNKNOWN","RED", "GREEN",
"BLUE"], "default": "UNKNOWN" }

Code would look like this:

public void foo(Shape shape) {
  Color c = shape.getColor();
  // getOriginalSymbol() is the accessor proposed above, not an existing Avro method
  if (c.equals(Color.UNKNOWN)) {
      printNewColor(c.getOriginalSymbol());
  }
}
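
To make the proposed behavior concrete, here is a minimal plain-Java sketch of a symbol that falls back to the schema default while keeping what was actually written. It does not use Avro at all: FlexSymbol, resolve and getSymbol are hypothetical names invented for this illustration, and only getOriginalSymbol corresponds to the accessor suggested above.

```java
import java.util.Arrays;
import java.util.List;

class FlexEnumSketch {
    /** Hypothetical stand-in for the proposed enum symbol; not an Avro class. */
    static final class FlexSymbol {
        private final String symbol;         // what the reader's schema can express
        private final String originalSymbol; // what the writer actually sent

        private FlexSymbol(String symbol, String originalSymbol) {
            this.symbol = symbol;
            this.originalSymbol = originalSymbol;
        }

        /** Falls back to defaultSymbol when the written symbol is not known to the reader. */
        static FlexSymbol resolve(List<String> readerSymbols, String defaultSymbol, String written) {
            String resolved = readerSymbols.contains(written) ? written : defaultSymbol;
            return new FlexSymbol(resolved, written);
        }

        String getSymbol() { return symbol; }

        String getOriginalSymbol() { return originalSymbol; }
    }

    public static void main(String[] args) {
        List<String> readerSymbols = Arrays.asList("UNKNOWN", "RED", "GREEN", "BLUE");
        FlexSymbol known = FlexSymbol.resolve(readerSymbols, "UNKNOWN", "GREEN");
        FlexSymbol unknown = FlexSymbol.resolve(readerSymbols, "UNKNOWN", "PURPLE");
        System.out.println(known.getSymbol() + " / " + known.getOriginalSymbol());     // GREEN / GREEN
        System.out.println(unknown.getSymbol() + " / " + unknown.getOriginalSymbol()); // UNKNOWN / PURPLE
    }
}
```

The point of keeping both strings is that application logic can branch on the resolved symbol while logs and debugging output still show the value the producer sent.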

We could also automatically define a predefined "UNKNOWN" symbol for all
Avro enums, allowing a default fallback to this value in these cases. This
is probably less elegant, but default and unknown are actually two
different use cases: default is for scenarios where I did not get the data,
and unknown is for cases where I do not know how to handle the data.
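
The difference between the two cases can be sketched in a few lines of plain Java. This is only an illustration under assumptions: the class, the RED field default and the null-means-missing convention are all made up for the example and are not Avro behavior.

```java
import java.util.Arrays;
import java.util.List;

class DefaultVersusUnknown {
    static final List<String> READER_SYMBOLS = Arrays.asList("RED", "GREEN", "BLUE");
    static final String FIELD_DEFAULT = "RED"; // assumed default for a field the writer did not send
    static final String UNKNOWN = "UNKNOWN";   // assumed predefined symbol for unrecognized values

    /** written is null when the writer's data had no such field at all. */
    static String resolve(String written) {
        if (written == null) {
            return FIELD_DEFAULT;              // "default": I did not get the data
        }
        if (READER_SYMBOLS.contains(written)) {
            return written;                    // the reader knows this symbol
        }
        return UNKNOWN;                        // "unknown": I got data I cannot interpret
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));     // RED
        System.out.println(resolve("GREEN"));  // GREEN
        System.out.println(resolve("PURPLE")); // UNKNOWN
    }
}
```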

Bottom line: I would go with the Flex approach and retire the Specific
approach entirely.

Much appreciated



On Mon, Jan 27, 2014 at 8:03 PM, Doug Cutting <cutting@apache.org> wrote:

> You'd like the compile-time type-checking of specific, but the
> run-time flexibility of generic, right?  Here's a way we might achieve
> this.
>
> Given the following schemas:
>
> {"type":"enum", "name":"Color", "symbols":["RED", "GREEN", "BLUE"]}
>
> {"type":"record", "name":"Shape", "fields":[
>   {"name":"xPosition", "type":"int"},
>   {"name":"yPosition", "type":"int"},
>   {"name":"color", "type":"Color"}
>   ]}
>
> We might generate Java code like:
>
> public class Shape extends GenericData.Record {
>   public Shape(Schema schema) { super(schema); }
>   public int getXPosition() { return ((Number) get("xPosition")).intValue(); }
>   public int getYPosition() { return ((Number) get("yPosition")).intValue(); }
>   public Color getColor() { return (Color) get("color"); }
> }
>
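> public class Color extends GenericData.EnumSymbol {
>   // Schema constant added so the constants below match the constructor
>   public static final Schema SCHEMA = new Schema.Parser().parse(
>     "{\"type\":\"enum\", \"name\":\"Color\", "
>     + "\"symbols\":[\"RED\", \"GREEN\", \"BLUE\"]}");
>   public Color(Schema schema, String label) {
>     super(schema, label);
>   }
>   public static final Color RED = new Color(SCHEMA, "RED");
>   public static final Color GREEN = new Color(SCHEMA, "GREEN");
>   public static final Color BLUE = new Color(SCHEMA, "BLUE");
> }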
>
> If one reads data using the writer's schema into such classes, then
> missing fields and enum symbols would be preserved in the generic
> representation.  For example, you might have a filtering mapper that
> removes all red shapes:
>
> public void map(Shape shape, ...) {
>   if (!shape.getColor().equals(Color.RED)) {
>     collect(shape);  // pseudocode: pass the shape downstream
>   }
> }
>
> This would still function correctly without recompilation even if the
> schema of the input data is very different, e.g., missing "xPosition"
> and "yPosition", containing a new color, PURPLE or a new field,
> "region", etc.
>
> I think Christophe Taton once requested something like this, to permit
> one to preserve fields not in the schema used to generate the code
> that's reading.  An interesting variation would read things using a
> union of the writer's schema and the schema used for code generation,
> so that missing fields are given default values.
>
> The actual implementation should probably generate interfaces that
> extend the GenericRecord and GenericEnumSymbol interfaces, with
> private concrete implementations like the above, and a builder.  This
> would permit greater flexibility and optimizations.  One could, e.g.,
> when a builder is created, generate, compile and load optimized record
> implementations so that little performance penalty is paid.
>
> The end result would be that compiled code would reference interfaces
> that don't correspond exactly to the runtime data, but rather provide
> a view on that data.  We might not alter specific, but instead add a
> new FlexData, FlexDatumReader, etc., that builds on generic.
>
> Thoughts?
>
> Doug
>
>
> On Sun, Jan 26, 2014 at 2:31 AM, Amihay Zer-Kavod <amihayz@gmail.com>
> wrote:
> > Hi,
> > We are using Avro heavily for schema definition of all of the events sent
> > through our distributed system.
> > The system is a multi-service, Java-based SaaS system, where the services
> > are upgraded a lot and in no particular order.
> > We are using enums in some event data, and from time to time a new enum
> > value is added.
> > In this case we started having problems.
> > A producer produces an event with the new enum value, and a consumer
> > using the old schema that tries to read the event with the Java
> > SpecificDatumReader will completely fail to read it.
> > These events will not be handled by the consumer until it is upgraded to
> > use the code generated from the new schema.
> >
> > The problem is that Avro code generation creates a real Java enum, and
> > there is no way to initialize or represent an unknown enum value in a
> > Java enum.
> > However, in many cases the consumer could still do most of its logic
> > on an event with an unknown enum value.
> >
> > Enum handling in Avro is a powerful tool, and SpecificDatumReader is a
> > powerful tool, but it looks like I'd have to give up one of them!
> >
> > Is there any plan/way to handle enums differently in the code generation?
> > Any other ideas for how I can fix this issue?
> > I believe AVRO-1340 references the same problem; are there any plans to
> > do it?
> > I would go a step further and allow dynamic access to the original value,
> > not just a default value in case the enum value is unknown.
> >
> > 10x
> > Amihay
> >
> >
> >
>

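The Java-enum limitation described in the original question, and the reason a name-based comparison like the filtering mapper keeps working, can be reproduced without Avro. The sketch below is plain Java with made-up names (EnumCompatibilitySketch, representable, dropRed); it illustrates the language-level constraint only and is not the code path Avro itself takes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EnumCompatibilitySketch {
    /** A closed Java enum, like the one specific code generation produces. */
    enum Color { RED, GREEN, BLUE }

    /** True if the name maps to a constant of the compiled enum. */
    static boolean representable(String name) {
        try {
            Color.valueOf(name);
            return true;
        } catch (IllegalArgumentException e) {
            // Enum.valueOf rejects any name that was not compiled into the enum
            return false;
        }
    }

    /** Filter in the style of the mapper in the thread, comparing symbols by name. */
    static List<String> dropRed(List<String> colors) {
        List<String> kept = new ArrayList<>();
        for (String c : colors) {
            if (!c.equals("RED")) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(representable("GREEN"));  // true
        System.out.println(representable("PURPLE")); // false
        System.out.println(dropRed(Arrays.asList("RED", "PURPLE", "BLUE"))); // [PURPLE, BLUE]
    }
}
```

A consumer compiled against the three-color enum has no value to hold PURPLE, while the name-based filter passes it through untouched.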