avro-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ryan Blue <rb...@netflix.com.INVALID>
Subject Re: Avro union compatibility mode enhancement proposal
Date Thu, 21 Apr 2016 04:09:45 GMT
Interesting idea, Doug. I was thinking about just updating the logic like
the fix in AVRO-1590 <https://issues.apache.org/jira/browse/AVRO-1590>, but
it would be nice to have a way to do this in the spec.

What do you think about dealing with this by updating the union branch
matching rules (and possibly adding them to the spec) rather than adding a
property to one of the union's schemas? Making the default a property of an
inner schema makes me think that we will have to deal with multiple schemas
with such a label at some point. Maybe it would be better to try to match
automatically by some metric like largest subset of fields projected.

rb

On Mon, Apr 18, 2016 at 2:13 PM, Doug Cutting <cutting@gmail.com> wrote:

> I was responding to the "For unions, we will add an optional catch-all
> attribute" part.  The syntax of union schemas is unfortunately hard to
> modify compatibly.
>
> Here's a way around that.  Recall that every kind of schema, except
> union, supports arbitrary properties, and unions cannot be directly
> nested.  Given this, we could have a property on one of the branch
> schemas that declares it as the branch to be used when the writer's
> branch is not found in the reader.  For example, if one wanted
> unmatched to be read as null, then the reader could use a union like,
> [{"type":"null","isDefaultBranch":true}, ... ], or one could add that
> property to a record schema that acts as a base class.  Note however
> that, when using a record as the default branch, one could not then
> use that same record as a non-default branch in another union.  To
> ameliorate that, we might permit multiple default branches in a union
> to be specified as default with the convention that the first such is
> used.
>
> Doug
>
>
> On Mon, Apr 18, 2016 at 1:18 PM, Ryan Blue <rblue@netflix.com.invalid>
> wrote:
> > Sorry, I thought you were talking about the more recent topic, the enum
> > changes. I see what you're saying now about unions.
> >
> > My initial proposal was not to change how unions are represented in the
> > schema, but to update how resolution happens. Right now, we will match a
> > read schema to a branch by structure for cases where the record has been
> > renamed. This would apply similar logic, allowing a more generic record
> to
> > match.
> >
> > I wasn't thinking there would be a schema change, though we could
> certainly
> > make that happen to standardize this behavior.
> >
> > rb
> >
> > On Mon, Apr 18, 2016 at 1:11 PM, Ryan Blue <rblue@netflix.com> wrote:
> >
> >> Doug, I don't think I understand. Why would this change a union
> >> representation?
> >>
> >> This wouldn't change the schema format, other than to add an attribute
> to
> >> enum types that is ignored by older readers. New readers will use that
> >> attribute to determine which symbol to use when the written symbol is
> >> unknown.
> >>
> >> rb
> >>
> >> On Mon, Apr 18, 2016 at 12:59 PM, Doug Cutting <cutting@gmail.com>
> wrote:
> >>
> >>> Perhaps then its sufficient to only write the new schema format when
> >>> the new attribute is specified, so existing apps will continue to
> >>> represent unions as JSON arrays?  If so, this should probably be
> >>> written into the spec.
> >>>
> >>> On Mon, Apr 18, 2016 at 12:52 PM, Ryan Blue <rblue@netflix.com.invalid
> >
> >>> wrote:
> >>> > Isn't the problem that these changes aren't compatible right now
> >>> anyway? If
> >>> > I need to add an entry to an enum right now, older readers fail when
> >>> trying
> >>> > to handle that data. This creates a way to avoid that failure in new
> >>> > versions.
> >>> >
> >>> > On Mon, Apr 18, 2016 at 12:48 PM, Doug Cutting <cutting@gmail.com>
> >>> wrote:
> >>> >
> >>> >> On Sun, Apr 17, 2016 at 2:00 PM, Matthieu Monsch <
> monsch@alum.mit.edu>
> >>> >> wrote:
> >>> >> > + For unions, we will add an optional catch-all attribute
to mark
> a
> >>> >> branch as resolution target when no names or aliases match (and
> come up
> >>> >> with the corresponding syntax).
> >>> >>
> >>> >> Can this be compatible?  If you add a new union syntax (e.g.,
> >>> >> {"type":"union", "branches":[...], "default":...}) then existing
> >>> >> implementations will not be able to read new data that uses this
> >>> >> feature.
> >>> >>
> >>> >> Doug
> >>> >>
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Ryan Blue
> >>> > Software Engineer
> >>> > Netflix
> >>>
> >>
> >>
> >>
> >> --
> >> Ryan Blue
> >> Software Engineer
> >> Netflix
> >>
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>



-- 
Ryan Blue
Software Engineer
Netflix

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message