avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1347) Improve name and alias matching for named schemas
Date Wed, 03 Jul 2013 17:52:21 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699247#comment-13699247 ]

Doug Cutting commented on AVRO-1347:

This is a fundamental change to alias semantics.  If we implement it, we'll need to add it
to the specification too.

I'm not (yet) convinced this change is required.  The writer's schema is currently used minimally--one
can generally replace writer schemas with their Parsing Canonical Form and everything would
work the same.  The reader's schema is used to interpret the data, containing aliases and
other annotations that influence the representation of that data.  In the above case, is there
a reason that "foo" cannot be added to the aliases of "bar"?  The reader's schema is assumed
to be malleable and the writer's is not.
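
To make the current rule concrete, here is a minimal sketch in Java. It is a simplified model, not Avro's resolution code: the class NamedSchemaSketch and the method readerAccepts are hypothetical names, and a schema is reduced to a full name plus a set of aliases. It shows why "foo" and "bar" do not match today and how adding "foo" to the reader's aliases resolves it.

```java
import java.util.Set;

// Hypothetical stand-in for a named Avro schema: a full name plus aliases.
// This is NOT the Avro API; it only models the name-matching rule discussed here.
final class NamedSchemaSketch {
    final String fullName;
    final Set<String> aliases;

    NamedSchemaSketch(String fullName, String... aliases) {
        this.fullName = fullName;
        this.aliases = Set.of(aliases);
    }

    // Current rule as described in this thread: only the reader's aliases are
    // consulted, and they are compared against the writer's name.
    static boolean readerAccepts(NamedSchemaSketch writer, NamedSchemaSketch reader) {
        return reader.fullName.equals(writer.fullName)
            || reader.aliases.contains(writer.fullName);
    }

    public static void main(String[] args) {
        NamedSchemaSketch writer = new NamedSchemaSketch("foo", "CommonAlias");
        NamedSchemaSketch reader = new NamedSchemaSketch("bar", "CommonAlias");
        // Same reader, with the writer's name added to its aliases.
        NamedSchemaSketch fixedReader = new NamedSchemaSketch("bar", "CommonAlias", "foo");

        System.out.println(readerAccepts(writer, reader));      // false
        System.out.println(readerAccepts(writer, fixedReader)); // true
    }
}
```

The writer's aliases are never read in this model, which is why the writer's schema could be replaced by a form that carries no aliases at all.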

Looking at the patch (should we choose to implement this), one potential bug is that one cannot
specify an alias that has no namespace.  Probably null should be interpreted as the containing
namespace and the empty string as no namespace.
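
One way to read that suggestion, as a sketch only: the helper below is hypothetical (the class AliasNamespaceSketch and the method qualify are not part of Avro or of the attached patch) and assumes full names are formed by joining namespace and name with a dot.

```java
// Hypothetical helper, not part of Avro: resolves an alias to a full name
// using the convention suggested above.
final class AliasNamespaceSketch {

    static String qualify(String alias, String space, String containingNamespace) {
        // null means "inherit the namespace of the containing schema".
        String ns = (space == null) ? containingNamespace : space;
        // The empty string (or no namespace at all) means "no namespace".
        if (ns == null || ns.isEmpty()) {
            return alias;
        }
        return ns + "." + alias;
    }

    public static void main(String[] args) {
        System.out.println(qualify("Old", null, "org.example"));       // org.example.Old
        System.out.println(qualify("Old", "", "org.example"));         // Old
        System.out.println(qualify("Old", "other.ns", "org.example")); // other.ns.Old
    }
}
```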

Perhaps the annotation should be moved to AVRO-1341, independent of the change in alias semantics?

Also, the patch needs unit tests.
> Improve name and alias matching for named schemas
> -------------------------------------------------
>                 Key: AVRO-1347
>                 URL: https://issues.apache.org/jira/browse/AVRO-1347
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Vincenz Priesnitz
>         Attachments: AVRO-1347.patch, AVRO-1347.patch
> When reading an Avro file with a named schema, the aliases of the writer's schema are
> not taken into account; only the reader's aliases are matched against the writer's name.
> Even if the writer's aliases match the reader's name, the schemas will not be matched.
> For example, the following two enum schemas will not be matched, even though they share
> a common alias.
> {code}
> {
> 	"type"  : "enum",
> 	"name"  : "foo",
> 	"aliases" : ["CommonAlias"],
> 	"symbols" : ["LEFT", "RIGHT"]
> }
> {code}
> {code}
> {
> 	"type"  : "enum",
> 	"name"  : "bar",
> 	"aliases" : ["CommonAlias"],
> 	"symbols" : ["LEFT", "RIGHT"]
> }
> {code}
> In most cases, the DatumReader resolves records of different names or namespaces by matching
> their fields.
> Unfortunately, in some cases this sort of matching does not happen and only the names are
> compared:
> * Other named nodes, such as enums, fixed, or field schemas, are not matched this way.
> * A record inside a union is also matched only by its full name.
> The latter case is especially tricky: two record schemas that match structurally but differ
> in name or namespace are interchangeable until they are put into a union, at which point an
> exception is thrown.
> I propose that two named schemas be matched when they share a common name or alias.
>
> I implemented these changes and added a Java annotation @AvroAlias(alias, space) that
> allows one to add an alias to a record, enum, or field.
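
The proposed rule in the issue description can be read as a set intersection over each schema's name and aliases. The sketch below is an illustration under that assumption, not the code from the attached patch; SharedNameMatchSketch, allNames and sharesNameOrAlias are hypothetical names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the proposed rule; not taken from the patch.
final class SharedNameMatchSketch {

    // Every name a schema answers to: its own name plus its aliases.
    static Set<String> allNames(String name, Set<String> aliases) {
        Set<String> names = new HashSet<>(aliases);
        names.add(name);
        return names;
    }

    // Proposed rule: two named schemas match when their name sets intersect.
    static boolean sharesNameOrAlias(String writerName, Set<String> writerAliases,
                                     String readerName, Set<String> readerAliases) {
        return !Collections.disjoint(allNames(writerName, writerAliases),
                                     allNames(readerName, readerAliases));
    }

    public static void main(String[] args) {
        Set<String> common = Set.of("CommonAlias");
        System.out.println(sharesNameOrAlias("foo", common, "bar", common));     // true
        System.out.println(sharesNameOrAlias("foo", Set.of(), "bar", Set.of())); // false
    }
}
```

Unlike the current behavior, this check is symmetric: swapping writer and reader gives the same answer.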
