avro-dev mailing list archives

From "Igor Postelnik (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1347) Improve name and alias matching for named schemas
Date Tue, 14 Jan 2014 15:40:52 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870822#comment-13870822 ]

Igor Postelnik commented on AVRO-1347:
--------------------------------------

We have this problem as well, with forward compatibility of data stored in Hadoop. When a job
that produces a dataset is updated to rename a field, the client jobs that consume this dataset
have to be recompiled. Fixing this is an important requirement for supporting schema evolution.

> Improve name and alias matching for named schemas
> -------------------------------------------------
>
>                 Key: AVRO-1347
>                 URL: https://issues.apache.org/jira/browse/AVRO-1347
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Vincenz Priesnitz
>         Attachments: AVRO-1347.patch, AVRO-1347.patch
>
>
> When reading an Avro file with a named schema, the aliases of the writer's schema are
> not taken into account; only the reader's aliases are matched against the writer's name.
> Even if the writer's aliases match the reader's name, the schemas will not be matched.
> For example, the following two enum schemas will not be matched, even though they share
> a common alias.
> {code}
> {
> 	"type"  : "enum",
> 	"name"  : "foo",
> 	"aliases" : ["CommonAlias"],
> 	"symbols" : ["LEFT", "RIGHT"]
> }
> {code}
> {code}
> {
> 	"type"  : "enum",
> 	"name"  : "bar",
> 	"aliases" : ["CommonAlias"],
> 	"symbols" : ["LEFT", "RIGHT"]
> }
> {code}
> In most cases, the DatumReader resolves records of different names or namespaces by matching
> their fields.
> Unfortunately, there are some cases where this sort of matching does not happen and
> only the names are compared:
> * Other named nodes, such as enums, fixed, or field schemas, are not matched this way.
> * A record inside a union is also matched only by its full name.
> The latter is especially tricky, since two record schemas that match structurally
> but differ in name or namespace are interchangeable until they are put into a union, at which
> point an exception is thrown.
> I propose that two named schemas be matched when they share a common name or alias.
>
> I implemented these changes and added a Java annotation @AvroAlias(alias, space) that
> allows one to add an alias to a record, enum, or field.
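
To make the proposed rule concrete, here is a minimal sketch of the two matching strategies described in the issue. It is not the code from the attached patch and does not use Avro's own classes: NamedSchema, matchesReaderAliasesOnly and matchesProposed are hypothetical names chosen for illustration, and namespaces are ignored for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AliasMatching {

    /** Hypothetical stand-in for a named schema (record, enum or fixed); not Avro's Schema class. */
    static final class NamedSchema {
        final String fullName;
        final Set<String> aliases;

        NamedSchema(String fullName, String... aliases) {
            this.fullName = fullName;
            this.aliases = new HashSet<>(Arrays.asList(aliases));
        }

        /** Returns a fresh set holding the name and every alias. */
        Set<String> allNames() {
            Set<String> names = new HashSet<>(aliases);
            names.add(fullName);
            return names;
        }
    }

    /** Behaviour described in the issue: only the reader's name and aliases are checked against the writer's name. */
    static boolean matchesReaderAliasesOnly(NamedSchema writer, NamedSchema reader) {
        return reader.allNames().contains(writer.fullName);
    }

    /** Proposed rule: the schemas match when they share any name or alias. */
    static boolean matchesProposed(NamedSchema writer, NamedSchema reader) {
        Set<String> common = writer.allNames(); // a copy, so retainAll is safe
        common.retainAll(reader.allNames());
        return !common.isEmpty();
    }

    public static void main(String[] args) {
        // The two enum schemas from the issue description.
        NamedSchema foo = new NamedSchema("foo", "CommonAlias");
        NamedSchema bar = new NamedSchema("bar", "CommonAlias");

        System.out.println(matchesReaderAliasesOnly(foo, bar)); // false
        System.out.println(matchesProposed(foo, bar));          // true
    }
}
```

With the "foo" and "bar" enums from the description, the reader-only check fails because neither "bar" nor "CommonAlias" equals the writer's name "foo", while the proposed check succeeds on the shared alias.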



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
