Date: Wed, 3 Jul 2013 23:24:20 +0000 (UTC)
From: "Martin Kleppmann (JIRA)"
To: dev@avro.apache.org
Subject: [jira] [Commented] (AVRO-1347) Improve name and alias matching for named schemas

[ https://issues.apache.org/jira/browse/AVRO-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699588#comment-13699588 ]

Martin Kleppmann commented on AVRO-1347:
----------------------------------------

I can imagine a scenario where this may be useful. Say you have a key-value store; each value is a pair of (schema-id, avro-data), where schema-id identifies the writer's schema and avro-data is encoded using that schema. Various clients read values from the store and write values to it. Some clients may be using a newer version of the schema than others (e.g.
the client may use specific classes generated from the version of the schema that was current when the client was built).

In this scenario, when evolving the schema, fields or types cannot safely be renamed: clients built with the old schema have no aliases for the new names, so they cannot read values written by clients using a newer schema. Using alias information in the writer's schema would allow such clients to map the names in the newer schema onto their own, older schema without being rebuilt. That would be an advantage of this change.

However, there are other things you cannot safely do in this scenario either. For example, you cannot add a branch to a union or a symbol to an enum, because, again, clients built with an older version of the schema would have no way to map occurrences of that new union branch or enum symbol into their own old-schema world. This suggests that the scenario is a use case Avro doesn't really support anyway (though perhaps we should give some guidance to people who have a use case like this), which means it doesn't make a strong case for changing alias semantics. Perhaps there are other scenarios that would support the change more strongly, but so far I don't think we have strong enough arguments for changing semantics.

In general, I would suggest being very cautious about any spec changes. Much of the appeal of a serialization format like Avro is that it takes compatibility very seriously, which makes it suitable for long-lived systems. A data serialization format is not a place to be adventurous.
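To make the enum point concrete, here is a minimal sketch in Java. It is a simplified, hypothetical model rather than Avro's actual resolver: it only captures the rule that an enum value is written as the zero-based index of its symbol in the writer's schema, and that the reader must then find the same symbol name in its own schema. The class and method names are made up for illustration.

```java
import java.util.List;

// Hypothetical, simplified model of enum resolution between a writer's
// schema and a reader's schema; not Avro's real resolver.
public class EnumEvolutionSketch {

    static String resolveSymbol(List<String> writerSymbols,
                                List<String> readerSymbols,
                                int writtenIndex) {
        // The encoded value is the position of the symbol in the writer's enum.
        String symbol = writerSymbols.get(writtenIndex);
        if (!readerSymbols.contains(symbol)) {
            // An old client has no way to represent a symbol it has never seen.
            throw new IllegalStateException(
                "Symbol " + symbol + " not in reader's enum " + readerSymbols);
        }
        return symbol;
    }

    public static void main(String[] args) {
        List<String> oldSchema = List.of("LEFT", "RIGHT");           // old client
        List<String> newSchema = List.of("LEFT", "RIGHT", "CENTER"); // newer writer

        // Symbols the old client already knows resolve fine.
        System.out.println(resolveSymbol(newSchema, oldSchema, 1)); // RIGHT

        // A value written with the new symbol cannot be read by the old client.
        try {
            resolveSymbol(newSchema, oldSchema, 2);
        } catch (IllegalStateException e) {
            System.out.println("old client fails: " + e.getMessage());
        }
    }
}
```

An old reader can still decode LEFT and RIGHT from a newer writer, but fails as soon as it meets CENTER, and no alias mechanism on names can help with that.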
> Improve name and alias matching for named schemas
> -------------------------------------------------
>
>                 Key: AVRO-1347
>                 URL: https://issues.apache.org/jira/browse/AVRO-1347
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Vincenz Priesnitz
>         Attachments: AVRO-1347.patch, AVRO-1347.patch
>
>
> When reading an Avro file with a named schema, the aliases of the writer's schema are not taken into account; only the reader's aliases are matched against the writer's name. Even if the writer's aliases match the reader's name, the schemas will not be matched.
> For example, the following two enum schemas will not be matched, even though they share a common alias.
> {code}
> {
>   "type" : "enum",
>   "name" : "foo",
>   "aliases" : ["CommonAlias"],
>   "symbols" : ["LEFT", "RIGHT"]
> }
> {code}
> {code}
> {
>   "type" : "enum",
>   "name" : "bar",
>   "aliases" : ["CommonAlias"],
>   "symbols" : ["LEFT", "RIGHT"]
> }
> {code}
> In most cases, the DatumReader resolves records with different names or namespaces by matching their fields.
> Unfortunately, there are some cases where this sort of matching does not happen and only the names are compared:
> * Other named nodes, such as enums, fixed types, or field schemas, are not matched this way.
> * A record inside a union is also matched only by its full name.
> The latter is especially tricky: two record schemas that match structurally but differ in name or namespace are interchangeable until they are put into a union, at which point an exception is thrown.
> I propose that two named schemas be matched when they share a common name or alias.
> I implemented these changes and added a Java annotation @AvroAlias(alias, space) that allows one to add an alias to a record, enum or field.
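The difference between the existing matching rule and the proposal can be sketched as follows. This is a hypothetical, simplified model, not Avro's `Schema` class or resolver API: each named schema is reduced to a full name and a set of aliases, while namespaces and the field-by-field matching of records are ignored. The type and method names are made up for illustration, and the sketch uses a Java record (Java 16+).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of name/alias matching for named schemas
// (records, enums, fixed); only full names and aliases are considered.
public class AliasMatchingSketch {

    record NamedSchema(String fullName, Set<String> aliases) {}

    // Existing rule as described in the issue: only the reader's aliases are
    // consulted, and they are compared against the writer's name.
    static boolean matchesCurrent(NamedSchema writer, NamedSchema reader) {
        return writer.fullName().equals(reader.fullName())
            || reader.aliases().contains(writer.fullName());
    }

    // Proposed rule: match if writer and reader share any name or alias.
    static boolean matchesProposed(NamedSchema writer, NamedSchema reader) {
        Set<String> writerNames = new HashSet<>(writer.aliases());
        writerNames.add(writer.fullName());
        Set<String> readerNames = new HashSet<>(reader.aliases());
        readerNames.add(reader.fullName());
        writerNames.retainAll(readerNames); // intersection of both name sets
        return !writerNames.isEmpty();
    }

    public static void main(String[] args) {
        NamedSchema foo = new NamedSchema("foo", Set.of("CommonAlias"));
        NamedSchema bar = new NamedSchema("bar", Set.of("CommonAlias"));
        System.out.println(matchesCurrent(foo, bar));  // false
        System.out.println(matchesProposed(foo, bar)); // true
    }
}
```

With the two enum schemas from the issue, `matchesCurrent` returns false because neither name appears among the other schema's aliases, while `matchesProposed` returns true because both declare `CommonAlias`. Note that the proposed rule is symmetric, whereas the existing one depends on which side is the reader.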