Message-ID: <31764389.201641294356767404.JavaMail.jira@thor>
Date: Thu, 6 Jan 2011 18:32:47 -0500 (EST)
From: "Doug Cutting (JIRA)"
To: dev@avro.apache.org
Reply-To: dev@avro.apache.org
Subject: [jira] Updated: (AVRO-656) writing unions with multiple records, fixed or enums can choose wrong branch

     [ https://issues.apache.org/jira/browse/AVRO-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-656:
------------------------------

    Attachment: AVRO-656.patch

Here's a patch intended to make Java correctly implement this
aspect of the specification. In particular, records, enums and fixed in unions should be resolved on the basis of their full names, including namespace.

I've attempted to do this back-compatibly, so that Java applications which don't use the new GenericData.EnumSymbol and GenericData.Fixed constructors, and whose unions contain only a single enum or fixed, will not be affected. The old constructors are deprecated.

This fails two tests, one expected, one not. The expected failure is the compiler fidelity test, since the generated code is changed. The unexpected failure is in TestSchema#testComplexUnions, where ResolvingDecoder throws an exception:

{code}
org.apache.avro.AvroTypeException: Found { "type" : "fixed", "name" : "Bar2", "size" : 1 }, expecting { "type" : "fixed", "name" : "Bar", "size" : 1 }
	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:225)
	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
	at org.apache.avro.io.ValidatingDecoder.checkFixed(ValidatingDecoder.java:121)
	at org.apache.avro.io.ValidatingDecoder.readFixed(ValidatingDecoder.java:132)
	at org.apache.avro.generic.GenericDatumReader.readFixed(GenericDatumReader.java:234)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:126)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:125)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:112)
	at org.apache.avro.TestSchema.checkBinary(TestSchema.java:592)
	at org.apache.avro.TestSchema.check(TestSchema.java:556)
	at org.apache.avro.TestSchema.testComplexUnions(TestSchema.java:336)
{code}

Thiru, do you have any idea what causes this?

> writing unions with multiple records, fixed or enums can choose wrong branch
> -----------------------------------------------------------------------------
>
>                 Key: AVRO-656
>                 URL: https://issues.apache.org/jira/browse/AVRO-656
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.5.0
>
>         Attachments: AVRO-656.patch, AVRO-656.patch, AVRO-656.patch
>
>
> According to the specification, a union may contain multiple instances of a named type, provided they have different names. There are several bugs in the Java implementation of this when writing data:
>  - for record, only the short name of the record is checked, so the branch for a record of the same name in a different namespace may be used by mistake
>  - for enum and fixed, the name of the type is not checked at all, so the first enum or fixed in the union will always be assumed when writing. In many cases this may cause the wrong data to be written, potentially corrupting output.
> This is not a regression. This has never been implemented correctly by Java. Python and Ruby never check names, but rather perform a full, recursive validation of content.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
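
The two failure modes described in the issue, and the full-name matching the patch aims for, can be illustrated with a small stand-alone sketch. It does not use the Avro library and is not the code in AVRO-656.patch: the class UnionBranchSketch, the NamedType holder, the three resolve methods and the namespaces org.a / org.b are hypothetical names made up for illustration. Only the type names Bar and Bar2 are borrowed from the exception above.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy model of union branch selection, not Avro's implementation.
final class UnionBranchSketch {

    /** Hypothetical stand-in for a named schema (record, enum or fixed). */
    static final class NamedType {
        final String kind;      // "record", "enum" or "fixed"
        final String namespace; // empty string when there is no namespace
        final String name;      // short name

        NamedType(String kind, String namespace, String name) {
            this.kind = kind;
            this.namespace = namespace;
            this.name = name;
        }

        String fullName() {
            return namespace.isEmpty() ? name : namespace + "." + name;
        }
    }

    /** Models the record bug: only kind and short name are compared. */
    static int resolveByShortName(List<NamedType> union, NamedType datumType) {
        for (int i = 0; i < union.size(); i++) {
            NamedType branch = union.get(i);
            if (branch.kind.equals(datumType.kind) && branch.name.equals(datumType.name)) {
                return i;
            }
        }
        return -1;
    }

    /** Models the enum/fixed bug: the first branch of the same kind wins. */
    static int resolveByKindOnly(List<NamedType> union, NamedType datumType) {
        for (int i = 0; i < union.size(); i++) {
            if (union.get(i).kind.equals(datumType.kind)) {
                return i;
            }
        }
        return -1;
    }

    /** Intended behaviour: compare kind and full name, including namespace. */
    static int resolveByFullName(List<NamedType> union, NamedType datumType) {
        for (int i = 0; i < union.size(); i++) {
            NamedType branch = union.get(i);
            if (branch.kind.equals(datumType.kind)
                    && branch.fullName().equals(datumType.fullName())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<NamedType> union = Arrays.asList(
                new NamedType("record", "org.a", "Foo"),
                new NamedType("record", "org.b", "Foo"),
                new NamedType("fixed", "", "Bar"),
                new NamedType("fixed", "", "Bar2"));

        NamedType fooB = union.get(1); // org.b.Foo
        NamedType bar2 = union.get(3); // Bar2

        System.out.println(resolveByShortName(union, fooB)); // 0: wrong branch (org.a.Foo)
        System.out.println(resolveByKindOnly(union, bar2));  // 2: wrong branch (Bar)
        System.out.println(resolveByFullName(union, fooB));  // 1
        System.out.println(resolveByFullName(union, bar2));  // 3
    }
}
```

The sketch assumes the datum carries its own type, which is what lets the writer compare full names instead of guessing from the datum's shape.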