avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-656) writing unions with multiple records, fixed or enums can choose wrong branch
Date Wed, 05 Jan 2011 17:24:47 GMT

    [ https://issues.apache.org/jira/browse/AVRO-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977859#action_12977859 ]

Doug Cutting commented on AVRO-656:

> this patch would be a major backwards-incompatible change to the spec. In our code,
> we're using the ["null", "fixed4", "fixed16"] case all the time to represent IPv4 or IPv6

That would nix the patch, then, since we don't want to introduce such an incompatibility.
If C does correctly implement unions as specified then I was mistaken to assert above that
no language did.

So instead perhaps I should fix Java to correctly implement unions as currently specified:
 - fixing union dispatch among records to consider the namespace (easy, should be compatible,
   already in this patch)
 - adding a getSchema() method to GenericEnumSymbol and GenericFixed so that we can check
   the name (incompatible API change, since it adds a Schema parameter to the constructors
   for these)

Unless there are objections, I'll try this approach.
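
As a rough illustration of the second item, here is a minimal sketch in plain Java of a datum that carries its schema so that the writer can pick the union branch by name. The classes Branch and FixedDatum and the method resolveUnion are hypothetical stand-ins invented for this example; they are not Avro's classes and not the contents of the attached patch.

```java
import java.util.Arrays;
import java.util.List;

public class UnionDispatchSketch {
    /** Hypothetical stand-in for one branch of a union schema. */
    static final class Branch {
        final String kind;       // e.g. "null", "record", "enum" or "fixed"
        final String namespace;  // null when the type has no namespace
        final String name;

        Branch(String kind, String namespace, String name) {
            this.kind = kind;
            this.namespace = namespace;
            this.name = name;
        }

        String fullName() {
            return namespace == null ? name : namespace + "." + name;
        }
    }

    /** Hypothetical fixed datum that knows its own schema (assumption). */
    static final class FixedDatum {
        private final Branch schema;
        private final byte[] bytes;

        FixedDatum(Branch schema, byte[] bytes) {
            this.schema = schema;
            this.bytes = bytes;
        }

        Branch getSchema() { return schema; }
        byte[] bytes() { return bytes; }
    }

    /** Returns the index of the branch matching both kind and full name, or -1. */
    static int resolveUnion(List<Branch> union, Branch datumSchema) {
        for (int i = 0; i < union.size(); i++) {
            Branch b = union.get(i);
            if (b.kind.equals(datumSchema.kind)
                    && b.fullName().equals(datumSchema.fullName())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Branch nul = new Branch("null", null, "null");
        Branch fixed4 = new Branch("fixed", null, "fixed4");
        Branch fixed16 = new Branch("fixed", null, "fixed16");
        List<Branch> union = Arrays.asList(nul, fixed4, fixed16);

        FixedDatum ipv4 = new FixedDatum(fixed4, new byte[4]);
        FixedDatum ipv6 = new FixedDatum(fixed16, new byte[16]);

        System.out.println(resolveUnion(union, ipv4.getSchema())); // prints 1
        System.out.println(resolveUnion(union, ipv6.getSchema())); // prints 2
    }
}
```

Because the datum exposes its schema, the lookup no longer has to fall back on "first fixed in the union", which is what makes the ["null", "fixed4", "fixed16"] case resolvable.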

> writing unions with multiple records, fixed or enums can choose wrong branch 
> -----------------------------------------------------------------------------
>                 Key: AVRO-656
>                 URL: https://issues.apache.org/jira/browse/AVRO-656
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.5.0
>         Attachments: AVRO-656.patch, AVRO-656.patch
> According to the specification, a union may contain multiple instances of a named type,
> provided they have different names.  There are several bugs in the Java implementation of
> this when writing data:
>  - for record, only the short-name of the record is checked, so the branch for a record
>    of the same name in a different namespace may be used by mistake
>  - for enum and fixed, the name of the type is not checked, so the first enum or fixed
>    in the union will always be assumed when writing.  In many cases this may cause the wrong
>    data to be written, potentially corrupting output.
> This is not a regression.  This has never been implemented correctly by Java.  Python
> and Ruby never check names, but rather perform a full, recursive validation of content.
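
The record case in the description above can be illustrated with a small self-contained sketch. It models union branches as plain full-name strings and compares a short-name lookup with a full-name lookup; the class, the method names and the org.a / org.b namespaces are all made up for illustration and do not reflect Avro's actual code.

```java
import java.util.Arrays;
import java.util.List;

public class ShortNameBugSketch {
    /** Strips the namespace, e.g. "org.a.Event" becomes "Event". */
    static String shortName(String fullName) {
        return fullName.substring(fullName.lastIndexOf('.') + 1);
    }

    /** Lookup that compares short names only, as in the behaviour described. */
    static int byShortName(List<String> branchFullNames, String datumFullName) {
        String wanted = shortName(datumFullName);
        for (int i = 0; i < branchFullNames.size(); i++) {
            if (shortName(branchFullNames.get(i)).equals(wanted)) {
                return i;
            }
        }
        return -1;
    }

    /** Lookup that compares full names, namespace included. */
    static int byFullName(List<String> branchFullNames, String datumFullName) {
        return branchFullNames.indexOf(datumFullName);
    }

    public static void main(String[] args) {
        // Two records with the same short name in different (made-up) namespaces.
        List<String> union = Arrays.asList("org.a.Event", "org.b.Event");

        System.out.println(byShortName(union, "org.b.Event")); // prints 0 (wrong branch)
        System.out.println(byFullName(union, "org.b.Event"));  // prints 1
    }
}
```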
