avro-dev mailing list archives

From "Douglas Kaminsky (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-973) Union behavior not consistent
Date Fri, 10 Feb 2012 19:21:00 GMT

    [ https://issues.apache.org/jira/browse/AVRO-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205664#comment-13205664 ]

Douglas Kaminsky commented on AVRO-973:
---------------------------------------

And to expound on my previous comment:

No change to the current validation-based mechanism can guarantee correctness for record
types. For example, you could have complex numeric types that are similar in structure but
distinct in meaning:

Amount { "mantissa" : "string", "exp" : "string" }
Money { "mantissa" : "string", "exp" : "string", "currency" : {"type" : "string", "default" : "USD"} }

This is a trivial example, but believe me when I tell you that we have 209 types in our schema
and several build on each other.

I contend that, to be correct, the implementation should behave the same regardless of union
order, i.e. serializing against ["null", "Amount", "Money"] should yield the same result as
serializing against ["null", "Money", "Amount"].

Now suppose I serialize this datum:

{ "mantissa" : "314159", "exp" : "-5" }

* If you validate without the break (the last matching branch wins), this will serialize as
"Money" against ["null", "Amount", "Money"] but as "Amount" against ["null", "Money", "Amount"].
* With the break (the first matching branch wins), this will serialize as "Amount" against
["null", "Amount", "Money"] but as "Money" against ["null", "Money", "Amount"].

Either way, the intention of the message sender is lost.
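
The order dependence described above can be sketched in a few lines of Python. This is only
an illustration, not Avro's io.py: `validates` and `resolve` are hypothetical helpers, and the
validator assumes that a record field with a default may be absent from the datum, which is
the premise under which the datum matches both Amount and Money.

```python
# Hypothetical, simplified model of union resolution; not Avro's io.py.
# Assumption: a record field that declares a default may be missing from the datum.

AMOUNT = {"name": "Amount", "fields": [
    {"name": "mantissa", "type": "string"},
    {"name": "exp", "type": "string"},
]}
MONEY = {"name": "Money", "fields": [
    {"name": "mantissa", "type": "string"},
    {"name": "exp", "type": "string"},
    {"name": "currency", "type": "string", "default": "USD"},
]}


def validates(schema, datum):
    """Return True if datum structurally fits schema (None stands for "null")."""
    if schema is None:
        return datum is None
    if not isinstance(datum, dict):
        return False
    for field in schema["fields"]:
        if field["name"] in datum:
            # All fields in this example are strings.
            if not isinstance(datum[field["name"]], str):
                return False
        elif "default" not in field:
            return False
    return True


def resolve(union, datum, stop_at_first_match):
    """Return the name of the union branch chosen for datum."""
    chosen = None
    matched = False
    for candidate in union:
        if validates(candidate, datum):
            chosen, matched = candidate, True
            if stop_at_first_match:
                break  # first match wins
    if not matched:
        raise ValueError("datum matches no branch of the union")
    return "null" if chosen is None else chosen["name"]


datum = {"mantissa": "314159", "exp": "-5"}
for stop in (False, True):
    print("break" if stop else "no break",
          resolve([None, AMOUNT, MONEY], datum, stop),
          resolve([None, MONEY, AMOUNT], datum, stop))
```

Under these assumptions the chosen branch flips with the union order in both variants, so
neither rule recovers which type the sender meant.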
                
> Union behavior not consistent
> -----------------------------
>
>                 Key: AVRO-973
>                 URL: https://issues.apache.org/jira/browse/AVRO-973
>             Project: Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.6.1, 1.6.2
>            Reporter: Gaurav Nanda
>              Labels: patch
>         Attachments: AVRO-973-patch-1.patch, AVRO-973-patch-2.patch, AVRO-973-patch-3.patch,
>                      AVRO-973-wrapper.patch, AVRO-973-wrapper.patch, test_unions.py
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> Python's union does not respect the order in which types are specified.
> For the following schema: {"type":"map","values":["int","long","float","double","string","boolean"]},
> an integer value is written as double, but it should respect the order in which the types have
> been specified.
> Fixed Code (io.py):
> def write_union(self, writers_schema, datum, encoder):
>    """
>    A union is encoded by first writing a long value indicating
>    the zero-based position within the union of the schema of its value.
>    The value is then encoded per the indicated schema within the union.
>    """
>    # resolve union
>    index_of_schema = -1
>    for i, candidate_schema in enumerate(writers_schema.schemas):
>      if validate(candidate_schema, datum):
>        index_of_schema = i
>        break  # XXX Add break statement here XXX
>    if index_of_schema < 0: raise AvroTypeException(writers_schema, datum)
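
The behavior reported in the quoted issue can be reproduced with a small stand-alone model of
the loop. This is a sketch, not the library code: `validate_primitive` and `union_index` are
hypothetical stand-ins, and the validator assumes, as the report implies, that an integer is
accepted by the "int", "long", "float" and "double" branches.

```python
# Hypothetical stand-in for the validator; not Avro's validate().
# Assumption: integers are accepted by "int", "long", "float" and "double".
INT_MIN, INT_MAX = -(1 << 31), (1 << 31) - 1
LONG_MIN, LONG_MAX = -(1 << 63), (1 << 63) - 1


def validate_primitive(type_name, datum):
    # bool is a subclass of int in Python; exclude it from the numeric branches here.
    is_int = isinstance(datum, int) and not isinstance(datum, bool)
    if type_name == "int":
        return is_int and INT_MIN <= datum <= INT_MAX
    if type_name == "long":
        return is_int and LONG_MIN <= datum <= LONG_MAX
    if type_name in ("float", "double"):
        return is_int or isinstance(datum, float)
    if type_name == "string":
        return isinstance(datum, str)
    if type_name == "boolean":
        return isinstance(datum, bool)
    return False


def union_index(branches, datum, use_break):
    """Mirror the quoted loop: return the index of the selected branch."""
    index_of_schema = -1
    for i, candidate in enumerate(branches):
        if validate_primitive(candidate, datum):
            index_of_schema = i
            if use_break:
                break
    if index_of_schema < 0:
        raise TypeError("datum matches no branch of the union")
    return index_of_schema


branches = ["int", "long", "float", "double", "string", "boolean"]
print(branches[union_index(branches, 42, use_break=False)])  # last match: double
print(branches[union_index(branches, 42, use_break=True)])   # first match: int
```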

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
