avro-dev mailing list archives

From "Neil Ferguson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-1968) Python DatumWriter seems to evaluate union types in reverse order
Date Fri, 09 Dec 2016 12:51:58 GMT

     [ https://issues.apache.org/jira/browse/AVRO-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neil Ferguson updated AVRO-1968:
--------------------------------
    Description: 
The Python DatumWriter seems to evaluate types in a union in reverse order. For example, with
the following schema:

{noformat}
        {
            "type": "record",
            "name": "MyRecord",
            "fields": [
                {"name": "my_field", "type": ["boolean", "double"]}
            ]
        }
{noformat}

If I set my_field to a boolean in my data, it seems to be encoded as a double. However, if
I reverse the order of the types in my union ({{["double", "boolean"]}}) it seems to be encoded
as a boolean.

This seems unintuitive for a couple of reasons:

 * I'd expect the types in the union to be evaluated in the order they are specified, but
they seem to be evaluated in reverse order
 * Encoding a boolean as a double is a bit weird
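
One mechanism that would produce both symptoms is sketched below, under stated assumptions. Python's bool is a subclass of int, so a numeric check for "double" also accepts True. A resolver that keeps the last matching branch rather than the first would then pick "double" for ["boolean", "double"]. The VALIDATORS table and both resolve_* functions are hypothetical illustrations, not the Avro library's code.

```python
# Hypothetical validators, one per Avro primitive used in the example schema.
VALIDATORS = {
    "boolean": lambda datum: isinstance(datum, bool),
    # bool is a subclass of int, so this check also accepts True and False.
    "double": lambda datum: isinstance(datum, (int, float)),
}


def resolve_first_match(union, datum):
    """Return the first branch, in declaration order, that accepts datum."""
    for branch in union:
        if VALIDATORS[branch](datum):
            return branch
    raise TypeError("no branch of %r accepts %r" % (union, datum))


def resolve_last_match(union, datum):
    """Keep scanning after a match, so the last accepting branch wins."""
    chosen = None
    for branch in union:
        if VALIDATORS[branch](datum):
            chosen = branch  # no early exit: a later match overwrites this one
    if chosen is None:
        raise TypeError("no branch of %r accepts %r" % (union, datum))
    return chosen


print(resolve_first_match(["boolean", "double"], True))  # boolean
print(resolve_last_match(["boolean", "double"], True))   # double
print(resolve_last_match(["double", "boolean"], True))   # boolean
```

With last-match resolution the result tracks the reversed declaration order, which matches the behaviour reported here. First-match resolution would give "boolean" for the original schema.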

I'm not sure if this is a bug or expected behaviour though. If this is the expected behaviour
(or it can't be changed without breaking things) then it would be nice if this was documented
somewhere (I searched but couldn't find anything), as it's pretty unintuitive.

I've attached a full test case. The test case encodes and then decodes the data with both
the original schema and the reversed version. For me it prints:

{noformat}
Type: <type 'float'>
Type from reversed schema: <type 'bool'>
{noformat}

Ideally I'd expect the type to be 'bool' both times, but failing that I'd expect the type
to be 'bool' the first time, and 'float' the second time. 
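
The decoded type follows directly from which branch index the writer records. In Avro's binary encoding, a union is written as a long (zigzag varint) holding the zero-based branch position, followed by the value encoded for that branch. A boolean is one byte and a double is eight little-endian bytes. The following is a minimal hand-rolled sketch in Python 3 using struct, not the Avro library. encode_union and decode_union are hypothetical helpers, and the branch index is passed in explicitly to stand in for the writer's choice.

```python
import struct


def zigzag_varint(n):
    """Encode an integer as an Avro long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_union(union, index, datum):
    """Write the branch index, then the datum encoded as that branch."""
    branch = union[index]
    if branch == "boolean":
        body = b"\x01" if datum else b"\x00"
    elif branch == "double":
        body = struct.pack("<d", float(datum))  # True becomes 1.0 here
    else:
        raise ValueError("unsupported branch: %r" % branch)
    return zigzag_varint(index) + body


def decode_union(union, data):
    """Read the branch index, then decode the value as that branch."""
    data = bytearray(data)
    z, shift, pos = 0, 0, 0
    while True:
        byte = data[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    index = (z >> 1) ^ -(z & 1)  # undo zigzag
    branch = union[index]
    if branch == "boolean":
        return data[pos] != 0
    if branch == "double":
        return struct.unpack("<d", bytes(data[pos:pos + 8]))[0]
    raise ValueError("unsupported branch: %r" % branch)


union = ["boolean", "double"]
as_double = decode_union(union, encode_union(union, 1, True))
as_bool = decode_union(union, encode_union(union, 0, True))
print(type(as_double).__name__, as_double)  # float 1.0
print(type(as_bool).__name__, as_bool)      # bool True
```

Once the writer has picked the "double" branch, the original bool is gone from the encoded bytes, so the reader can only return a float.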

  was:
The Python DatumWriter seems to evaluate types in a union in reverse order. For example, with
the following schema:

{noformat}
        {
            "type": "record",
            "name": "MyRecord",
            "fields": [
                {"name": "my_field", "type": ["boolean", "double"]}
            ]
        }
{noformat}

If I set my_field to a boolean in my data, it seems to be encoded as a double. However, if
I reverse the order of the types in my union ({{["double", "boolean"]}}) it seems to be encoded
as a boolean.

This seems unintuitive for a couple of reasons:

 * I'd expect the types in the union to be evaluated in the order they are specified
 * Encoding a boolean as a double is a bit weird

I'm not sure if this is a bug or expected behaviour though. If this is the expected behaviour
(or it can't be changed without breaking things) then it would be nice if this was documented
somewhere (I searched but couldn't find anything), as it's pretty unintuitive.

I've attached a full test case. The test case encodes and then decodes the data with both
the original schema and the reversed version. For me it prints:

{noformat}
Type: <type 'float'>
Type from reversed schema: <type 'bool'>
{noformat}

Ideally I'd expect the type to be 'bool' both times, but failing that I'd expect the type
to be 'bool' the first time, and 'float' the second time. 


> Python DatumWriter seems to evaluate union types in reverse order 
> ------------------------------------------------------------------
>
>                 Key: AVRO-1968
>                 URL: https://issues.apache.org/jira/browse/AVRO-1968
>             Project: Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.8.1
>            Reporter: Neil Ferguson
>         Attachments: avro_test.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
