avro-dev mailing list archives

From "Manvendra Singh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-2046) avro-python3: Very restricted set of data types which are allowed in AvroSchemaFromJSONData
Date Thu, 06 Jul 2017 18:31:02 GMT

     [ https://issues.apache.org/jira/browse/AVRO-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manvendra Singh updated AVRO-2046:
----------------------------------
    Description: 
Hey, I come from the [CWL project](https://github.com/common-workflow-language/cwltool) and,
as part of my GSoC project, I'm working on adding Python 3 compatibility to the ``cwltool`` codebase.
We've been using avro-python2 for a long time now and it has worked great for us in our projects,
schema_salad and cwltool.

In the process of porting cwltool, I'm facing issues with the avro-python3 library. I've found
the following bug:

Minimal reproducible example:

{code:none}
from collections import OrderedDict
import avro.schema
AvroSchemaFromJSONData = avro.schema.SchemaFromJSONData

a = {
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "favorite_number",
      "type": [
        "int",
        "null"
      ]
    },
    {
      "name": "favorite_color",
      "type": [
        "string",
        "null"
      ]
    }
  ],
  "name": "User",
  "namespace": "example.avro",
  "type": "record"
}

b = OrderedDict(a)  # same schema, wrapped in a dict subclass

AvroSchemaFromJSONData(a)  # plain dict
AvroSchemaFromJSONData(b)  # OrderedDict: raises SchemaParseException

{code}
Output:

{code}
~/Desktop/test/venv3/lib/python3.5/site-packages/avro/schema.py in SchemaFromJSONData(json_data, names)
   1252   if parser is None:
   1253     raise SchemaParseException(
-> 1254         'Invalid JSON descriptor for an Avro schema: %r.' % json_data)
   1255   return parser(json_data, names=names)
   1256 

SchemaParseException: Invalid JSON descriptor for an Avro schema: OrderedDict([('namespace',
'example.avro'), ('type', 'record'), ('name', 'User'), ('fields', [{'type': 'string', 'name':
'name'}, {'type': ['int', 'null'], 'name': 'favorite_number'}, {'type': ['string', 'null'],
'name': 'favorite_color'}])]).
{code}

 
h5. The current implementation of this function does not accept *any dict-like data type* (such as OrderedDict).
It does, however, work in avro-python2.

Relevant line of code: https://github.com/apache/avro/blob/master/lang/py3/avro/schema.py#L1250
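
For illustration, here is a minimal sketch of the behavior the traceback suggests. It assumes the parser is selected by the exact type of ``json_data``, so that dict subclasses are not matched. None of the names below (``select_exact``, ``select_by_isinstance``, ``to_plain_json``, and the ``parse_*`` stand-ins) come from avro; they are hypothetical. The last helper is only a possible caller-side workaround, not a fix in the library.

```python
from collections import OrderedDict
from collections.abc import Mapping


# Hypothetical stand-ins for per-type schema parsers; not avro's real functions.
def parse_string(data):
    return "string parser"


def parse_array(data):
    return "array parser"


def parse_object(data):
    return "object parser"


# Assumption: the lookup is keyed on the exact type, so dict subclasses miss.
PARSERS_BY_EXACT_TYPE = {str: parse_string, list: parse_array, dict: parse_object}


def select_exact(json_data):
    """Return a parser by exact type, or None for subclasses such as OrderedDict."""
    return PARSERS_BY_EXACT_TYPE.get(type(json_data))


def select_by_isinstance(json_data):
    """Return a parser for any str, mapping, or list/tuple, including subclasses."""
    if isinstance(json_data, str):
        return parse_string
    if isinstance(json_data, Mapping):
        return parse_object
    if isinstance(json_data, (list, tuple)):
        return parse_array
    return None


def to_plain_json(data):
    """Recursively copy mappings to dict and sequences to list."""
    if isinstance(data, Mapping):
        return {key: to_plain_json(value) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return [to_plain_json(item) for item in data]
    return data


schema = OrderedDict([("type", "record"), ("name", "User"), ("fields", [])])
print(select_exact(schema))                   # None: OrderedDict is not exactly dict
print(select_by_isinstance(schema).__name__)  # parse_object
print(type(to_plain_json(schema)).__name__)   # dict
```

Under that assumption, passing ``to_plain_json(b)`` instead of ``b`` would hand the function a plain dict. This is untested against avro-python3 1.8.2.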

Apart from this, I've tried running the ``2to3`` tool on avro-python2 and testing our project with
the result, and it works perfectly. Thus, through this issue, I also want to motivate the following
PR: https://github.com/apache/avro/pull/234
I don't expect a unified codebase for avro python2 and python3 now or in the near future.
There has been a discussion about it before: https://github.com/apache/avro/pull/133

But having avro-python2 cross-compatible with both py2 and py3 would be really helpful for
our project, and it would let us complete our porting process. Thanks.


> avro-python3: Very restricted set of data types which are allowed in AvroSchemaFromJSONData
> -------------------------------------------------------------------------------------------
>
>                 Key: AVRO-2046
>                 URL: https://issues.apache.org/jira/browse/AVRO-2046
>             Project: Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.8.2
>         Environment: avro-python3 (1.8.2)
>            Reporter: Manvendra Singh
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
