avro-dev mailing list archives

From "Philip Zeyliger (JIRA)" <j...@apache.org>
Subject [jira] Updated: (AVRO-620) Python implementation doesn't stringify sub-schemas correctly
Date Sat, 21 Aug 2010 18:58:16 GMT

     [ https://issues.apache.org/jira/browse/AVRO-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated AVRO-620:
---------------------------------

    Attachment: AVRO-620.patch.txt

I believe I've fixed this. I implemented a Schema.to_json(names) method, which recursively
serializes schema objects to JSON-compatible structures and avoids re-serializing named schemas
that have already been seen. (This also avoids serializing to JSON only to deserialize it again.)
I was able to get rid of some variables that tracked how the schema was originally defined,
because the recursion now takes care of noticing that.
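A minimal sketch of the idea, for illustration only. `RecordSchema` and `_to_json` below are hypothetical stand-ins, not the actual avro.schema classes or the code in AVRO-620.patch.txt. A named schema writes its full definition the first time the recursion reaches it and only its name afterwards. The set of already-seen names is threaded through as `names`, which is assumed here to be a plain Python set.

```python
import json


class RecordSchema:
    """Hypothetical stand-in for a named record schema (not avro.schema.RecordSchema)."""

    def __init__(self, name, fields=None):
        self.name = name
        # List of (field_name, schema) pairs; a schema is either another
        # RecordSchema or a primitive type name such as "string".
        self.fields = fields if fields is not None else []

    def to_json(self, names):
        # A named schema that has already been emitted is written by name only.
        if self.name in names:
            return self.name
        names.add(self.name)
        return {
            "type": "record",
            "name": self.name,
            "fields": [
                {"name": field_name, "type": _to_json(schema, names)}
                for field_name, schema in self.fields
            ],
        }

    def __str__(self):
        # Every top-level stringification starts with an empty set of seen
        # names, so the resulting JSON defines everything it references.
        return json.dumps(self.to_json(set()))


def _to_json(schema, names):
    # Primitive types are plain strings here and need no bookkeeping.
    return schema if isinstance(schema, str) else schema.to_json(names)


# The recursive schema from the bug report: X contains Y, and Y refers back to X.
x = RecordSchema("X")
y = RecordSchema("Y", [("Z", x)])
x.fields.append(("y", y))

print(str(y))
# {"type": "record", "name": "Y", "fields": [{"name": "Z", "type": {"type": "record", "name": "X", "fields": [{"name": "y", "type": "Y"}]}}]}
```

Serializing Y now pulls in X's full definition. X's back-reference to Y is written by name, which is legal because Y is already defined at that point in the output.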

Where I needed to, I removed some verbosity and some exception handling from the tests. It's very
unhelpful when Python tests catch exceptions, because that makes it much harder to track down the
exact point of the failure. (An exception that propagates out of a test already makes the test
fail, so there's no need to mark the test as failed separately.) Likewise, printing extra
information about which tests are running distracts from where the failures are occurring.
I recommend the nose test runner (with flags --pdb --pdb-failures) for running the tests.
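To illustrate the testing style described above, here is a small `unittest` sketch. The function `parse_record_name` is hypothetical and exists only for this example. Neither test wraps its body in try/except. An unexpected exception simply propagates, and the runner reports it with a full traceback. An expected exception is asserted with `assertRaises` instead of being caught by hand.

```python
import json
import unittest


def parse_record_name(text):
    """Hypothetical function under test: return the name of a JSON record schema."""
    return json.loads(text)["name"]


class ParseRecordNameTest(unittest.TestCase):
    def test_returns_name(self):
        # No try/except and no progress printing: if json.loads or the
        # ["name"] lookup raises, the exception propagates, the runner
        # records an error, and the traceback points at the failing line.
        schema = '{"type": "record", "name": "X", "fields": []}'
        self.assertEqual(parse_record_name(schema), "X")

    def test_missing_name_raises(self):
        # When an exception is the expected outcome, assert it directly
        # rather than catching it and setting a "failed" flag manually.
        with self.assertRaises(KeyError):
            parse_record_name('{"type": "record", "fields": []}')
```

Run it with `python -m unittest` or with nose. Under nose, `--pdb` drops into the debugger at the frame where an error or failure occurred, which is only useful if the test hasn't swallowed the exception.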

I've also added a test for the case that triggered this in the first place.

> Python implementation doesn't stringify sub-schemas correctly
> -------------------------------------------------------------
>
>                 Key: AVRO-620
>                 URL: https://issues.apache.org/jira/browse/AVRO-620
>             Project: Avro
>          Issue Type: Bug
>          Components: python
>            Reporter: Philip Zeyliger
>         Attachments: AVRO-620.patch.txt
>
>
> {noformat}
> In [9]: import avro.schema
> In [10]: s = avro.schema.parse('{"type": "record", "name": "X", "fields": [{"name": "y", "type": {"type": "record", "name": "Y", "fields": [{"name": "Z", "type": "X"}]}}]}')
> In [11]: str(s.fields[0].type)
> Out[11]: '{"fields": [{"type": "X", "name": "Z"}], "type": "record", "name": "Y"}'
> {noformat}
> str(schema) is used in Avro data files to record the schema. In the case above, when we
> serialize the schema for Y, we should also serialize the schema for X, since Y needs the
> schema for X.
> I ran smack into this when using a schema from a protocol to write a data file, and found
> that many of the types weren't defined in the generated Avro data file.
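One way to detect the symptom described in this report is to check whether a serialized schema is self-contained, meaning every named type it references is also defined in it. The helper `undefined_names` below is a hypothetical sketch, not part of the avro package. It ignores namespaces and follows Avro's rule that a name must be defined before it is used in a depth-first, left-to-right walk of the schema. Applied to the `Out[11]` string above, it flags `X`. Applied to a self-contained form of Y with X defined inline, it finds nothing.

```python
import json

# Avro primitive type names; any other bare string must name a type
# defined earlier in the same schema document.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def undefined_names(schema, defined=None):
    """Return the set of type names referenced but not defined before use.

    Simplified sketch: namespaces are ignored.
    """
    if defined is None:
        defined = set()
    missing = set()
    if isinstance(schema, str):
        # A bare string is either a primitive or a reference to a named type.
        if schema not in PRIMITIVES and schema not in defined:
            missing.add(schema)
    elif isinstance(schema, list):
        # A union: check each branch in order.
        for branch in schema:
            missing |= undefined_names(branch, defined)
    elif isinstance(schema, dict):
        kind = schema["type"]
        if kind in ("record", "error", "enum", "fixed"):
            # Register the name first so recursive self-references resolve.
            defined.add(schema["name"])
            if kind in ("record", "error"):
                for field in schema["fields"]:
                    missing |= undefined_names(field["type"], defined)
        elif kind == "array":
            missing |= undefined_names(schema["items"], defined)
        elif kind == "map":
            missing |= undefined_names(schema["values"], defined)
        else:
            # e.g. {"type": "int"} or {"type": "SomeNamedType"}
            missing |= undefined_names(kind, defined)
    return missing


buggy = '{"fields": [{"type": "X", "name": "Z"}], "type": "record", "name": "Y"}'
fixed = ('{"type": "record", "name": "Y", "fields": [{"name": "Z", "type": '
         '{"type": "record", "name": "X", "fields": [{"name": "y", "type": "Y"}]}}]}')

print(undefined_names(json.loads(buggy)))  # {'X'}
print(undefined_names(json.loads(fixed)))  # set()
```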

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

