avro-dev mailing list archives

From "Leigh L. Klotz, Jr (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1345) Python Codegen
Date Thu, 13 Jun 2013 00:07:20 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681784#comment-13681784 ]

Leigh L. Klotz, Jr commented on AVRO-1345:
------------------------------------------

Codegen prevents you from creating data messages that don't correspond to the schema.
With a plain Python dict, the same mistake surfaces only later, at runtime, when the
record is serialized. Codegen lets you detect data validation errors earlier.
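To make the contrast concrete, here is a minimal sketch of what a generated record class might look like, assuming a hypothetical schema with a string `name` and an int `age`. The class `User` and its checks are illustrative only, not the actual output of the AVRO-1345 patch.

```python
class User(object):
    """Hypothetical generated wrapper for a record {name: string, age: int}."""

    # Avro field name -> Python type a value must have
    # (32-bit range checks for Avro "int" are omitted for brevity)
    _FIELD_TYPES = {"name": str, "age": int}

    def __init__(self, name, age):
        self._data = {}
        self._set_field("name", name)
        self._set_field("age", age)

    def _set_field(self, field, value):
        expected = self._FIELD_TYPES[field]  # unknown field names raise KeyError
        # bool is a subclass of int in Python, so reject it explicitly for int fields
        is_bool_for_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_for_int:
            raise TypeError("field %r expects %s, got %r"
                            % (field, expected.__name__, value))
        self._data[field] = value

    def to_dict(self):
        """Return a plain dict that can be passed on for Avro serialization."""
        return dict(self._data)


# A plain dict accepts the mistake silently; it is only rejected later,
# when the datum is checked against the schema during serialization.
bad_record = {"name": "alice", "age": "thirty"}

# The wrapper rejects the same mistake as soon as the object is created.
try:
    User(name="alice", age="thirty")
except TypeError as exc:
    print(exc)  # field 'age' expects int, got 'thirty'
```

The wrapper still hands a plain dict to the serializer via `to_dict()`, so it adds checking without changing what gets written.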

                
> Python Codegen
> --------------
>
>                 Key: AVRO-1345
>                 URL: https://issues.apache.org/jira/browse/AVRO-1345
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, python
>            Reporter: Tal Levy
>         Attachments: AVRO-1345.patch
>
>
> I recently started using Avro at work, and we found it difficult to keep
> track of which Python dict matched which schema. Instead of populating
> arbitrary dicts and then attempting to serialize them to Avro, I thought
> it would be more readable and less error-prone to generate Python classes
> that wrap those dicts for developers. These classes are type-checked field by field.
> Although they lack the compile-time type checking of the Java codegen, they are a
> friendly wrapper around Python dicts representing Avro records to be serialized.
> Let me know what you think about this; I am still tweaking how it behaves.
> I understand it is a bit unpythonic to enforce types this way, but the readability
> is worth it nonetheless.
> Here is an example record:
> https://gist.github.com/talevy/5696236
> I extended the Avro compiler/tools to provide both Java and Python codegen functionality,
> so if this sounds like something others would use, it may make sense to include it
> in the main repo.
> Here are the changes:
> https://github.com/talevy/avro/tree/python-codegen
> A few caveats and thoughts about my current version:
> 1. I do not know how best to handle constructors, because some fields are not allowed
>    to be null. A builder pattern might work here, but it feels a bit odd in Python.
> 2. I copied and pasted a lot of code from SpecificCompiler to make the PythonCompiler;
>    some renaming and code reuse via inheritance would make it read better.
> 3. I wanted to reuse the validate methods Avro already provides to verify the record,
>    but doing so gives up some of the class-level type correctness for nested records and such.
> 4. I do not know the best way to output multiple files; I currently use the
>    same packaging as the Java classes, writing them into their namespace directories.
> 5. I am not familiar with the Avro protocol format, so I only implemented enums and records.
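On caveat 1, one way to handle required (non-null) fields without a long positional constructor is a builder that collects values and checks for missing required fields only in `build()`. This is a minimal sketch, assuming a hypothetical record with a required string `name` and a nullable `email` (a `["null", "string"]` union assumed to default to null); `UserBuilder` and `MissingFieldError` are illustrative names, not part of the patch.

```python
class MissingFieldError(ValueError):
    """Raised when a required (non-null) field was never set."""


class UserBuilder(object):
    # Hypothetical schema: "name" is required, "email" is nullable.
    _REQUIRED = ("name",)
    _DEFAULTS = {"email": None}  # nullable fields assumed to default to None

    def __init__(self):
        self._values = dict(self._DEFAULTS)

    def name(self, value):
        self._values["name"] = value
        return self  # returning self allows chained calls

    def email(self, value):
        self._values["email"] = value
        return self

    def build(self):
        # A required field that was never set (or was set to None) is an error.
        missing = [f for f in self._REQUIRED if self._values.get(f) is None]
        if missing:
            raise MissingFieldError("missing required fields: %s" % ", ".join(missing))
        return dict(self._values)


record = UserBuilder().name("alice").build()
```

In Python, keyword arguments often cover the same need more idiomatically, e.g. `__init__(self, name, email=None)`, where required fields have no default; a builder mainly helps when a record is assembled incrementally.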
> I updated the SpecificCompilerTool to have the following usage:
> ```
> "Usage: [-string] (schema|protocol) (python|java) input... outputdir"
> ```
> So generating the Python classes is as easy as generating the Java ones.
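Caveat 4 and the `outputdir` argument raise the question of where generated Python files land. Mirroring the Java layout, a namespace such as `com.example` maps to nested directories; unlike Java, each directory also needs an `__init__.py` to be importable as a regular Python package. A minimal sketch, assuming one module per named type; `output_path` and `package_init_files` are hypothetical helpers, not part of the patch.

```python
import os


def output_path(outputdir, namespace, name):
    """Map an Avro namespace + type name to a module path, Java-style."""
    parts = namespace.split(".") if namespace else []
    return os.path.join(outputdir, *(parts + [name + ".py"]))


def package_init_files(outputdir, namespace):
    """List the __init__.py files that make each namespace level a package."""
    parts = namespace.split(".") if namespace else []
    return [
        os.path.join(outputdir, *(parts[:i] + ["__init__.py"]))
        for i in range(1, len(parts) + 1)
    ]


print(output_path("out", "com.example", "User"))
print(package_init_files("out", "com.example"))
```

On a POSIX system this prints `out/com/example/User.py` followed by the two `__init__.py` paths for `out/com` and `out/com/example`.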

--
This message is automatically generated by JIRA.
