avro-dev mailing list archives

From "Jeremy Kahn (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1304) Python Avro match_schemas called redundantly
Date Mon, 22 Apr 2013 17:33:16 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638198#comment-13638198 ]

Jeremy Kahn commented on AVRO-1304:

Uri, what strategy are you using to try to fix this? Could we memoize the partner schema to
short-circuit out of match_schemas (trading a small amount of memory for speed)?

I'm eager to improve the speed of the Python library, and a 20% speedup could shave days off
my team's product delivery.

Contact me offline (jeremy@trochee.net) if you'd like to share your profiling setup (I can
try to implement related speedups).  
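
A minimal sketch of the memoization idea follows. It assumes schemas are not mutated in place after they have been matched. The idea is to remember the last (writer, reader) pair that passed the check and to skip the check when the very same objects come back. `expensive_match` and `MemoizedMatcher` are hypothetical stand-ins, not part of the avro Python library. A real version would delegate to `DatumReader.match_schemas` instead of the toy comparison used here.

```python
def expensive_match(writers_schema, readers_schema):
    """Hypothetical stand-in for DatumReader.match_schemas; counts its calls."""
    expensive_match.calls += 1
    return writers_schema["type"] == readers_schema["type"]


expensive_match.calls = 0


class MemoizedMatcher:
    """Caches the last (writer, reader) pair that matched, compared by identity."""

    def __init__(self, match_fn):
        self._match_fn = match_fn
        self._last_pair = None  # (writers_schema, readers_schema) or None

    def matches(self, writers_schema, readers_schema):
        last = self._last_pair
        # Short-circuit when the exact same schema objects matched last time.
        if last is not None and last[0] is writers_schema and last[1] is readers_schema:
            return True
        result = self._match_fn(writers_schema, readers_schema)
        if result:
            # Only successful matches are remembered; failures are re-checked.
            self._last_pair = (writers_schema, readers_schema)
        return result


writer = {"type": "record"}
reader = {"type": "record"}
matcher = MemoizedMatcher(expensive_match)
for _ in range(1000):
    assert matcher.matches(writer, reader)
print(expensive_match.calls)  # 1
```

Comparing by identity (`is`) rather than equality keeps the short-circuit itself cheap. Only two object references are held, which is the small memory cost mentioned above.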
> Python Avro match_schemas called redundantly
> --------------------------------------------
>                 Key: AVRO-1304
>                 URL: https://issues.apache.org/jira/browse/AVRO-1304
>             Project: Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.7.4
>            Reporter: Uri Laserson
> DatumReader.match_schemas(writers_schema, readers_schema) is called on every single read
> from the DatumReader. However, for almost every read, the schemas used are the object members
> self.writers_schema and self.readers_schema. match_schemas should be checked only once in
> this case, and checked again only when those object members are modified. Profiling shows
> that this call takes up 20% of my parse time.
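
One possible shape for the fix described in the issue is sketched below with hypothetical names. `CachingReader`, `CountingMatcher`, and the `read(payload)` placeholder are not the avro.io API. The compatibility result is cached on the reader, and assigning to either schema attribute clears the cache so that the next read re-checks.

```python
class CountingMatcher:
    """Hypothetical stand-in for match_schemas that counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, writers_schema, readers_schema):
        self.calls += 1
        return writers_schema["type"] == readers_schema["type"]


class CachingReader:
    """Re-checks schema compatibility only after a schema attribute changes."""

    def __init__(self, writers_schema, readers_schema, match_fn):
        self._match_fn = match_fn
        self._writers_schema = writers_schema
        self._readers_schema = readers_schema
        self._schemas_checked = False

    @property
    def writers_schema(self):
        return self._writers_schema

    @writers_schema.setter
    def writers_schema(self, schema):
        self._writers_schema = schema
        self._schemas_checked = False  # invalidate the cached result

    @property
    def readers_schema(self):
        return self._readers_schema

    @readers_schema.setter
    def readers_schema(self, schema):
        self._readers_schema = schema
        self._schemas_checked = False  # invalidate the cached result

    def _ensure_schemas_match(self):
        if not self._schemas_checked:
            if not self._match_fn(self._writers_schema, self._readers_schema):
                raise ValueError("Schemas do not match.")
            self._schemas_checked = True

    def read(self, payload):
        # Only the first read after a schema change pays for the check.
        self._ensure_schemas_match()
        # `payload` stands in for real decoding and is returned unchanged.
        return payload


match = CountingMatcher()
datum_reader = CachingReader({"type": "int"}, {"type": "int"}, match)
for i in range(1000):
    datum_reader.read(i)
print(match.calls)  # 1
```

This sketch only caches the top-level pair stored on the object. If a reader also checks nested schemas during decoding, those checks are not affected by it.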
