avro-dev mailing list archives

From "David McIntosh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-1332) Improve C# DatumReader performance
Date Wed, 11 Sep 2013 08:00:55 GMT

     [ https://issues.apache.org/jira/browse/AVRO-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David McIntosh updated AVRO-1332:

    Attachment: AVRO-1332-3.patch

Yes, the readers/writers can be cached and reused. They should be thread-safe as well. I think
it might be best for the users to manage that themselves if performance is a concern in their

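A minimal sketch of user-managed caching, written in Python as a language-neutral toy rather than against the Avro C# API. The `ReaderCache` class, its `factory` callback, and the use of plain strings as schemas are all hypothetical; it also assumes the cached readers are safe to share across threads, as suggested above.

```python
import threading


class ReaderCache:
    """Hypothetical cache: one reader per (writer schema, reader schema) pair."""

    def __init__(self, factory):
        # factory(writer_schema, reader_schema) builds a new reader (assumed helper).
        self._factory = factory
        self._lock = threading.Lock()
        self._readers = {}

    def get(self, writer_schema, reader_schema):
        key = (writer_schema, reader_schema)
        # The lock guards the dictionary so concurrent callers build each reader once.
        with self._lock:
            reader = self._readers.get(key)
            if reader is None:
                reader = self._factory(writer_schema, reader_schema)
                self._readers[key] = reader
            return reader
```

Callers would ask the cache for a reader instead of constructing one per record, so any per-reader setup cost is paid once per schema pair.
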
It looks like the Complex example ended up slower because the benefit of pre-resolving was
small and there was extra overhead when processing the results of the pre-resolution. The
Complex schema had a lot of unions and arrays of basic types, which won't see much speedup.
I was able to make a few tweaks to shrink the time gap, though. I also discovered that one of
the unit tests was failing, and the fix slowed down the new specific writer slightly.

Here are the new results for batch size 1000. I also included two other types. Narrow is a
schema with 3 primitive fields. Wide has 35 fields of mostly primitives and a few child records.

|type|old specific|new specific|old generic|new generic|

> Improve C# DatumReader performance
> ----------------------------------
>                 Key: AVRO-1332
>                 URL: https://issues.apache.org/jira/browse/AVRO-1332
>             Project: Avro
>          Issue Type: Improvement
>          Components: csharp
>    Affects Versions: 1.7.5
>            Reporter: David McIntosh
>            Priority: Minor
>              Labels: performance
>         Attachments: AVRO-1332-2.patch, AVRO-1332-3.patch, AVRO-1332.patch
> The current implementations of the C# datum readers perform resolution of the reader
> and writer schema on every call to Read. In my tests this was causing it to perform poorly
> when reading a large number of records (slower than parsing the same data from delimited text
> files). It would be more efficient if the reader only needed to resolve the schemas once.
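
The difference between resolving on every call and resolving once can be illustrated with a small toy model. This is a sketch in Python, not the Avro C# implementation or the attached patch: a "schema" here is just an ordered list of field names, and `resolve`, `apply_plan`, `read_resolving_each_call`, and `PreResolvedReader` are hypothetical names chosen for the example.

```python
def resolve(writer_fields, reader_fields):
    """Build a plan: for each writer field, the reader position to fill, or None to skip."""
    reader_pos = {name: i for i, name in enumerate(reader_fields)}
    return [reader_pos.get(name) for name in writer_fields]


def apply_plan(values, plan, width):
    """Place each written value into its reader position according to the plan."""
    record = [None] * width
    for value, target in zip(values, plan):
        if target is not None:
            record[target] = value
    return record


def read_resolving_each_call(values, writer_fields, reader_fields):
    """Repeats the schema resolution for every record read."""
    plan = resolve(writer_fields, reader_fields)
    return apply_plan(values, plan, len(reader_fields))


class PreResolvedReader:
    """Resolves the two schemas once in the constructor and reuses the plan."""

    def __init__(self, writer_fields, reader_fields):
        self._plan = resolve(writer_fields, reader_fields)
        self._width = len(reader_fields)

    def read(self, values):
        return apply_plan(values, self._plan, self._width)
```

Both paths produce the same records; the second only moves the resolution work out of the per-record loop, which is where a saving would come from when reading many records with the same schema pair.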

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
