avro-dev mailing list archives

From "David McIntosh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-1332) Improve C# DatumReader performance
Date Wed, 11 Sep 2013 08:00:55 GMT

     [ https://issues.apache.org/jira/browse/AVRO-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David McIntosh updated AVRO-1332:
---------------------------------

    Attachment: AVRO-1332-3.patch

Yes, the readers and writers can be cached and reused, and they should be thread-safe as well.
I think it is best for users to manage that caching themselves if performance is a concern in
their app.
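
As a rough illustration of user-managed caching, here is a minimal sketch in Python rather than
C#, so it does not use the Avro API at all. The ReaderCache class, the make_reader factory and
the string schema keys are all hypothetical names invented for this example. The cache builds
one reader per (writer schema, reader schema) pair and hands the same instance back on later
requests.

```python
import threading


class ReaderCache:
    """Thread-safe cache of reader objects keyed by (writer schema, reader schema)."""

    def __init__(self, factory):
        # factory is a hypothetical callable that builds a reader for a schema pair.
        self._factory = factory
        self._lock = threading.Lock()
        self._readers = {}

    def get(self, writer_schema, reader_schema):
        key = (writer_schema, reader_schema)
        # The lock guarantees that only one reader is ever built per key.
        with self._lock:
            reader = self._readers.get(key)
            if reader is None:
                reader = self._factory(writer_schema, reader_schema)
                self._readers[key] = reader
            return reader


built = []  # records every time the factory actually runs


def make_reader(writer_schema, reader_schema):
    # Stand-in for constructing a real datum reader (assumption for this sketch).
    built.append((writer_schema, reader_schema))
    return object()


cache = ReaderCache(make_reader)
first = cache.get("writer-v1", "reader-v1")
second = cache.get("writer-v1", "reader-v1")
```

Sharing one cached instance across threads is only safe if the reader itself is thread-safe,
which the message above says should be the case.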

It looks like the Complex example ended up slower because the benefit of pre-resolving was
small and there was extra overhead when processing the results of the pre-resolution. The
Complex schema has many unions and arrays of basic types, which won't see much speedup.
I was able to make a few tweaks to shrink the time gap, though. I also discovered that one of
the unit tests was failing, and the fix slowed down the new specific writer slightly.
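
To illustrate what pre-resolving means here, the following is a simplified sketch in Python,
not the code from the patch. It assumes a flat record whose schema is just a list of field
names, and PreResolvedReader is a hypothetical class. The writer and reader field lists are
compared once in the constructor, and every read call only replays the stored plan.

```python
class PreResolvedReader:
    """Resolves writer and reader field lists once, then reuses the plan on every read."""

    def __init__(self, writer_fields, reader_fields, defaults=None):
        defaults = defaults or {}
        self.resolve_count = 0  # counts how often resolution runs
        self._plan = self._resolve(writer_fields, reader_fields, defaults)

    def _resolve(self, writer_fields, reader_fields, defaults):
        self.resolve_count += 1
        wanted = set(reader_fields)
        # One step per writer field, in writer order: keep the value or skip it.
        steps = [(name, name in wanted) for name in writer_fields]
        # Reader-only fields take their default; a missing default raises KeyError.
        missing = {
            name: defaults[name]
            for name in reader_fields
            if name not in writer_fields
        }
        return steps, missing

    def read(self, values):
        # No schema comparison happens here, only replay of the precomputed plan.
        steps, missing = self._plan
        record = dict(missing)
        for (name, keep), value in zip(steps, values):
            if keep:
                record[name] = value
        return record


reader = PreResolvedReader(
    writer_fields=["id", "name", "legacy"],
    reader_fields=["id", "name", "score"],
    defaults={"score": 0},
)
rows = [reader.read(v) for v in [(1, "a", "x"), (2, "b", "y")]]
```

Even in this toy version, read still has to walk the plan step by step, which is the kind of
per-record overhead that can outweigh the savings when the resolution itself was cheap.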

Here are the new results for batch size 1000. I also included two other types. Narrow is a
schema with 3 primitive fields. Wide has 35 fields of mostly primitives and a few child records.

Serializing
|type|old specific|new specific|old generic|new generic|
|simple|1950|1513|2496|1904|
|complex|14696|16380|13806|14945|
|narrow|1030|796|1217|952|
|wide|16599|14586|13167|10655|

Deserializing
|type|old specific|new specific|old generic|new generic|
|simple|4321|905|5647|1669|
|complex|28158|13541|25631|14071|
|narrow|2355|515|2854|764|
|wide|25116|5319|30295|10093|
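
For a quick read of the tables, the old/new ratios can be computed with a short script such as
the Python sketch below. The numbers are copied from the two tables above. The unit of the
timings is not stated in this message, but the ratios do not depend on it. A ratio above 1
means the new implementation is faster, and a ratio below 1 means it is slower.

```python
# Columns: old specific, new specific, old generic, new generic (from the tables above).
serializing = {
    "simple": (1950, 1513, 2496, 1904),
    "complex": (14696, 16380, 13806, 14945),
    "narrow": (1030, 796, 1217, 952),
    "wide": (16599, 14586, 13167, 10655),
}
deserializing = {
    "simple": (4321, 905, 5647, 1669),
    "complex": (28158, 13541, 25631, 14071),
    "narrow": (2355, 515, 2854, 764),
    "wide": (25116, 5319, 30295, 10093),
}


def speedup(old, new):
    """Return how many times faster the new timing is than the old one."""
    return old / new


def report(title, table):
    print(title)
    for name, (old_s, new_s, old_g, new_g) in table.items():
        print(
            f"  {name}: specific {speedup(old_s, new_s):.2f}x, "
            f"generic {speedup(old_g, new_g):.2f}x"
        )


report("Serializing", serializing)
report("Deserializing", deserializing)
```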
                
> Improve C# DatumReader performance
> ----------------------------------
>
>                 Key: AVRO-1332
>                 URL: https://issues.apache.org/jira/browse/AVRO-1332
>             Project: Avro
>          Issue Type: Improvement
>          Components: csharp
>    Affects Versions: 1.7.5
>            Reporter: David McIntosh
>            Priority: Minor
>              Labels: performance
>         Attachments: AVRO-1332-2.patch, AVRO-1332-3.patch, AVRO-1332.patch
>
>
> The current implementations of the C# datum readers perform resolution of the reader
> and writer schema on every call to Read. In my tests this was causing it to perform poorly
> when reading a large number of records (slower than parsing the same data from delimited text
> files). It would be more efficient if the reader only needed to resolve the schemas once.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
