avro-dev mailing list archives

From "Kevin Oliver (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-557) Speed up one-time data decoding
Date Wed, 02 Jun 2010 20:20:29 GMT

    [ https://issues.apache.org/jira/browse/AVRO-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874777#action_12874777 ]

Kevin Oliver commented on AVRO-557:

We make a fair amount of one-time use of BinaryDecoders and GenericDatumReaders. When we upgraded
to Avro 1.3 we saw a significant regression in decoding performance. A profiler showed the
issue pretty quickly.

Basically, it boiled down to two issues:
1) Having GenericDatumReaders always create the ResolvingDecoder is too expensive for one-time
use.
2) BinaryDecoders now create a bunch of arrays and have become more complicated, again significantly
slowing down one-time use.
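
The first issue can be illustrated with a minimal plain-Java sketch. It does not use Avro's classes: `ToyReader`, `buildResolver`, and the string-array "schemas" are hypothetical stand-ins, and it assumes that the resolution step is only needed when the writer and reader schemas differ. It simply counts how often the simulated setup runs when many single-use readers are created.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class OptionalResolverSketch {
    // Counts how often the simulated expensive setup runs.
    static int resolverBuilds = 0;

    // Hypothetical stand-in for building a resolver between two schemas.
    static Set<String> buildResolver(String[] writerFields, String[] readerFields) {
        resolverBuilds++;
        Set<String> shared = new HashSet<>(Arrays.asList(writerFields));
        shared.retainAll(Arrays.asList(readerFields));
        return shared;
    }

    // Toy reader: a null resolver means "read directly, no resolution step".
    static final class ToyReader {
        final Set<String> resolver;

        ToyReader(String[] writerFields, String[] readerFields, boolean alwaysResolve) {
            boolean sameSchema = Arrays.equals(writerFields, readerFields);
            // Assumption: resolution is only needed when the schemas differ.
            this.resolver = (alwaysResolve || !sameSchema)
                    ? buildResolver(writerFields, readerFields)
                    : null;
        }

        boolean readsDirectly() {
            return resolver == null;
        }
    }

    // Creates `count` single-use readers and reports how many resolvers were built.
    static int buildsForOneTimeReaders(int count, boolean alwaysResolve) {
        resolverBuilds = 0;
        String[] schema = {"id", "name"};
        for (int i = 0; i < count; i++) {
            new ToyReader(schema, schema, alwaysResolve);
        }
        return resolverBuilds;
    }

    public static void main(String[] args) {
        System.out.println("always resolve:   " + buildsForOneTimeReaders(1000, true));
        System.out.println("optional resolve: " + buildsForOneTimeReaders(1000, false));
    }
}
```

When every reader is used once, the always-resolve variant pays the setup cost once per record, while the optional variant pays it only for readers whose schemas actually differ.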

I'm attaching a patch that has a somewhat hacky workaround. I've resurrected the BinaryDecoder
code from v1.2 (more or less). I've also created a GenericDatumReaderWithOptionalResolver
class that basically forks GenericDatumReader to allow for reading directly from the supplied

Running the newly added 'Perf -GoneTimeUse', you can see the stark difference:
GenericReaderOneTimeUsage12Test: 2175 ms, 1.9147720770945649 million entries/sec.  0.008961491780473783 million bytes/sec
GenericReaderOneTimeUsage13Test: 13152 ms, 0.3167766318368232 million entries/sec.  0.0014825739399539307 million bytes/sec
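
As a quick check on those figures, the ratio of the two reported wall-clock times can be computed directly. The class and method names below are illustrative only, and the sketch assumes the "12" and "13" suffixes in the test names refer to the 1.2-style and 1.3-style code paths.

```java
public class SpeedupCheck {
    // Ratio of the regressed time to the baseline time (values above 1 mean slower).
    static double slowdown(long baselineMillis, long regressedMillis) {
        return (double) regressedMillis / baselineMillis;
    }

    public static void main(String[] args) {
        // Timings copied from the Perf output: 2175 ms vs. 13152 ms.
        double ratio = slowdown(2175, 13152);
        System.out.println("slowdown factor: " + ratio);
    }
}
```

The factor comes out at roughly 6, which matches the ratio of the reported entries/sec figures.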

I don't believe we should commit the patch as is, but I'd like some feedback on how to proceed
from here to get this performance back.

> Speed up one-time data decoding
> -------------------------------
>                 Key: AVRO-557
>                 URL: https://issues.apache.org/jira/browse/AVRO-557
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.3.2
>            Reporter: Kevin Oliver
>            Assignee: Kevin Oliver
>             Fix For: 1.4.0
> There are big gains to be had in performance when using a BinaryDecoder and a GenericDatumReader
> just one time. This is due to the relatively expensive parsing and initialization that came
> with 1.3. Patch with example code and a Perf harness to follow.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
