avro-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-753) Java: Improve BinaryEncoder Performance
Date Sun, 06 Feb 2011 20:01:33 GMT

    [ https://issues.apache.org/jira/browse/AVRO-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991207#comment-12991207 ]

Scott Carey commented on AVRO-753:

Pursuing this further has led to new information, some questions, and some trouble.

* The old BinaryEncoder wrote directly to the output stream in most cases.  In some cases
it buffered (writeBytes).  Almost every use of it in Avro assumes that it does not buffer.
Although we know from the mailing lists that many users have run into the buffering
and now call flush(), many likely do not.  We therefore need something akin to a "DirectBinaryEncoder",
and another big note in CHANGES.txt.  This should be much simpler than the Decoder case.
* BlockingBinaryEncoder should be easy to adapt, and integrate with the factory.  It should
become simpler than it is now.
* Does it make sense to have BinaryEncoder extend BufferedOutputStream?  And likewise
make "DirectBinaryEncoder" extend OutputStream?  It should then be easier for users to
understand the semantics, and they would not have to keep a reference to the underlying
stream around to close.  Any use case where one "weaves" Avro and non-Avro data into the
same stream gets much simpler too.
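
To make the question in the last bullet concrete, here is a minimal sketch of an encoder that is itself an OutputStream.  It is not the patch and not Avro's API: the class name BufferingEncoderSketch and its constructor are hypothetical, and it extends OutputStream rather than BufferedOutputStream to keep the buffering visible.  The only Avro-specific part is the int encoding (zig-zag, then variable-length).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch, not Avro's BinaryEncoder: an encoder that is also a stream. */
class BufferingEncoderSketch extends OutputStream {
    private final OutputStream out;
    private final byte[] buf;
    private int pos = 0;

    BufferingEncoderSketch(OutputStream out, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.out = out;
        this.buf = new byte[bufferSize];
    }

    /** Avro-style int: zig-zag encode, then emit 7 bits per byte, low bits first. */
    void writeInt(int n) throws IOException {
        int v = (n << 1) ^ (n >> 31);
        while ((v & ~0x7F) != 0) {
            write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        write(v);
    }

    /** Non-Avro bytes go through the same buffer, so ordering is preserved. */
    @Override
    public void write(int b) throws IOException {
        if (pos == buf.length) {
            flushBuffer();
        }
        buf[pos++] = (byte) b;
    }

    private void flushBuffer() throws IOException {
        out.write(buf, 0, pos);
        pos = 0;
    }

    /** Nothing reaches the underlying stream until the buffer fills or this is called. */
    @Override
    public void flush() throws IOException {
        flushBuffer();
        out.flush();
    }

    /** Closing the encoder flushes and closes the underlying stream. */
    @Override
    public void close() throws IOException {
        flush();
        out.close();
    }
}
```

With this shape a caller can write Avro values and raw bytes through one object and close only that object.  A "direct" variant would be the same class with write(int) forwarding straight to the underlying stream.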

I have made a few more performance improvements.  The big one is to writeString(String), which
goes from ~125MB/sec to ~183MB/sec.  The downside is that it requires an additional 50 lines
of code, whereas a simpler, 5-line variation gets 160MB/sec.  writeString is the dominant cost
in the "thrift/protobuf compare" performance benchmark: http://evanjones.ca/software/java-string-encoding.html
We could try adapting the raw UTF-8 code from the Hadoop project and see if that is faster.
Perhaps for 1.5.0 we keep it simple and go with the 160MB/sec variant, and research faster
string encoding and decoding on its own later.
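
For readers unfamiliar with what "raw UTF-8 code" looks like, here is a rough sketch of encoding a String directly into a caller-supplied byte array, which avoids the intermediate array that String.getBytes allocates.  It is not either of the variants measured above and not the Hadoop code; the class name Utf8Sketch and the method encode are made up for illustration, and the caller is assumed to supply a destination of at least 3 * s.length() bytes.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: hand-rolled UTF-8 encoding into a reusable buffer. */
final class Utf8Sketch {
    private Utf8Sketch() {}

    /**
     * Encodes s into dst and returns the number of bytes written.
     * Assumes dst.length >= 3 * s.length(), the worst case for UTF-16 input.
     */
    static int encode(String s, byte[] dst) {
        int p = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                // ASCII: one byte
                dst[p++] = (byte) c;
            } else if (c < 0x800) {
                // two-byte sequence
                dst[p++] = (byte) (0xC0 | (c >> 6));
                dst[p++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // surrogate pair: one code point, four bytes
                int cp = Character.toCodePoint(c, s.charAt(++i));
                dst[p++] = (byte) (0xF0 | (cp >> 18));
                dst[p++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                dst[p++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                dst[p++] = (byte) (0x80 | (cp & 0x3F));
            } else if (Character.isSurrogate(c)) {
                // unpaired surrogate: substitute '?', as String.getBytes does
                dst[p++] = (byte) '?';
            } else {
                // three-byte sequence
                dst[p++] = (byte) (0xE0 | (c >> 12));
                dst[p++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                dst[p++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return p;
    }

    /** Baseline for comparison: the allocating one-liner. */
    static byte[] encodeSimple(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Whether a loop like this actually beats the JDK encoder depends on the JVM and the data, so it would need to be measured with the same benchmark before drawing conclusions.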

> Java:  Improve BinaryEncoder Performance
> ----------------------------------------
>                 Key: AVRO-753
>                 URL: https://issues.apache.org/jira/browse/AVRO-753
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.5.0
>         Attachments: AVRO-753.v1.patch
> BinaryEncoder has not had a performance improvement pass like BinaryDecoder did.  It
> still mostly writes directly to the underlying OutputStream which is not optimal for performance.
> I like to use a rule that if you are writing to an OutputStream or reading from an InputStream
> in chunks smaller than 128 bytes, you have a performance problem.
> Measurements indicate that optimizing BinaryEncoder yields a 2.5x to 6x performance improvement.
> The process is significantly simpler than BinaryDecoder because 'pushing' is easier than
> 'pulling' -- and also because we do not need a 'direct' variant because BinaryEncoder already
> buffers sometimes.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira