avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-2052) Remove org.apache.avro.file.DataFileWriter Double Buffering
Date Tue, 18 Jul 2017 19:46:01 GMT

    [ https://issues.apache.org/jira/browse/AVRO-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092067#comment-16092067 ]

Doug Cutting commented on AVRO-2052:
------------------------------------

There may be significant performance differences between a BufferedBinaryEncoder and a DirectBinaryEncoder
writing to a buffered output stream. DirectBinaryEncoder already buffers ints, longs, floats and doubles
internally, so removing the BufferedBinaryEncoder's buffering does not reduce the number of bytes copied
for these types; it increases the number of invocations of the byte copier instead. This was deemed
significant in the past, but is perhaps worth re-benchmarking. Perf.java (in ipc/.../io) could be used
for this. This likely doesn't matter for vout, but may be significant for bufOut. This shouldn't be
committed without such benchmarking.
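
To make the point about copier invocations concrete, here is a minimal sketch in plain Java. It is a simplified model and not Avro's actual encoder code: the class EncoderWriteCalls, the CountingStream helper and the methods writeDirect and writeBuffered are hypothetical names introduced only for illustration. Both paths emit the same zig-zag varint bytes; the "direct" path hands each value to the stream separately, while the "buffered" path accumulates values and hands them over in bulk.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

/** Simplified model of the two encoder styles; NOT Avro's real implementation. */
final class EncoderWriteCalls {

    /** Hypothetical sink that counts how often a write method is invoked. */
    static final class CountingStream extends OutputStream {
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int calls = 0;

        @Override public void write(int b) { calls++; sink.write(b); }

        @Override public void write(byte[] b, int off, int len) {
            calls++;
            sink.write(b, off, len);
        }

        int bytes() { return sink.size(); }
    }

    /** Zig-zag varint encoding of an int into buf at pos; returns bytes used (at most 5). */
    static int encodeInt(int n, byte[] buf, int pos) {
        int v = (n << 1) ^ (n >> 31);
        int start = pos;
        while ((v & ~0x7F) != 0) {
            buf[pos++] = (byte) ((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        buf[pos++] = (byte) v;
        return pos - start;
    }

    /** "Direct" style: encode into a small scratch array, then one write call per value. */
    static void writeDirect(CountingStream out, int count) {
        byte[] scratch = new byte[5];
        for (int i = 0; i < count; i++) {
            int len = encodeInt(i, scratch, 0);
            out.write(scratch, 0, len);
        }
    }

    /** "Buffered" style: accumulate values and write only when the buffer is nearly full. */
    static void writeBuffered(CountingStream out, int count, int bufferSize) {
        byte[] buf = new byte[Math.max(bufferSize, 5)];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            if (buf.length - pos < 5) {   // not enough room for a worst-case int
                out.write(buf, 0, pos);
                pos = 0;
            }
            pos += encodeInt(i, buf, pos);
        }
        if (pos > 0) {
            out.write(buf, 0, pos);       // final flush
        }
    }

    public static void main(String[] args) {
        CountingStream direct = new CountingStream();
        writeDirect(direct, 1000);
        CountingStream buffered = new CountingStream();
        writeBuffered(buffered, 1000, 2048);
        System.out.println("direct:   " + direct.calls + " calls, " + direct.bytes() + " bytes");
        System.out.println("buffered: " + buffered.calls + " calls, " + buffered.bytes() + " bytes");
    }
}
```

The byte totals match in both cases; only the number of calls into the destination differs. Whether that difference is measurable in practice is exactly what a run of Perf.java would have to establish.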

> Remove org.apache.avro.file.DataFileWriter Double Buffering
> -----------------------------------------------------------
>
>                 Key: AVRO-2052
>                 URL: https://issues.apache.org/jira/browse/AVRO-2052
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.7.7, 1.8.2
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Trivial
>         Attachments: AVRO-2052.1.patch
>
>
> {code:title=org.apache.avro.file.DataFileWriter}
>   private void init(OutputStream outs) throws IOException {
>     this.underlyingStream = outs;
>     this.out = new BufferedFileOutputStream(outs);
>     EncoderFactory efactory = new EncoderFactory();
>     this.vout = efactory.binaryEncoder(out, null);
>     dout.setSchema(schema);
>     buffer = new NonCopyingByteArrayOutputStream(
>         Math.min((int)(syncInterval * 1.25), Integer.MAX_VALUE/2 -1));
>     this.bufOut = efactory.binaryEncoder(buffer, null);
>     if (this.codec == null) {
>       this.codec = CodecFactory.nullCodec().createInstance();
>     }
>     this.isOpen = true;
>   }
> {code}
> It's clear here that both encoders write to a buffered destination ({{BufferedFileOutputStream}}
> and {{ByteArrayOutputStream}}), so there is no need for a buffered encoder; they can instead
> write directly to the buffered streams with {{directBinaryEncoder}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
