hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10047) Allow Compressor/Decompressor APIs to expose a Direct ByteBuffer API
Date Thu, 17 Oct 2013 16:02:42 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798026#comment-13798026 ]

Colin Patrick McCabe commented on HADOOP-10047:

bq. This allows the API to accept both direct & indirect buffers, which was convenient
for testing.

I don't think we should accept indirect buffers.  It forces the code to be much, much more
complex, especially the JNI code, which would need two totally separate code paths.

You can easily test with direct buffers, even without using JNI.  Just call {{ByteBuffer#allocateDirect}}.
It's part of the standard Java library, so no native code is required.  Since the interfaces are
called {{DirectCompressor}} / {{DirectDecompressor}}, the buffers should be direct.
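
To illustrate, here is a minimal sketch of allocating direct buffers in pure Java and rejecting indirect ones up front.  The class {{DirectBufferDemo}} and the helper {{requireDirect}} are hypothetical names used only for this example; they are not part of the patch.

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    /**
     * Hypothetical guard: returns the buffer unchanged if it is direct,
     * otherwise throws. A direct-only API could call this on every argument.
     */
    static ByteBuffer requireDirect(ByteBuffer buf) {
        if (!buf.isDirect()) {
            throw new IllegalArgumentException("expected a direct ByteBuffer");
        }
        return buf;
    }

    public static void main(String[] args) {
        // Allocated outside the Java heap; no JNI code is needed to get one.
        ByteBuffer direct = ByteBuffer.allocateDirect(64);
        // Backed by a byte[] on the Java heap.
        ByteBuffer heap = ByteBuffer.allocate(64);

        System.out.println(direct.isDirect()); // true
        System.out.println(heap.isDirect());   // false

        requireDirect(direct); // accepted
        try {
            requireDirect(heap);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a check like this at the API boundary, a test can exercise the direct-buffer path without any native library loaded.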

+import sun.util.logging.resources.logging;

Bad import

+   * @return bytes stored into dst
+   * @throws IOException if compression fails
+   */
+	public int compress(ByteBuffer dst, ByteBuffer src) throws IOException;

We shouldn't need to return an int.  Why not just check the ByteBuffer itself to see how many
bytes were written?
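
A rough sketch of what that could look like for a caller, assuming the method advances the buffer positions the way {{ByteBuffer#put(ByteBuffer)}} does.  The method {{copyAll}} below is a hypothetical stand-in for a compress call (it only copies bytes), and {{bytesWritten}} is likewise an illustrative helper.

```java
import java.nio.ByteBuffer;

public class PositionDeltaDemo {
    /**
     * Hypothetical stand-in for a void compress(dst, src): it just copies
     * src into dst, advancing the position of both buffers.
     */
    static void copyAll(ByteBuffer dst, ByteBuffer src) {
        dst.put(src);
    }

    /** Bytes written into dst since the caller recorded positionBefore. */
    static int bytesWritten(ByteBuffer dst, int positionBefore) {
        return dst.position() - positionBefore;
    }

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.allocateDirect(16);
        src.put(new byte[] {1, 2, 3, 4, 5});
        src.flip(); // switch src from writing to reading

        ByteBuffer dst = ByteBuffer.allocateDirect(16);
        int before = dst.position();
        copyAll(dst, src);

        System.out.println(bytesWritten(dst, before)); // 5
        System.out.println(src.remaining());           // 0, all input consumed
    }
}
```

The same position bookkeeping also tells the caller how much of {{src}} was consumed, which a single int return value cannot express.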

I agree with Chris-- we need to see a patch that uses this interface (for more than just a
test) before we can agree that it's the right one.  Performance improvement data would be
nice too.

I think it might actually be simpler to have {{ZlibDirectDecompressor}}, etc. be separate
classes from {{ZlibDecompressor}}.  Right now, there are a lot of confusing questions, like what
happens if someone uses both APIs together?  If the compressors and decompressors are buffering
at all, mixing the two can become problematic.  Separate classes would also eliminate the casting.
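
As a sketch of the separate-class idea only: the interface and class below are hypothetical and are not the ones in the patch.  It assumes Java 11 or later, where {{java.util.zip.Inflater}} and {{Deflater}} accept {{ByteBuffer}} arguments, and it uses the JDK's zlib bindings rather than Hadoop's native code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class StandaloneDirectSketch {
    /** Hypothetical direct-only interface; progress is read from buffer positions. */
    interface DirectDecompressorSketch {
        void decompress(ByteBuffer dst, ByteBuffer src) throws IOException;
    }

    /** Standalone class with no byte[] API, so the two APIs cannot be mixed. */
    static final class ZlibDirectSketch implements DirectDecompressorSketch {
        @Override
        public void decompress(ByteBuffer dst, ByteBuffer src) throws IOException {
            if (!dst.isDirect() || !src.isDirect()) {
                throw new IllegalArgumentException("direct buffers only");
            }
            Inflater inflater = new Inflater();
            try {
                inflater.setInput(src); // Java 11+: consumes from src's position
                while (!inflater.finished() && dst.hasRemaining()) {
                    // 0 means more input or a dictionary is needed; stop here.
                    if (inflater.inflate(dst) == 0) {
                        break;
                    }
                }
            } catch (DataFormatException e) {
                throw new IOException(e);
            } finally {
                inflater.end();
            }
        }
    }

    /** Test helper: zlib-compress data into a direct buffer, ready to read. */
    static ByteBuffer deflateToDirect(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteBuffer out = ByteBuffer.allocateDirect(data.length + 64);
        while (!deflater.finished() && out.hasRemaining()) {
            deflater.deflate(out);
        }
        deflater.end();
        out.flip();
        return out;
    }

    /** Compress then decompress, entirely through direct buffers. */
    static byte[] roundTrip(byte[] data) {
        ByteBuffer dst = ByteBuffer.allocateDirect(data.length);
        try {
            new ZlibDirectSketch().decompress(dst, deflateToDirect(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        dst.flip();
        byte[] result = new byte[dst.remaining()];
        dst.get(result);
        return result;
    }

    public static void main(String[] args) {
        byte[] original = "hello hello hello".getBytes();
        System.out.println(new String(roundTrip(original)));
    }
}
```

Because the sketch class holds no state between calls and exposes no {{byte[]}} methods, there is nothing to cast and no shared internal buffer for two APIs to fight over.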

> Allow Compressor/Decompressor APIs to expose a Direct ByteBuffer API
> --------------------------------------------------------------------
>                 Key: HADOOP-10047
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10047
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: compression
>         Attachments: DirectCompressor.html, DirectDecompressor.html, HADOOP-10047-WIP.patch,
> With the Zero-Copy reads in HDFS (HDFS-5260), it becomes important to perform all I/O
operations without copying data into byte[] buffers or other buffers which wrap over them.
> This is a proposal for adding new DirectCompressor and DirectDecompressor interfaces
to the io.compress, to indicate codecs which want to surface the direct buffer layer upwards.
> The implementation may or may not copy the buffers passed in, but should work with direct
heap/mmap buffers and cannot assume .array() availability.

