hadoop-common-issues mailing list archives

From "Tim Broberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8148) Zero-copy ByteBuffer-based compressor / decompressor API
Date Tue, 26 Jun 2012 22:12:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401723#comment-13401723 ]

Tim Broberg commented on HADOOP-8148:
-------------------------------------

Great to have some discussion on this! I was afraid it would be silence until the cement hardens,
followed by shouting that it's all wrong.

Thoughts:

1 - Agreed. The bzip codec is a good example of this approach. The only real user of the compressors
apart from streams is the codec pool, which seems like kind of a hack to me. If we want to
pool the direct buffers, why don't we make direct buffer pools instead of compressor / decompressor
pools?
2 - Agreed. It feels very gzippy, and is unfriendly to block-based compressors.
3 - Less hassle, heck yes. More performant? That's surprising to me. Is that because of the
copies? Got benchmarks?
4 - Agreed.
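
To make the pooling idea in thought 1 concrete, a direct-buffer pool could be as simple as the sketch below. This is illustrative only; the class and method names are hypothetical and not from any attached patch.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical sketch: pool the DirectByteBuffers themselves rather than
// Compressor/Decompressor instances. Names are illustrative only.
public class DirectBufferPool {
  private final int bufferSize;
  private final ConcurrentLinkedDeque<ByteBuffer> free =
      new ConcurrentLinkedDeque<>();

  public DirectBufferPool(int bufferSize) {
    this.bufferSize = bufferSize;
  }

  /** Hand out a pooled buffer, allocating a new one if the pool is empty. */
  public ByteBuffer take() {
    ByteBuffer buf = free.pollFirst();
    return (buf != null) ? buf : ByteBuffer.allocateDirect(bufferSize);
  }

  /** Return a buffer to the pool, cleared for reuse. */
  public void release(ByteBuffer buf) {
    buf.clear();
    free.offerFirst(buf);
  }
}
```

Callers would borrow a buffer with take(), use it for a compress or decompress pass, and hand it back with release(), so direct-memory allocation happens only when the pool runs dry.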

I will attach my current idea of what the stream interface should look like. In this model,
the stream owns the direct memory buffers and pools them to reduce allocation overhead. This
allows the decompress stream to read ahead so that buffers are available to read instantly.
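
Roughly, the stream-owned-buffer model described above might look something like this. The interface name and methods are a sketch of the idea, not the attached patch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Illustrative sketch only: a decompression stream that owns its direct
// buffers, pools them internally, and can read ahead so a decompressed
// buffer is ready the moment the caller asks for it.
public interface ByteBufferDecompressorStream {
  /** Borrow the next decompressed buffer; the stream retains ownership. */
  ByteBuffer read() throws IOException;

  /** Return a buffer obtained from read() so the stream can reuse it. */
  void returnBuffer(ByteBuffer buf);

  void close() throws IOException;
}
```

Because the stream hands out buffers it owns and takes them back, it can keep a small set of direct buffers in flight for read-ahead without the caller ever allocating direct memory.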
                
> Zero-copy ByteBuffer-based compressor / decompressor API
> --------------------------------------------------------
>
>                 Key: HADOOP-8148
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8148
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io, performance
>            Reporter: Tim Broberg
>            Assignee: Tim Broberg
>         Attachments: hadoop8148.patch
>
>
> Per Todd Lipcon's comment in HDFS-2834, "
>   Whenever a native decompression codec is being used, ... we generally have the following copies:
>   1) Socket -> DirectByteBuffer (in SocketChannel implementation)
>   2) DirectByteBuffer -> byte[] (in SocketInputStream)
>   3) byte[] -> Native buffer (set up for decompression)
>   4*) decompression to a different native buffer (not really a copy - decompression necessarily rewrites)
>   5) native buffer -> byte[]
>   with the proposed improvement we can hopefully eliminate #2, #3 for all applications, and #2, #3, and #5 for libhdfs.
> "
> The interfaces in the attached patch attempt to address:
>  A - Compression and decompression based on ByteBuffers (HDFS-2834)
>  B - Zero-copy compression and decompression (HDFS-3051)
>  C - Provide the caller a way to know the maximum space required to hold compressed output.
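
For item C, one common pattern is a worst-case-size query on the codec, in the spirit of zlib's compressBound(). The class below is a hypothetical sketch using a zlib-style overestimate; a real codec would supply its own bound.

```java
// Hypothetical sketch of item C: let the caller ask for the worst-case
// compressed size before allocating an output buffer. The formula mirrors
// zlib's compressBound() overestimate (input + ~0.1% overhead + a small
// constant); it is not taken from the attached patch.
public final class CompressBound {
  private CompressBound() {}

  /** Worst-case compressed size for uncompressedLen input bytes. */
  public static int maxCompressedLength(int uncompressedLen) {
    return uncompressedLen + (uncompressedLen >> 12)
        + (uncompressedLen >> 14) + (uncompressedLen >> 25) + 13;
  }
}
```

With such a bound, the caller can allocate (or borrow from a pool) a single output ByteBuffer that is guaranteed large enough, avoiding grow-and-copy cycles in the compression path.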

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
