hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8148) Zero-copy ByteBuffer-based compressor / decompressor API
Date Tue, 26 Jun 2012 23:51:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401809#comment-13401809
] 

Todd Lipcon commented on HADOOP-8148:
-------------------------------------

bq. JNI is very problematic in several dimensions, but in terms of performance take a look
at https://issues.apache.org/jira/browse/HADOOP-6148

Sure, the pure Java approach was better than the crappy built-in JNI CRC. But then we switched
again back to JNI: HDFS-2080. With a _good_ implementation based on direct buffers, it's way
faster to go to C for most of this stuff, since you can take advantage of SSE.
                
> Zero-copy ByteBuffer-based compressor / decompressor API
> --------------------------------------------------------
>
>                 Key: HADOOP-8148
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8148
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io, performance
>            Reporter: Tim Broberg
>            Assignee: Tim Broberg
>         Attachments: hadoop8148.patch, zerocopyifc.tgz
>
>
> Per Todd Lipcon's comment in HDFS-2834, "
>   Whenever a native decompression codec is being used, ... we generally have the following
copies:
>   1) Socket -> DirectByteBuffer (in SocketChannel implementation)
>   2) DirectByteBuffer -> byte[] (in SocketInputStream)
>   3) byte[] -> Native buffer (set up for decompression)
>   4*) decompression to a different native buffer (not really a copy - decompression necessarily
rewrites)
>   5) native buffer -> byte[]
>   with the proposed improvement we can hopefully eliminate #2,#3 for all applications,
and #2,#3,and #5 for libhdfs.
> "
> The interfaces in the attached patch attempt to address:
>  A - Compression and decompression based on ByteBuffers (HDFS-2834)
>  B - Zero-copy compression and decompression (HDFS-3051)
>  C - Provide the caller a way to know how the max space required to hold compressed output.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message