hadoop-common-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9601) Support native CRC on byte arrays
Date Thu, 06 Jun 2013 20:32:21 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HADOOP-9601:
----------------------------

    Attachment: HADOOP-9601-bench.patch

The bottleneck for -put does not seem to be checksum verification, but calculateChunkedSums
on the client side, which has no native equivalent in NativeCrc32.c.

I wrote a micro-benchmark, which shows that with the patch, checksumming array buffers is
now as fast as checksumming direct buffers.
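
For readers without the attachment, here is a minimal sketch of what such a micro-benchmark measures. It is not the code in HADOOP-9601-bench.patch: it uses the JDK's java.util.zip.CRC32 rather than Hadoop's DataChecksum or NativeCrc32, and the class name ChunkedCrcSketch, the 512-byte chunk size, the 64 MB buffer, and the pass count are all assumptions chosen for illustration. It computes one CRC per chunk over a byte[] and over a direct ByteBuffer, checks that both paths agree, and prints throughput in the same format as the results.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical sketch: chunked CRC32 over a byte[] versus a direct ByteBuffer.
final class ChunkedCrcSketch {

    // One CRC32 per bytesPerChecksum-sized chunk of a byte[]; the last chunk may be short.
    static int[] chunkedSums(byte[] data, int bytesPerChecksum) {
        int n = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        int[] sums = new int[n];
        CRC32 crc = new CRC32();
        for (int i = 0; i < n; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = (int) crc.getValue();
        }
        return sums;
    }

    // Same computation over a ByteBuffer (heap or direct); the caller's position is left untouched.
    static int[] chunkedSums(ByteBuffer data, int bytesPerChecksum) {
        ByteBuffer view = data.duplicate();
        int n = (view.remaining() + bytesPerChecksum - 1) / bytesPerChecksum;
        int[] sums = new int[n];
        CRC32 crc = new CRC32();
        for (int i = 0; i < n; i++) {
            int len = Math.min(bytesPerChecksum, view.remaining());
            ByteBuffer chunk = view.slice(); // starts at the current position
            chunk.limit(len);
            crc.reset();
            crc.update(chunk);
            view.position(view.position() + len);
            sums[i] = (int) crc.getValue();
        }
        return sums;
    }

    static void report(String name, long megabytes, long nanos) {
        double seconds = nanos / 1e9;
        System.out.printf("Checksumming %s: %d MB took %d ms (%.2f MB/s)%n",
                name, megabytes, nanos / 1_000_000, megabytes / seconds);
    }

    public static void main(String[] args) {
        int bytesPerChecksum = 512; // assumed chunk size
        int sizeMb = 64;            // assumed buffer size
        int passes = 8;             // assumed number of passes

        byte[] array = new byte[sizeMb << 20];
        new Random(42).nextBytes(array);
        ByteBuffer direct = ByteBuffer.allocateDirect(array.length);
        direct.put(array);
        direct.flip();

        int[] fromArray = null;
        long t0 = System.nanoTime();
        for (int p = 0; p < passes; p++) {
            fromArray = chunkedSums(array, bytesPerChecksum);
        }
        long arrayNanos = System.nanoTime() - t0;

        int[] fromDirect = null;
        t0 = System.nanoTime();
        for (int p = 0; p < passes; p++) {
            fromDirect = chunkedSums(direct, bytesPerChecksum);
        }
        long directNanos = System.nanoTime() - t0;

        if (!Arrays.equals(fromArray, fromDirect)) {
            throw new AssertionError("array and direct checksums differ");
        }
        report("CRC32+array", (long) sizeMb * passes, arrayNanos);
        report("CRC32+direct", (long) sizeMb * passes, directNanos);
    }
}
```

The absolute numbers from this sketch will not match the ones reported here, since it has no JIT warm-up and exercises the JDK's CRC32 rather than the native code path; it only illustrates the array versus direct comparison.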

Before

{code}
Checksumming CRC32+array: 32768 MB took 35944 ms (911.64 MB/s)
Checksumming CRC32C+array: 32768 MB took 35517 ms (922.60 MB/s)
Checksumming CRC32+direct: 32768 MB took 24318 ms (1347.48 MB/s)
Checksumming CRC32C+direct: 32768 MB took 13229 ms (2476.98 MB/s)
{code}

After

{code}
Checksumming CRC32+array: 32768 MB took 24399 ms (1343.01 MB/s)
Checksumming CRC32C+array: 32768 MB took 13238 ms (2475.30 MB/s)
Checksumming CRC32+direct: 32768 MB took 25190 ms (1300.83 MB/s)
Checksumming CRC32C+direct: 32768 MB took 13075 ms (2506.16 MB/s)
{code}
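
The MB/s figures follow directly from the sizes and times (MB * 1000 / ms), and the ratio of before to after times gives the speedup: roughly 1.47x for CRC32 and 2.68x for CRC32C on arrays, with direct buffers essentially unchanged. A small sketch of that arithmetic, using only the numbers above (the class and method names are hypothetical):

```java
// Hypothetical helper that recomputes throughput and speedup from the reported timings.
final class ThroughputCheck {

    // Throughput in MB/s for a run of `megabytes` MB that took `millis` ms.
    static double mbPerSec(long megabytes, long millis) {
        return megabytes * 1000.0 / millis;
    }

    // How many times faster the "after" run is than the "before" run.
    static double speedup(long beforeMillis, long afterMillis) {
        return (double) beforeMillis / afterMillis;
    }

    public static void main(String[] args) {
        long mb = 32768;
        String[] names = {"CRC32+array", "CRC32C+array", "CRC32+direct", "CRC32C+direct"};
        long[] before = {35944, 35517, 24318, 13229}; // ms, from the "Before" run
        long[] after = {24399, 13238, 25190, 13075};  // ms, from the "After" run

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-14s %8.2f -> %8.2f MB/s (%.2fx)%n",
                    names[i],
                    mbPerSec(mb, before[i]),
                    mbPerSec(mb, after[i]),
                    speedup(before[i], after[i]));
        }
    }
}
```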
                
> Support native CRC on byte arrays
> ---------------------------------
>
>                 Key: HADOOP-9601
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9601
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: performance, util
>    Affects Versions: 3.0.0
>            Reporter: Todd Lipcon
>            Assignee: Gopal V
>              Labels: perfomance
>         Attachments: HADOOP-9601-bench.patch, HADOOP-9601-trunk-rebase-2.patch, HADOOP-9601-trunk-rebase.patch,
>                      HADOOP-9601-WIP-01.patch, HADOOP-9601-WIP-02.patch
>
>
> When we first implemented the Native CRC code, we only did so for direct byte buffers,
> because these correspond directly to native heap memory and thus make it easy to access via
> JNI. We'd generally assumed that accessing byte[] arrays from JNI was not efficient enough,
> but now that I know more about JNI I don't think that's true -- we just need to make sure
> that the critical sections where we lock the buffers are short.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
