hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-7445) Implement bulk checksum verification using efficient native code
Date Wed, 03 Aug 2011 19:58:26 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-7445:

    Attachment: hadoop-7445.txt

Good point that we don't need the special license on the tables, since we generated them using
your Table class. But, the actual "slicing-by-8" implementation is from a project with BSD
license. So, I moved that special license header to bulk_crc32.c.
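For readers unfamiliar with slicing-by-8: instead of one 256-entry table and one lookup per input byte, it precomputes eight tables and folds eight input bytes into the CRC per loop iteration. Below is a minimal Java sketch of the idea, assuming the standard reflected CRC-32 polynomial (0xEDB88320) that java.util.zip.CRC32 also uses. The class name SlicingBy8Crc32 is hypothetical, and this is not the BSD-licensed C code in bulk_crc32.c.

```java
import java.nio.charset.StandardCharsets;

/**
 * Illustrative slicing-by-8 CRC-32 (hypothetical class name). Assumes the
 * reflected polynomial 0xEDB88320, i.e. the same CRC java.util.zip.CRC32
 * computes. Not the C implementation from bulk_crc32.c.
 */
class SlicingBy8Crc32 {
    private static final int POLY = 0xEDB88320;
    // T[0] is the classic byte-at-a-time table; T[k] extends T[0] by k further zero bytes.
    private static final int[][] T = new int[8][256];

    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ POLY : c >>> 1;
            }
            T[0][n] = c;
        }
        for (int s = 1; s < 8; s++) {
            for (int n = 0; n < 256; n++) {
                int prev = T[s - 1][n];
                T[s][n] = (prev >>> 8) ^ T[0][prev & 0xFF];
            }
        }
    }

    // Little-endian 32-bit load from four consecutive bytes.
    private static int le32(byte[] b, int i) {
        return (b[i] & 0xFF) | (b[i + 1] & 0xFF) << 8
             | (b[i + 2] & 0xFF) << 16 | (b[i + 3] & 0xFF) << 24;
    }

    static int crc32(byte[] b, int off, int len) {
        int crc = 0xFFFFFFFF;
        // Main loop: consume 8 bytes per iteration using 8 independent table lookups.
        while (len >= 8) {
            int one = crc ^ le32(b, off);
            int two = le32(b, off + 4);
            crc = T[7][one & 0xFF] ^ T[6][(one >>> 8) & 0xFF]
                ^ T[5][(one >>> 16) & 0xFF] ^ T[4][one >>> 24]
                ^ T[3][two & 0xFF] ^ T[2][(two >>> 8) & 0xFF]
                ^ T[1][(two >>> 16) & 0xFF] ^ T[0][two >>> 24];
            off += 8;
            len -= 8;
        }
        // Tail: fewer than 8 bytes left, so process one byte per lookup.
        while (len-- > 0) {
            crc = (crc >>> 8) ^ T[0][(crc ^ b[off++]) & 0xFF];
        }
        return ~crc;
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        // The standard CRC-32 check value for "123456789" is 0xCBF43926.
        System.out.println(Integer.toHexString(crc32(check, 0, check.length)));
    }
}
```

Cross-checking such a table-driven routine against java.util.zip.CRC32 over many lengths and offsets is a cheap way to catch table or byte-order mistakes.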

This new revision also rebases on the mavenized common.

As for testing performance and correctness against the existing implementation:
- Performance-wise, we don't currently have a canned benchmark for measuring the performance of
checksum _verification_. This patch doesn't currently add native checksum _computation_ anywhere,
since the umbrella JIRA HDFS-2080 is focusing on the read path. I was able to benchmark
"hadoop fs -cat /dev/shm/128M /dev/shm/128M /dev/shm/128M [repeated 50 times]" using a
ChecksumFileSystem, and saw a ~60% speed improvement. Since this reads a file on a RAM disk,
it is essentially a measurement of CPU overhead.
- Correctness-wise, the new test cases in TestDataChecksum verify both the native and non-native
code, since they test with direct buffers as well as heap buffers that wrap a byte[]. If the
native and non-native code disagreed, this test would fail for one of the two cases (since
the expected checksums are always computed by the Java code).
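The testing argument above can be sketched as follows: compute one checksum per chunk in Java, then verify the same bytes once through a heap buffer and once through a direct buffer. The message implies that direct buffers take the native path and heap buffers the Java path; in this sketch both branches are pure Java (the direct-buffer branch only marks where a native call would go). The class ChunkedSumCheck, its methods, and the use of plain CRC32 are assumptions for illustration, not Hadoop's DataChecksum API. CRC32.update(ByteBuffer) requires Java 8 or later.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical helper illustrating chunked checksum verification. */
class ChunkedSumCheck {
    /** One CRC32 per bytesPerChecksum-sized chunk; the last chunk may be short. */
    static int[] computeSums(byte[] data, int bytesPerChecksum) {
        int n = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        int[] sums = new int[n];
        CRC32 crc = new CRC32();
        for (int i = 0; i < n; i++) {
            int off = i * bytesPerChecksum;
            crc.reset();
            crc.update(data, off, Math.min(bytesPerChecksum, data.length - off));
            sums[i] = (int) crc.getValue();
        }
        return sums;
    }

    /**
     * Verifies every chunk of data's remaining bytes without moving its position.
     * Returns the index of the first mismatching chunk, or -1 if all match.
     */
    static int verify(ByteBuffer data, int bytesPerChecksum, int[] sums) {
        CRC32 crc = new CRC32();
        int start = data.position();
        int total = data.remaining();
        for (int i = 0; i * bytesPerChecksum < total; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, total - off);
            crc.reset();
            if (data.hasArray()) {
                // Heap buffer: checksum the backing byte[] directly.
                crc.update(data.array(), data.arrayOffset() + start + off, len);
            } else {
                // Direct buffer: where a native bulk call would plug in.
                ByteBuffer chunk = data.duplicate();
                chunk.position(start + off);
                chunk.limit(start + off + len);
                crc.update(chunk);
            }
            if ((int) crc.getValue() != sums[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

If the two branches ever disagreed, one of the two verify calls would report a mismatch against the Java-computed sums, which is the property the message relies on.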

> Implement bulk checksum verification using efficient native code
> ----------------------------------------------------------------
>                 Key: HADOOP-7445
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7445
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native, util
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-7445.txt, hadoop-7445.txt, hadoop-7445.txt, hadoop-7445.txt,
> Once HADOOP-7444 is implemented ("bulk" API for checksums), good performance gains can
> be had by implementing bulk checksum operations using JNI. This JIRA is to add checksum support
> to the native libraries. Of course, if native libs are not available, it will still fall back
> to the pure-Java implementations.
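The fallback described in the issue summary amounts to choosing an implementation once, based on whether the native library could be loaded. A minimal Java sketch of that selection pattern follows. The interface BulkVerifier, the method verifyChunkedSums, and the helpers below are hypothetical names, and no real JNI binding is shown; a native verifier would be supplied only when loading succeeded.

```java
import java.util.zip.CRC32;

/** Hypothetical illustration of native-or-Java checksum selection. */
class ChecksumFallback {
    /** Bulk API: verify many chunks in one call instead of one call per chunk. */
    interface BulkVerifier {
        /** Returns the index of the first corrupt chunk, or -1 if all match. */
        int verifyChunkedSums(byte[] data, int bytesPerChecksum, int[] sums);
    }

    /** Pure-Java implementation; always available. */
    static final BulkVerifier PURE_JAVA = (data, bpc, sums) -> {
        CRC32 crc = new CRC32();
        for (int i = 0, off = 0; off < data.length; i++, off += bpc) {
            crc.reset();
            crc.update(data, off, Math.min(bpc, data.length - off));
            if ((int) crc.getValue() != sums[i]) {
                return i;
            }
        }
        return -1;
    };

    /** Tries to load a native library; returns false if it is missing or blocked. */
    static boolean nativeLibraryLoaded(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    /** Uses the native verifier only if the library loaded; otherwise falls back. */
    static BulkVerifier select(boolean nativeLoaded, BulkVerifier nativeVerifier) {
        return (nativeLoaded && nativeVerifier != null) ? nativeVerifier : PURE_JAVA;
    }
}
```

Making the choice once at startup keeps the per-call path free of availability checks, and callers never need to know which implementation they received.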

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

