hadoop-common-issues mailing list archives

From "Edward Nevill (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11660) Add support for hardware crc on ARM aarch64 architecture
Date Mon, 30 Mar 2015 15:24:54 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386859#comment-14386859 ]

Edward Nevill commented on HADOOP-11660:
----------------------------------------

Hi,

I have revised the patch to include the changes requested above. I have also updated test_bulk_crc32.c
so it prints out the times for 16384 bytes @ 512 bytes per checksum X 1000000 iterations for
both the Castagnoli and Zlib polynomials.
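For readers without the patch to hand, the following is a rough sketch of what such a timing loop might look like. It is not the code in test_bulk_crc32.c: the names crc32_bitwise, bulk_crc and time_bulk_crc are hypothetical, and a slow bit-at-a-time software CRC stands in for the native implementation so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Reflected forms of the two polynomials. */
#define CRC32C_POLY     0x82F63B78u  /* Castagnoli */
#define CRC32_ZLIB_POLY 0xEDB88320u  /* Zlib */

/* Bit-at-a-time CRC (hypothetical helper, slow but portable). */
static uint32_t crc32_bitwise(uint32_t poly, const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* XOR in the polynomial only when the low bit is set. */
            crc = (crc >> 1) ^ (poly & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* One checksum per bytes_per_checksum chunk; returns the number of sums. */
static size_t bulk_crc(uint32_t poly, const uint8_t *data, size_t len,
                       size_t bytes_per_checksum, uint32_t *sums)
{
    assert(bytes_per_checksum > 0);
    size_t n = 0;
    while (len > 0) {
        size_t chunk = len < bytes_per_checksum ? len : bytes_per_checksum;
        sums[n++] = crc32_bitwise(poly, data, chunk);
        data += chunk;
        len -= chunk;
    }
    return n;
}

/* CPU seconds spent checksumming the buffer `iterations` times. */
static double time_bulk_crc(uint32_t poly, const uint8_t *data, size_t len,
                            size_t bytes_per_checksum, uint32_t *sums,
                            long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        bulk_crc(poly, data, len, bytes_per_checksum, sums);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_bulk_crc once with CRC32C_POLY and once with CRC32_ZLIB_POLY on a 16384 byte buffer, with 512 bytes per checksum (32 sums), would give two timings in the same shape as the output below. Absolute numbers from this sketch are not comparable with the native code.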

The following are the results I get for x86_64 before and after. I have done 5 runs of each.

BEFORE

{code}
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.8
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.84
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.1
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.85
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.94
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.81
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
{code}

AFTER

{code}
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.11
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.92
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.99
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.11
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.9
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.92
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.92
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
{code}

Looking at the average time over 5 runs gives

BEFORE

Castagnoli average = 1.116 sec
Zlib average = 13.848 sec

AFTER

Castagnoli average = 1.116 sec
Zlib average = 13.93 sec

So the performance for the Castagnoli polynomial is the same. For the Zlib polynomial there
seems to be a performance degradation of 0.6%. This may be due to experimental error. In any
case, the Zlib polynomial is unaccelerated on x86 because it is not supported on x86 HW, and
it is not used for HDFS.
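The arithmetic behind the 0.6% figure can be checked with a short sketch. The Zlib timings are copied from the runs above; the helper names mean and percent_change are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Zlib timings in seconds, copied from the five BEFORE and AFTER runs. */
static const double zlib_before[5] = {13.80, 13.84, 13.85, 13.94, 13.81};
static const double zlib_after[5]  = {13.92, 13.99, 13.90, 13.92, 13.92};

/* Arithmetic mean of `count` values. */
static double mean(const double *values, size_t count)
{
    assert(count > 0);
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        sum += values[i];
    }
    return sum / (double)count;
}

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

Here percent_change(mean(zlib_before, 5), mean(zlib_after, 5)) evaluates (13.93 - 13.848) / 13.848 * 100, which is about 0.59 and rounds to the 0.6% quoted above.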

For comparison, on aarch64 partner HW I get the following averages

Castagnoli = 3.586 sec
Zlib = 3.580 sec
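To illustrate the general idea of a hardware CRC path conditionalized on __aarch64__, here is a minimal sketch using the ACLE intrinsics __crc32cd and __crc32cb from <arm_acle.h>. It is not the code from the attached patch. The function names crc32c_update and crc32c are hypothetical, the extra check on __ARM_FEATURE_CRC32 and the little-endian 8 byte load are assumptions of this sketch, and a bit-at-a-time software loop is used as the fallback so that it builds on other architectures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Use the CRC instructions only when the compiler targets them,
 * for example when built with -march=armv8-a+crc. */
#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#define HAVE_HW_CRC32C 1
#else
#define HAVE_HW_CRC32C 0
#endif

/* Fold `len` bytes into a running Castagnoli CRC (no pre/post inversion). */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *data, size_t len)
{
#if HAVE_HW_CRC32C
    /* Non-pipelined: one 8 byte CRC instruction at a time. */
    while (len >= 8) {
        uint64_t word;
        memcpy(&word, data, sizeof word);  /* assumes little-endian */
        crc = __crc32cd(crc, word);
        data += 8;
        len -= 8;
    }
    while (len > 0) {
        crc = __crc32cb(crc, *data++);
        len--;
    }
#else
    /* Portable software fallback, one bit at a time. */
    while (len > 0) {
        crc ^= *data++;
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
        len--;
    }
#endif
    return crc;
}

/* Complete CRC32C of a buffer, with the usual initial and final inversion. */
static uint32_t crc32c(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    return crc32c_update(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

A Zlib variant would follow the same shape with __crc32d and __crc32b in the hardware branch and 0xEDB88320 in the fallback.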

Many thanks for your help with this,
Ed.


> Add support for hardware crc on ARM aarch64 architecture
> --------------------------------------------------------
>
>                 Key: HADOOP-11660
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11660
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 3.0.0
>         Environment: ARM aarch64 development platform
>            Reporter: Edward Nevill
>            Assignee: Edward Nevill
>            Priority: Minor
>              Labels: performance
>         Attachments: jira-11660.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> This patch adds support for hardware crc for ARM's new 64 bit architecture.
> The patch is completely conditionalized on __aarch64__.
> I have only added support for the non pipelined version as I benchmarked the pipelined
> version on aarch64 and it showed no performance improvement.
> The aarch64 version supports both Castagnoli and Zlib CRCs as both of these are supported
> on ARM aarch64 hardware.
> To benchmark this I modified the test_bulk_crc32 test to print out the time taken to
> CRC a 1MB dataset 1000 times.
> Before:
> CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
> CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
> After:
> CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
> CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
> So this represents a roughly 4.5X (2.55 / 0.57) performance improvement on raw CRC calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
