hadoop-hdfs-issues mailing list archives

From "James Thomas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-6560) Byte array native checksumming on DN side
Date Fri, 20 Jun 2014 17:53:25 GMT

     [ https://issues.apache.org/jira/browse/HDFS-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Thomas updated HDFS-6560:
-------------------------------

    Attachment: HDFS-3528.patch

Ran some basic performance tests on a 10^8-byte data array. Each time listed is for a single
call to verifyChunkedSums, averaged over 20 runs.
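
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a
chunked checksum timing loop. It is not the code from the patch: it uses the JDK's
java.util.zip.CRC32 as a stand-in for Hadoop's checksum code, and the class name, the helper
methods, and the 512-byte chunk size are all assumptions made for illustration.

```java
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical stand-in for a chunked checksum benchmark; not Hadoop's implementation.
public class ChunkedCrcTiming {
    // Assumed chunk size for illustration only.
    static final int BYTES_PER_CHUNK = 512;

    // Computes one CRC32 value per chunk; the last chunk may be shorter.
    static int[] computeChunkedSums(byte[] data, int bytesPerChunk) {
        int chunks = (data.length + bytesPerChunk - 1) / bytesPerChunk;
        int[] sums = new int[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChunk;
            int len = Math.min(bytesPerChunk, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = (int) crc.getValue();
        }
        return sums;
    }

    // Recomputes each chunk's CRC32 and compares it with the stored value.
    static boolean verifyChunkedSums(byte[] data, int[] sums, int bytesPerChunk) {
        CRC32 crc = new CRC32();
        for (int i = 0; i < sums.length; i++) {
            int off = i * bytesPerChunk;
            int len = Math.min(bytesPerChunk, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            if ((int) crc.getValue() != sums[i]) {
                return false;
            }
        }
        return true;
    }

    // Average wall-clock time of one verification call, in milliseconds.
    static double averageVerifyMillis(byte[] data, int[] sums, int bytesPerChunk, int runs) {
        long totalNanos = 0;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            boolean ok = verifyChunkedSums(data, sums, bytesPerChunk);
            totalNanos += System.nanoTime() - start;
            if (!ok) {
                throw new IllegalStateException("checksum mismatch");
            }
        }
        return totalNanos / 1e6 / runs;
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000_000]; // 10^8 bytes, as in the test above
        new Random(42).nextBytes(data);
        int[] sums = computeChunkedSums(data, BYTES_PER_CHUNK);
        double ms = averageVerifyMillis(data, sums, BYTES_PER_CHUNK, 20);
        System.out.printf("CRC32 (pure JDK), avg over 20 runs: %.1f ms%n", ms);
    }
}
```

Timings from a loop like this depend on the JVM, on warm-up, and on the hardware, so they
should not be expected to match the numbers below.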

Direct buffer with existing native implementation for direct buffers:
-Time for CRC32: 56.5 ms
-Time for CRC32C: 7.3 ms

Direct buffer with Java implementation:
-Time for CRC32: 81.8 ms
-Time for CRC32C: 82.5 ms

Byte array with native implementation developed in this patch:
-Time for CRC32: 55.0 ms
-Time for CRC32C: 7.63 ms

Byte array with Java implementation:
-Time for CRC32: 74.4 ms
-Time for CRC32C: 74.7 ms

So it seems like the native byte array implementation is essentially as fast as the direct
buffer equivalent (within about 5% for both CRC32 and CRC32C).

Next, I ran a test on a single-node cluster (DN had 10 spinning disks) where I wrote a 1 GB
file (128 MB block size, all other cluster defaults in place). Averages over 20 runs:

Without change: 128.3 MB/s
With change: 128.4 MB/s

The difference here is not significant. This matches Trevor Robinson's results from
HDFS-3529 (he refactored the write-side code to use direct buffers so that the direct
buffer-based native implementation could be used). He saw a significant performance
improvement in a setup with SSDs, so I assume I would see a similar improvement here as well.
Once there is some discussion on HDFS-6561, I can try to implement client-side native
checksumming and see if that changes things.
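
As a rough illustration of why the end-to-end number barely moves, the microbenchmark times
can be converted into checksum throughput and set against the roughly 128 MB/s write rate.
The sketch below does only that arithmetic. The class and method names are hypothetical, and
it assumes 1 MB = 10^6 bytes, which may not be the unit used in the cluster test.

```java
// Hypothetical helper: converts the byte-array timings above into MB/s.
public class ChecksumThroughput {
    // Throughput in MB/s, taking 1 MB = 10^6 bytes (an assumption).
    static double megabytesPerSecond(long bytes, double millis) {
        return bytes / (millis / 1000.0) / 1_000_000.0;
    }

    public static void main(String[] args) {
        long arrayBytes = 100_000_000L; // the 10^8-byte test array
        String[] labels = {
            "byte array, native, CRC32",
            "byte array, native, CRC32C",
            "byte array, Java, CRC32",
            "byte array, Java, CRC32C"
        };
        // Times in ms, copied from the byte array results above.
        double[] millis = {55.0, 7.63, 74.4, 74.7};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-28s %10.1f MB/s%n",
                labels[i], megabytesPerSecond(arrayBytes, millis[i]));
        }
        // End-to-end write rate measured on the single-node cluster with the change.
        System.out.printf("%-28s %10.1f MB/s%n", "end-to-end write", 128.4);
    }
}
```

On these figures, even the Java implementation checksums data about ten times faster than the
cluster writes it, which would be consistent with checksumming not being the bottleneck on
spinning disks. That reading is an inference from the numbers, not something measured here.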

> Byte array native checksumming on DN side
> -----------------------------------------
>
>                 Key: HDFS-6560
>                 URL: https://issues.apache.org/jira/browse/HDFS-6560
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, hdfs-client, performance
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-3528.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
