hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6560) Byte array native checksumming on DN side
Date Fri, 11 Jul 2014 16:54:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058997#comment-14058997 ]

Todd Lipcon commented on HDFS-6560:
-----------------------------------

Nice benchmark results. Worth noting that your results here also include the startup/shutdown
time of the DNs and client, right? In that case, the 10% reduction in cycles is an underestimate.
You could quantify this by getting the perf results for a put of a 1GB file, as well as the
results for a 2GB file, and subtracting. The fixed startup/shutdown cost cancels out in the
difference, so that would give you a better indication of the savings in the "steady state"
(once all JIT has kicked in, etc).
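
To make the subtraction concrete, here is a minimal sketch. It assumes the total cycle count is roughly a fixed startup/shutdown cost plus a per-GB cost; the function names are hypothetical and the cycle counts are invented placeholders for illustration, not measurements from this JIRA.

```python
from typing import Tuple


def split_fixed_and_per_gb(cycles_1gb: float, cycles_2gb: float) -> Tuple[float, float]:
    """Assume cycles(n GB) = fixed + n * per_gb and solve it from two runs."""
    per_gb = cycles_2gb - cycles_1gb   # the fixed cost cancels in the difference
    fixed = cycles_1gb - per_gb        # what remains of the 1GB run is overhead
    return fixed, per_gb


def reduction(before: float, after: float) -> float:
    """Fractional saving of `after` relative to `before`."""
    return 1.0 - after / before


# Placeholder numbers (arbitrary units), NOT real perf output.
# Each pair is (cycles for a 1GB put, cycles for a 2GB put).
baseline_1gb, baseline_2gb = 150.0, 250.0
patched_1gb, patched_2gb = 135.0, 220.0

_, baseline_per_gb = split_fixed_and_per_gb(baseline_1gb, baseline_2gb)
_, patched_per_gb = split_fixed_and_per_gb(patched_1gb, patched_2gb)

# Saving seen when the fixed overhead is included in the 1GB run.
whole_run_reduction = reduction(baseline_1gb, patched_1gb)
# Saving on the per-GB (steady-state) cost only.
steady_state_reduction = reduction(baseline_per_gb, patched_per_gb)

print(f"whole 1GB run: {whole_run_reduction:.0%}")
print(f"steady state:  {steady_state_reduction:.0%}")
```

With these placeholder inputs the steady-state saving comes out larger than the whole-run saving, because the unchanged startup/shutdown cost dilutes the whole-run figure.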

Nicholas -- for the write path, wouldn't we nearly always be writing with CRC32C? The only
cases where we'd be writing with the old checksum format are running distcp from Hadoop 1.0
clusters or appending to files which have existed since Hadoop 1.0. Both are far less common.
So, this change is a positive one in that it would increase the performance a lot for the
common case (and maybe only decrease it a couple percent for the uncommon case). The
comparative results for CRC32 (zlib polynomial) seem a bit orthogonal to this JIRA.
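
For readers less familiar with the two checksum types: CRC32 (zlib) and CRC32C are both 32-bit CRCs that differ in the generator polynomial. The sketch below is a slow bitwise illustration of that difference only; it is not Hadoop's native implementation, and the function name `crc_bitwise` is hypothetical.

```python
import zlib

# Reflected (LSB-first) forms of the two generator polynomials.
CRC32_ZLIB_POLY = 0xEDB88320  # zlib / "old" CRC32, normal form 0x04C11DB7
CRC32C_POLY = 0x82F63B78      # CRC32C (Castagnoli), normal form 0x1EDC6F41


def crc_bitwise(data: bytes, poly: int) -> int:
    """Bit-at-a-time reflected CRC-32 with the usual init and final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


sample = b"123456789"  # conventional CRC check input
crc32_value = crc_bitwise(sample, CRC32_ZLIB_POLY)
crc32c_value = crc_bitwise(sample, CRC32C_POLY)

# Sanity check against the standard library's zlib CRC32.
assert crc32_value == zlib.crc32(sample)

print(f"CRC32  (zlib):       {crc32_value:#010x}")
print(f"CRC32C (Castagnoli): {crc32c_value:#010x}")
```

Same algorithm, same input, different polynomial, different checksum.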

> Byte array native checksumming on DN side
> -----------------------------------------
>
>                 Key: HDFS-6560
>                 URL: https://issues.apache.org/jira/browse/HDFS-6560
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, hdfs-client, performance
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-3528.patch, HDFS-6560.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
