hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4605) Implement block-size independent file checksum
Date Fri, 15 Mar 2013 21:02:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603843#comment-13603843 ]

Kihwal Lee commented on HDFS-4605:
----------------------------------

bq. If we do this, there will be a restriction that the summing hash function should consume 32 bits (CRC32 or CRC32C) at a time.

If this restriction leads to an unacceptable collision rate, we could have the datanode send back the remaining input to be prepended on the next node. But stock MD5 (which consumes input in 512-bit blocks) will not work even with this, because it internally pads the end of the input with the input length. We need either a version of MD5 that allows the user to control the padding, or to pick a different hash algorithm.
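
As a minimal sketch of why the padding matters, assuming nothing beyond the JDK's MessageDigest (this is illustration, not a proposed patch): digest() finalizes by appending the Merkle-Damgard padding, which encodes the piece's length, and the JDK exposes no way to export or resume the intermediate state, so per-datanode MD5 digests cannot be stitched into the digest of the whole stream.

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class Md5PaddingDemo {
    public static void main(String[] args) throws Exception {
        byte[] part1 = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        byte[] part2 = "ghijklmnopqrstuv".getBytes(StandardCharsets.US_ASCII);

        // Digest of the whole stream, fed incrementally on one node.
        MessageDigest whole = MessageDigest.getInstance("MD5");
        whole.update(part1);
        whole.update(part2);
        byte[] streamDigest = whole.digest();

        // What a per-node scheme would produce: each node finalizes its
        // own piece. digest() appends the length-encoding padding, so the
        // internal state cannot be handed to the next node and resumed.
        byte[] piece1 = MessageDigest.getInstance("MD5").digest(part1);
        byte[] piece2 = MessageDigest.getInstance("MD5").digest(part2);

        // There is no public API to combine piece1 and piece2 into
        // streamDigest; even re-hashing the concatenated piece digests
        // yields an unrelated value.
        MessageDigest combined = MessageDigest.getInstance("MD5");
        combined.update(piece1);
        combined.update(piece2);
        System.out.println(Arrays.equals(streamDigest, combined.digest())); // false
    }
}
{code}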
                
> Implement block-size independent file checksum
> ----------------------------------------------
>
>                 Key: HDFS-4605
>                 URL: https://issues.apache.org/jira/browse/HDFS-4605
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Kihwal Lee
>
> The value of the current getFileChecksum() is block-size dependent. Since FileChecksum is
> mainly intended for comparing the content of files, removing this dependency will make
> FileChecksum in HDFS relevant in more use cases.
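
For context on the dependency described above, a minimal sketch against the public FileSystem API (paths, sizes, and block sizes here are illustrative, not from this issue): write the same bytes with two different block sizes and compare the getFileChecksum() results.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumBlockSizeDemo {
    // Writes 96 MB of zeros to the given path with the given block size.
    static void write(FileSystem fs, Path p, long blockSize) throws Exception {
        byte[] chunk = new byte[1 << 20]; // 1 MB of zeros
        try (FSDataOutputStream out =
                 fs.create(p, true, 4096, (short) 1, blockSize)) {
            for (int i = 0; i < 96; i++) {
                out.write(chunk);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumes fs.defaultFS in the loaded configuration points at HDFS.
        FileSystem fs = FileSystem.get(new Configuration());

        Path a = new Path("/tmp/cksum-32mb"); // illustrative paths
        Path b = new Path("/tmp/cksum-64mb");
        write(fs, a, 32L * 1024 * 1024);
        write(fs, b, 64L * 1024 * 1024);

        FileChecksum ca = fs.getFileChecksum(a);
        FileChecksum cb = fs.getFileChecksum(b);

        // Same bytes, different block partitioning: the current
        // MD5-of-MD5-of-CRC32 checksum computes the inner MD5s per
        // block, so the two results differ.
        System.out.println(ca.equals(cb)); // false with today's scheme
    }
}
{code}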

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
