hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3981) Need a distributed file checksum algorithm for HDFS
Date Thu, 11 Sep 2008 02:00:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Tsz Wo (Nicholas), SZE updated HADOOP-3981:
-------------------------------------------

    Release Note: 
Implemented MD5-of-xxxMD5-of-yyyCRC32 which is a distributed file checksum algorithm for HDFS,
where xxx is the number of CRCs per block and yyy is the number of bytes per CRC.

Changed DistCp to use file checksum for comparing files if both source and destination FileSystem(s)
support getFileChecksum(...).

> Need a distributed file checksum algorithm for HDFS
> ---------------------------------------------------
>
>                 Key: HADOOP-3981
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3981
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 3981_20080909.patch, 3981_20080910.patch, 3981_20080910b.patch
>
>
> Traditional message digest algorithms, like MD5, SHA1, etc., require reading the entire
input message sequentially in a central location.  HDFS supports large files with multiple
tera bytes.  The overhead of reading the entire file is huge. A distributed file checksum
algorithm is needed for HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message