hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3981) Need a distributed file checksum algorithm for HDFS
Date Wed, 10 Sep 2008 17:55:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629886#action_12629886
] 

Raghu Angadi commented on HADOOP-3981:
--------------------------------------

> Why do you use the datanode's socket/opcode interface rather than adding a method to
ClientDatanodeProtocol?

Nicholas had briefly talked to me about this, and I was OK with either approach. If RPCs are
used, then other RPCs on the port should be prepared to handle delays on the order of minutes,
since these checksum RPCs compete with the rest of the disk accesses. And there could be quite
a few of these requests.

The DataNode has just 3 RPC handlers. We probably should not increase the number of handlers
for this reason, since checksum load would be very rare and the DataNode is thread starved already.
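
To illustrate the handler concern, here is a minimal sketch, not DataNode code. It assumes a
fixed pool of 3 threads standing in for the RPC handlers; the class and method names are
hypothetical. Three slow requests (standing in for checksum RPCs) occupy every handler, and a
quick request submitted afterwards cannot start until one of them finishes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HandlerStarvationSketch {
    // Assumption: mirrors the "3 RPC handlers" figure mentioned above.
    static final int HANDLERS = 3;

    /**
     * Returns true if a quick request could not start while every handler
     * was occupied by a slow (checksum-like) request.
     */
    static boolean quickRequestWaits() throws Exception {
        ExecutorService handlers = Executors.newFixedThreadPool(HANDLERS);
        CountDownLatch allBusy = new CountDownLatch(HANDLERS);
        CountDownLatch releaseSlow = new CountDownLatch(1);
        CountDownLatch quickStarted = new CountDownLatch(1);
        try {
            // Occupy every handler with a request that blocks until released.
            for (int i = 0; i < HANDLERS; i++) {
                handlers.submit(() -> {
                    allBusy.countDown();
                    releaseSlow.await();
                    return null;
                });
            }
            allBusy.await();

            // The quick request is queued behind the slow ones.
            Runnable quickTask = quickStarted::countDown;
            Future<?> quick = handlers.submit(quickTask);
            boolean startedWhileBusy = quickStarted.await(200, TimeUnit.MILLISECONDS);

            // Free the handlers and let the quick request complete.
            releaseSlow.countDown();
            quick.get(5, TimeUnit.SECONDS);
            return !startedWhileBusy;
        } finally {
            releaseSlow.countDown();
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("quick request had to wait: " + quickRequestWaits());
    }
}
```

With a dedicated socket/opcode path, slow checksum work would not sit in the same small pool
as the other RPCs, which is the trade-off described above.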


> Need a distributed file checksum algorithm for HDFS
> ---------------------------------------------------
>
>                 Key: HADOOP-3981
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3981
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Tsz Wo (Nicholas), SZE
>         Attachments: 3981_20080909.patch
>
>
> Traditional message digest algorithms, like MD5, SHA1, etc., require reading the entire
input message sequentially in a central location.  HDFS supports large files of multiple
terabytes.  The overhead of reading the entire file is huge. A distributed file checksum
algorithm is needed for HDFS.
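
One way to picture a distributed checksum is a digest of per-block digests. The following is
a minimal sketch under that assumption only; it is not necessarily what the attached patch
does, and the class and method names are hypothetical. Each block is digested where it is
stored, and only the small digests are sent to the client, which combines them.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class BlockDigestSketch {
    /** Digest of one block; in a real system this would run where the block is stored. */
    static byte[] blockDigest(byte[] block) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5").digest(block);
    }

    /** File-level digest: MD5 over the block digests, concatenated in block order. */
    static byte[] fileDigest(List<byte[]> blockDigests) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (byte[] digest : blockDigests) {
            md.update(digest);
        }
        return md.digest();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Two tiny stand-in "blocks"; real blocks would be many megabytes.
        List<byte[]> digests = new ArrayList<>();
        digests.add(blockDigest("block-0 data".getBytes(StandardCharsets.UTF_8)));
        digests.add(blockDigest("block-1 data".getBytes(StandardCharsets.UTF_8)));

        StringBuilder hex = new StringBuilder();
        for (byte b : fileDigest(digests)) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex);
    }
}
```

Note that the result depends on the block boundaries and is not equal to the MD5 of the whole
file, so it is only comparable between files that are split into blocks the same way.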

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

