Message-ID: <759186316.1221069344407.JavaMail.jira@brutus>
Date: Wed, 10 Sep 2008 10:55:44 -0700 (PDT)
From: "Raghu Angadi (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3981) Need a distributed file checksum algorithm for HDFS
In-Reply-To: <124022105.1219255604558.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

    [ https://issues.apache.org/jira/browse/HADOOP-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629886#action_12629886 ]

Raghu Angadi commented on HADOOP-3981:
--------------------------------------

> Why do you use the datanode's socket/opcode interface rather than adding a method to ClientDatanodeProtocol?

Nicholas had briefly talked to me regarding this. I was OK with either way. If RPCs are used, then other RPCs on the port should be prepared to handle delays on the order of minutes, since these checksum RPCs compete with the rest of the disk accesses. And there could be quite a few of these requests. The DataNode has just 3 RPC handlers; we probably should not increase the handlers for this reason, since checksum load would be very rare and the DataNode is thread-starved already.

> Need a distributed file checksum algorithm for HDFS
> ---------------------------------------------------
>
>                 Key: HADOOP-3981
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3981
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Tsz Wo (Nicholas), SZE
>         Attachments: 3981_20080909.patch
>
>
> Traditional message digest algorithms, like MD5, SHA1, etc., require reading the entire input message sequentially in a central location. HDFS supports large files with multiple terabytes. The overhead of reading the entire file is huge. A distributed file checksum algorithm is needed for HDFS.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
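For context, the idea under discussion is a file-level checksum that avoids streaming multi-terabyte files through one location: each datanode digests its own blocks locally, and only the small per-block digests are shipped and combined in order. Below is a minimal, hypothetical Java sketch of that composition step (the class and method names are illustrative, and the real patch combines per-block checksum metadata rather than hashing raw block bytes as done here):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DistributedChecksumSketch {

    // Sketch of a composite checksum: MD5 over the ordered sequence of
    // per-block MD5 digests. Each per-block digest could be computed on a
    // different datanode; only 16 bytes per block travel to the combiner.
    static byte[] fileChecksum(byte[][] blocks) {
        try {
            MessageDigest outer = MessageDigest.getInstance("MD5");
            for (byte[] block : blocks) {
                // In HDFS this inner digest would run on the datanode
                // holding the block, not at the client.
                byte[] blockDigest = MessageDigest.getInstance("MD5").digest(block);
                outer.update(blockDigest); // order matters: blocks are combined in file order
            }
            return outer.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is a required JCA algorithm", e);
        }
    }

    public static void main(String[] args) {
        byte[][] blocks = {
            "block-one".getBytes(StandardCharsets.UTF_8),
            "block-two".getBytes(StandardCharsets.UTF_8),
        };
        byte[] digest = fileChecksum(blocks);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        System.out.println(hex); // 32 hex chars: the composite is itself an MD5 digest
    }
}
```

Because the combiner only sees digests, the result is deterministic for a given block layout regardless of which node computed each inner hash; the trade-off raised in the comment above is purely about how the client asks a datanode for those per-block digests (raw socket opcode vs. an RPC on the datanode's small handler pool).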