hadoop-hdfs-issues mailing list archives

From "Tsz Wo Nicholas Sze (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8430) Erasure coding: update DFSClient.getFileChecksum() logic for stripe files
Date Mon, 04 Jan 2016 09:41:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080894#comment-15080894 ]

Tsz Wo Nicholas Sze commented on HDFS-8430:
-------------------------------------------

> ... The impact might be big for existing clusters because they will find their identical
> replicated files are not equal now. ...

After a cluster is upgraded to the new algorithm, all checksums of replicated files will be
computed by the new algorithm, so identical replicated files will still be considered equal
within that cluster.  However, comparing checksums between a new cluster and an old cluster
is a problem, since the two would use different algorithms.

> ... , how about adding a new API for the new behaviour?

That sounds good.  We could add a parameter to the getFileChecksum(..) methods for passing
the algorithm name.
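
To make the idea concrete, here is a rough sketch of what such an overload could look like.
It is not the HDFS implementation: the class ChecksumApiSketch, the NamedChecksum type and
the two-argument signature are hypothetical, and "MD5" / "SHA-256" from java.security are
only stand-ins for the old and new file checksum algorithms.  The sketch assumes that each
checksum carries the name of the algorithm that produced it, so that a comparison between an
old cluster and a new cluster can be detected as not comparable instead of being reported as
a mismatch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ChecksumApiSketch {

    /** Hypothetical result type: digest bytes tagged with the algorithm that produced them. */
    static final class NamedChecksum {
        final String algorithm;
        final byte[] bytes;

        NamedChecksum(String algorithm, byte[] bytes) {
            this.algorithm = algorithm;
            this.bytes = bytes;
        }

        /** Two checksums can only be compared when the same algorithm produced both. */
        boolean comparableWith(NamedChecksum other) {
            return algorithm.equals(other.algorithm);
        }

        /** True only when the algorithms match and the digest bytes are identical. */
        boolean matches(NamedChecksum other) {
            return comparableWith(other) && Arrays.equals(bytes, other.bytes);
        }
    }

    // Stand-in for the existing algorithm; NOT the real HDFS file checksum.
    static final String DEFAULT_ALGORITHM = "MD5";

    /** Existing-style call: keeps the default algorithm, so current callers see no change. */
    static NamedChecksum getFileChecksum(byte[] data) {
        return getFileChecksum(data, DEFAULT_ALGORITHM);
    }

    /** Hypothetical overload that takes the algorithm name as an extra parameter. */
    static NamedChecksum getFileChecksum(byte[] data, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            return new NamedChecksum(algorithm, md.digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown checksum algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        byte[] file = "identical file contents".getBytes(StandardCharsets.UTF_8);
        NamedChecksum oldA = getFileChecksum(file);            // "old cluster", copy A
        NamedChecksum oldB = getFileChecksum(file);            // "old cluster", copy B
        NamedChecksum newA = getFileChecksum(file, "SHA-256"); // "new cluster"
        System.out.println("old vs old match: " + oldA.matches(oldB));
        System.out.println("old vs new comparable: " + oldA.comparableWith(newA));
    }
}
```

Keeping the one-argument method as a thin wrapper over the new overload is one way to leave
existing callers untouched while letting new callers ask for a specific algorithm by name.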

> Erasure coding: update DFSClient.getFileChecksum() logic for stripe files
> -------------------------------------------------------------------------
>
>                 Key: HDFS-8430
>                 URL: https://issues.apache.org/jira/browse/HDFS-8430
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7285
>            Reporter: Walter Su
>            Assignee: Kai Zheng
>         Attachments: HDFS-8430-poc1.patch
>
>
> HADOOP-3981 introduces a distributed file checksum algorithm. It is designed for replicated
> blocks.
> {{DFSClient.getFileChecksum()}} needs some updates so that it can work for striped block groups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
