hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8430) Erasure coding: update DFSClient.getFileChecksum() logic for stripe files
Date Mon, 04 Jan 2016 03:46:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080667#comment-15080667
] 

Kai Zheng commented on HDFS-8430:
---------------------------------

Thanks Nicholas for the correction. Yes, I misunderstood. It's smart to adjust the algorithm
on the replicated-file side to conform with striped files. The impact on existing clusters
might be big, though, because they may find that replicated files with identical content no
longer compare as equal. To avoid that impact, how about adding a new API for the new behaviour?
In the new approach, would we need to introduce a {{cell}} for replicated files, as in striped
files, when computing the checksum? If so, how would it be determined? When a replicated file is
compared to a striped file, I guess we can use the striped file's cell value for the replicated
file. But then the cell value needs to be passed in when calling {{getFileChecksum}}, which
should be fine if we introduce a new API.
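
To make the question concrete, here is a minimal Java sketch of a cell-based checksum over a
file's logical bytes. It is only an illustration, not the HDFS implementation or a proposed
patch: the class {{CellChecksumSketch}}, the helper {{cellBasedChecksum}}, and the choice of
CRC32 per cell followed by MD5 over the concatenated CRCs are all assumptions made for the
example. The point is that the cell size is an input to the result, so it would have to be
passed in by the caller.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.CRC32;

class CellChecksumSketch {
    // Hypothetical helper (not an HDFS API): CRC32 of each cell of the
    // logical byte stream, then MD5 over the concatenated per-cell CRCs.
    static byte[] cellBasedChecksum(byte[] data, int cellSize) {
        if (cellSize <= 0) {
            throw new IllegalArgumentException("cellSize must be positive");
        }
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        for (int off = 0; off < data.length; off += cellSize) {
            int len = Math.min(cellSize, data.length - off); // last cell may be short
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            // Feed the 32-bit CRC of this cell into the file-level digest.
            md5.update(ByteBuffer.allocate(4).putInt((int) crc.getValue()).array());
        }
        return md5.digest();
    }

    public static void main(String[] args) {
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] a = cellBasedChecksum(data, 64);
        byte[] b = cellBasedChecksum(data.clone(), 64);
        byte[] c = cellBasedChecksum(data, 128);
        // Same content and same cell size: the checksums are compared here.
        System.out.println("same cell size equal: " + Arrays.equals(a, b));
        // Same content but a different cell size: a different CRC sequence is digested.
        System.out.println("different cell size equal: " + Arrays.equals(a, c));
    }
}
```

Because the digest depends only on the logical bytes and the cell size, a replicated file and
a striped file with the same content could in principle be compared this way, provided both
sides use the striped file's cell value.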

I guess you want to use CRC64 to be safer against collisions than CRC32 and to make network
traffic smaller than with MD5: {{64-bits x numCellsInOneBlock}} instead of {{16-bytes x numCellsInOneBlock}}.
Please correct me if I've missed your point. Thanks.
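
As a rough check of that size comparison, the arithmetic can be sketched in Java as below. The
128 MiB block size and 64 KiB cell size are assumed values picked only for illustration, and
the class and method names are hypothetical. Only the per-cell digest widths (8 bytes for
CRC64, 16 bytes for MD5) come from the discussion above.

```java
class ChecksumTrafficSketch {
    // Number of cells needed to cover one block (ceiling division).
    static long cellsPerBlock(long blockSize, long cellSize) {
        return (blockSize + cellSize - 1) / cellSize;
    }

    // Checksum bytes sent per block when one digest is sent per cell.
    static long checksumBytesPerBlock(long blockSize, long cellSize, int digestBytes) {
        return cellsPerBlock(blockSize, cellSize) * digestBytes;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed block size, not from the thread
        long cellSize = 64L * 1024;          // assumed cell size, not from the thread
        System.out.println("numCellsInOneBlock = " + cellsPerBlock(blockSize, cellSize));
        System.out.println("CRC64 bytes per block = "
            + checksumBytesPerBlock(blockSize, cellSize, 8));
        System.out.println("MD5 bytes per block = "
            + checksumBytesPerBlock(blockSize, cellSize, 16));
    }
}
```

Under these assumptions a block has 2048 cells, so per block CRC64 would carry 16 KiB of
checksum data and MD5 would carry 32 KiB. Whatever the actual sizes are, the CRC64 figure is
half of the MD5 one.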

> Erasure coding: update DFSClient.getFileChecksum() logic for stripe files
> -------------------------------------------------------------------------
>
>                 Key: HDFS-8430
>                 URL: https://issues.apache.org/jira/browse/HDFS-8430
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7285
>            Reporter: Walter Su
>            Assignee: Kai Zheng
>         Attachments: HDFS-8430-poc1.patch
>
>
> HADOOP-3981 introduces a distributed file checksum algorithm. It's designed for replicated blocks.
> {{DFSClient.getFileChecksum()}} needs some updates so it can work for striped block groups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
