hadoop-hdfs-issues mailing list archives

From "Walter Su (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8430) Erasure coding: update DFSClient.getFileChecksum() logic for stripe files
Date Mon, 04 Jan 2016 04:04:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080672#comment-15080672 ]

Walter Su commented on HDFS-8430:

bq. 1. Use CRC64 (or some other linear code) for block checksum instead of MD5.
Agreed. CRC works fine as a hash function. Our purpose is file comparison, so MD5 is overkill.
MD5 is 128 bits, though. I think you mean CRC128?

bq. The datanode may compute cell CRC64s...
We may have many policies and many cell sizes. Let's say the minimal cell size is 64k. Do you mean
calculating one CRC per 64k (instead of one per the default value of _dfs.bytes-per-checksum_)? That
does reduce network traffic. But I thought we could use the block metadata, which already has the
CRCs, and avoid recalculation.
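
To put rough numbers on the traffic difference, here is a minimal sketch. The figures are assumptions for illustration, not taken from the patch: a 128 MiB block, the stock _dfs.bytes-per-checksum_ of 512 bytes with 4-byte CRC32s, and a hypothetical 8-byte CRC64 per 64k cell. {{ChecksumTraffic}} and {{checksumBytes}} are made-up names.

```java
final class ChecksumTraffic {
    // Bytes of checksum data produced for dataBytes of block data when one
    // checksum of checksumWidthBytes is kept per bytesPerChecksum of data.
    static long checksumBytes(long dataBytes, long bytesPerChecksum, int checksumWidthBytes) {
        long chunks = (dataBytes + bytesPerChecksum - 1) / bytesPerChecksum; // round up
        return chunks * checksumWidthBytes;
    }

    public static void main(String[] args) {
        long block = 128L * 1024 * 1024; // assumed block size
        // Existing per-chunk CRC32s from the block metadata (assumed 512-byte chunks).
        System.out.println(checksumBytes(block, 512, 4));       // 1048576
        // Hypothetical per-cell CRC64s with a 64k cell.
        System.out.println(checksumBytes(block, 64 * 1024, 8)); // 16384
    }
}
```

Under these assumptions the per-cell scheme ships about 16 KiB per block instead of 1 MiB, at the cost of recomputing CRCs on the datanode rather than reusing the stored ones.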

bq. Instead of sending all CRCs to the client, send all CRCs to one of the datanode in a block
Either way, we still need to fetch all CRCs from the 6 (or 9) DNs and reorder them, so that
the hash value can be the same as for a replicated block.
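
The reordering itself could look like the sketch below. It assumes each data DN returns the CRCs of its internal block in on-disk order and that cells are laid out round-robin across the data blocks. {{StripedCrcOrder}}, {{toLogicalOrder}} and {{chunksPerCell}} are hypothetical names, not existing HDFS APIs. Parity blocks, missing blocks and the final aggregation step are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class StripedCrcOrder {
    // perBlockCrcs[b] holds the chunk CRCs of data block b in on-disk order.
    // chunksPerCell is cellSize / bytesPerChecksum (assumed to divide evenly).
    // Returns the CRCs in logical file order by walking the stripes and
    // taking one cell's worth of CRCs from each data block in turn.
    static List<Long> toLogicalOrder(long[][] perBlockCrcs, int chunksPerCell) {
        List<Long> out = new ArrayList<>();
        for (int stripe = 0; ; stripe++) {
            boolean any = false;
            int start = stripe * chunksPerCell;
            for (long[] crcs : perBlockCrcs) {
                // A short (or absent) last cell simply contributes fewer CRCs.
                int end = Math.min(start + chunksPerCell, crcs.length);
                for (int i = start; i < end; i++) {
                    out.add(crcs[i]);
                    any = true;
                }
            }
            if (!any) {
                return out; // no block has data in this stripe: done
            }
        }
    }
}
```

For example, with 3 data blocks and 2 chunks per cell, blocks holding chunks {0,1,6,7,12}, {2,3,8,9} and {4,5,10,11} come back as 0 through 12 in order.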

bq. The hard part would be to consider the block missing, decoding and checksum computing

> Erasure coding: update DFSClient.getFileChecksum() logic for stripe files
> -------------------------------------------------------------------------
>                 Key: HDFS-8430
>                 URL: https://issues.apache.org/jira/browse/HDFS-8430
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7285
>            Reporter: Walter Su
>            Assignee: Kai Zheng
>         Attachments: HDFS-8430-poc1.patch
> HADOOP-3981 introduces a distributed file checksum algorithm. It's designed for replicated
> {{DFSClient.getFileChecksum()}} needs some updates, so it can work for striped block groups.

This message was sent by Atlassian JIRA