hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8430) Erasure coding: compute file checksum for stripe files
Date Mon, 22 Feb 2016 04:31:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156467#comment-15156467
] 

Rakesh R commented on HDFS-8430:
--------------------------------

It's really interesting work! Thank you [~drankye] and all others for the detailed thoughts.
I'm trying to understand the work better; it would be great if you could give a few clarifications
on the following points. Please excuse me if I'm asking questions that are already answered
in this JIRA. Thanks!

{code}
For example, consider a striped file "file_1" with two block groups, bg0 and bg1,
each holding three stripes of six cells:
blockgroup0, stripe0 => bg0_cell_00, bg0_cell_01, bg0_cell_02, bg0_cell_03, bg0_cell_04, bg0_cell_05
blockgroup0, stripe1 => bg0_cell_10, bg0_cell_11, bg0_cell_12, bg0_cell_13, bg0_cell_14, bg0_cell_15
blockgroup0, stripe2 => bg0_cell_20, bg0_cell_21, bg0_cell_22, bg0_cell_23, bg0_cell_24, bg0_cell_25

blockgroup1, stripe0 => bg1_cell_00, bg1_cell_01, bg1_cell_02, bg1_cell_03, bg1_cell_04, bg1_cell_05
blockgroup1, stripe1 => bg1_cell_10, bg1_cell_11, bg1_cell_12, bg1_cell_13, bg1_cell_14, bg1_cell_15
blockgroup1, stripe2 => bg1_cell_20, bg1_cell_21, bg1_cell_22, bg1_cell_23, bg1_cell_24, bg1_cell_25
{code}

*Query1)* Does the proposal use the pre-computed block checksums already present in the block
metadata instead of re-calculating them?

*Query2)* This question is a continuation of the first one. Could you tell me the finalized/agreed
approach for the checksum computation? I can see two approaches for a striped file:
- Approach1: ROW-wise, stripe by stripe — take the pre-computed checksum values bg0_cell_00,
bg0_cell_01, bg0_cell_02, bg0_cell_03, bg0_cell_04, bg0_cell_05, and so on for each stripe; or
- Approach2: COLUMN-wise, clubbing the cells of one internal block across the block group, e.g.
computeChecksum(bg0_cell_00, bg0_cell_10, bg0_cell_20).
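
To make the difference between the two orderings concrete, here is a minimal sketch (not the actual HDFS implementation): each cell's pre-computed checksum is modeled as a plain {{long}}, and an MD5 is accumulated over them in the two candidate orders. The class and method names ({{StripeChecksumSketch}}, {{rowWise}}, {{columnWise}}) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;

public class StripeChecksumSketch {
    // cells[stripe][column] = pre-computed checksum of that cell (illustrative model)

    // Approach1: consume cells stripe by stripe (row-wise)
    static byte[] rowWise(long[][] cells) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (long[] stripe : cells)
                for (long cell : stripe)
                    md.update(ByteBuffer.allocate(8).putLong(cell).array());
            return md.digest();
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    // Approach2: club the cells of one internal block (column-wise)
    static byte[] columnWise(long[][] cells) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            int cols = cells[0].length;
            for (int col = 0; col < cols; col++)
                for (long[] stripe : cells)
                    md.update(ByteBuffer.allocate(8).putLong(stripe[col]).array());
            return md.digest();
        } catch (Exception e) { throw new RuntimeException(e); }
    }
}
```

The point of the question: with more than one stripe the two orders feed the digest different byte sequences, so they generally yield different final checksums — the approach has to be fixed once and for all.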

*Query3)* How does the new algorithm apply to a contiguous file? Does it split the file into
smaller cells, each of 64KB?

*Query4)* I assume you are planning to use the current MD5MD5CRC32 algorithm, is that right?
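
For reference, the shape of MD5MD5CRC32 (from HADOOP-3981) is: a CRC32 per chunk, an MD5 over each block's chunk CRCs, and a file-level MD5 over the per-block MD5s. Below is a minimal sketch of that two-level structure, not the HDFS code itself; the class name {{Md5Md5Crc32Sketch}} is made up, and real HDFS computes the per-block MD5s on the DataNodes from stored checksum files.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.zip.CRC32;

public class Md5Md5Crc32Sketch {
    static final int CHUNK = 512; // bytesPerCRC; 512 is the usual HDFS default

    // Level 1+2: CRC32 per chunk, then MD5 over the chunk CRCs of one block
    static byte[] blockMd5(byte[] block) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (int off = 0; off < block.length; off += CHUNK) {
                CRC32 crc = new CRC32();
                crc.update(block, off, Math.min(CHUNK, block.length - off));
                md.update(ByteBuffer.allocate(4).putInt((int) crc.getValue()).array());
            }
            return md.digest();
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    // Level 3: file-level MD5 over the per-block MD5s
    static byte[] fileMd5(byte[][] blocks) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (byte[] block : blocks)
                md.update(blockMd5(block));
            return md.digest();
        } catch (Exception e) { throw new RuntimeException(e); }
    }
}
```

For a striped file, the open design question above is what the "block" unit at the middle level should be — an internal block, a stripe, or the whole block group.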

> Erasure coding: compute file checksum for stripe files
> ------------------------------------------------------
>
>                 Key: HDFS-8430
>                 URL: https://issues.apache.org/jira/browse/HDFS-8430
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7285
>            Reporter: Walter Su
>            Assignee: Kai Zheng
>         Attachments: HDFS-8430-poc1.patch
>
>
> HADOOP-3981 introduced a distributed file checksum algorithm. It's designed for replicated
blocks.
> {{DFSClient.getFileChecksum()}} needs some updates so it can work for striped block groups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
