hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10460) Erasure Coding: Recompute block checksum for a particular range less than file size on the fly by reconstructing missed block
Date Tue, 21 Jun 2016 15:19:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341960#comment-15341960 ]

Rakesh R commented on HDFS-10460:
---------------------------------

Thanks [~drankye] for the detailed explanation. I have analysed this approach and found
its logic a little tricky.

We have two cases:

case-1) Say all DNs are working fine and there are no failures. Calculating the checksum
needs {{requestedNumBytes}}, which is used to derive the exact block length from the
{{blockGroup}}. At the beginning, {{block.setNumBytes(getRemaining())}} sets the block's
numBytes to requestedNumBytes, which is in turn passed to the logic below to construct the
block with the required number of bytes. If we left numBytes unchanged, this logic would
return the wrong number of bytes for reading the checksum data (see the sketch after the
code block).
{code}
ExtendedBlock block = StripedBlockUtil.constructInternalBlock(
              blockGroup, ecPolicy.getCellSize(), numDataUnits, idx);
{code}
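
To make this concrete, here is a simplified standalone sketch (my own approximation, not
the actual {{StripedBlockUtil.getInternalBlockLength()}} code) showing how the internal
block length follows the numBytes set on the block group:
{code}
public class InternalBlockLengthSketch {

  // How many bytes of a block group of 'groupNumBytes' land on the data
  // block at index 'idx', with round-robin striping of 'cellSize' cells.
  static long internalBlockLength(long groupNumBytes, int cellSize,
                                  int numDataUnits, int idx) {
    long stripeSize = (long) cellSize * numDataUnits;
    long fullStripes = groupNumBytes / stripeSize;
    long lastStripeBytes = groupNumBytes % stripeSize;
    long length = fullStripes * cellSize;
    // Distribute the last (partial) stripe cell by cell across the blocks.
    long remaining = lastStripeBytes - (long) idx * cellSize;
    if (remaining > 0) {
      length += Math.min(remaining, cellSize);
    }
    return length;
  }

  public static void main(String[] args) {
    int cellSize = 65536, numDataUnits = 6;
    // numBytes left at the actual group size: block 0 spans a full cell.
    System.out.println(internalBlockLength(393216, cellSize, numDataUnits, 0)); // 65536
    // numBytes set to requestedNumBytes: block 0 contributes only 10 bytes.
    System.out.println(internalBlockLength(10, cellSize, numDataUnits, 0));     // 10
  }
}
{code}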

case-2) Now suppose a few DNs have failed. Reconstructing the block needs {{actualNumBytes}};
only then can the checksum data for requestedNumBytes be recalculated.
{code}
      ExtendedBlock reconBlockGroup = new ExtendedBlock(blockGroup);
      reconBlockGroup.setNumBytes(actualNumBytes);
{code}
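
To illustrate the point, a rough sketch of this flow (here {{reconstructBlock}} and
{{checksumRange}} are hypothetical placeholders for the decode and checksum steps, not
real HDFS APIs):
{code}
// Reconstruction has to see the block group at its actual size so the
// decoder can read whole cells from the surviving internal blocks...
ExtendedBlock reconBlockGroup = new ExtendedBlock(blockGroup);
reconBlockGroup.setNumBytes(actualNumBytes);
byte[] reconstructed = reconstructBlock(reconBlockGroup, missedBlockIndex);

// ...while the checksum is recomputed only over the requested range.
MD5Hash checksum = checksumRange(reconstructed, 0, requestedNumBytes);
{code}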

What I'm trying to explain is:
- in case-1: it needs the {{blockGroup}} object with {{requestedNumBytes}}
- in case-2: it needs the {{reconBlockGroup}} object with {{requestedNumBytes}}

So either way, a dummy object with requestedNumBytes is needed.

IMHO, we can continue the {{block.setNumBytes(getRemaining());}} logic for both replicated
and striped blocks, then treat reconstruction as a special case and create the
{{reconBlockGroup}} object with actualNumBytes, as I'm doing in the current patch (roughly
as sketched below). What's your opinion?
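
Roughly, the structure I have in mind (the {{hasMissedBlock}} flag and
{{reconstructMissedBlock}} helper are illustrative names, not the exact patch code):
{code}
// Common path for replicated and striped checksums: the block object
// carries the requested range length.
long requestedNumBytes = getRemaining();
block.setNumBytes(requestedNumBytes);

// Special case: an internal block is missing, so reconstruct it using a
// separate object that carries the actual block group size.
if (hasMissedBlock) {
  ExtendedBlock reconBlockGroup = new ExtendedBlock(blockGroup);
  reconBlockGroup.setNumBytes(actualNumBytes);
  reconstructMissedBlock(reconBlockGroup);
}
// Checksum recomputation then uniformly works with requestedNumBytes.
{code}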

> Erasure Coding: Recompute block checksum for a particular range less than file size on the fly by reconstructing missed block
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10460
>                 URL: https://issues.apache.org/jira/browse/HDFS-10460
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>         Attachments: HDFS-10460-00.patch, HDFS-10460-01.patch
>
>
> This jira is a HDFS-9833 follow-on task to address reconstructing a block and then recalculating the block checksum for a particular range query.
> For example,
> {code}
> // create a file 'stripedFile1' with fileSize = cellSize * numDataBlocks = 65536 * 6 = 393216
> FileChecksum stripedFileChecksum = getFileChecksum(stripedFile1, 10, true);
> {code}



