Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Wed, 20 Jan 2016 02:29:39 +0000 (UTC)
From: "Jing Zhao (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12929984.1452725071000.153810.1453256979910@Atlassian.JIRA>
In-Reply-To: <JIRA.12929984.1452725071000@Atlassian.JIRA>
References: <JIRA.12929984.1452725071000@Atlassian.JIRA>
 <JIRA.12929984.1452725071200@arcas>
Subject: [jira] [Commented] (HDFS-9646) ErasureCodingWorker may fail when
 recovering data blocks with length less than the first internal block
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107850#comment-15107850 ] 

Jing Zhao commented on HDFS-9646:
---------------------------------

Thanks for the review, Kai!

bq. Wonder if it is, or should be, the case that recovery can also be triggered by the corrupt case (the DN is live or not stopped).

Yes, recovery will also be triggered for corrupted blocks. However, for this test we first need a process to detect the corruption. This can be either a client reading the data or a datanode recovering missing blocks. Here I want to make sure the DataNode can correctly detect and report the corruption during recovery, so we first need to generate at least one missing block by shutting down a DN.

bq. Wonder if we could share the following utility between client and datanode

Yes, I planned to do so but could not find a good way to share this small piece of logic. Maybe we can separate this into a different JIRA?

> ErasureCodingWorker may fail when recovering data blocks with length less than the first internal block
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9646
>                 URL: https://issues.apache.org/jira/browse/HDFS-9646
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0
>            Reporter: Takuya Fukudome
>            Assignee: Jing Zhao
>            Priority: Critical
>         Attachments: HDFS-9646.000.patch, HDFS-9646.001.patch, HDFS-9646.002.patch, HDFS-9646.003.patch, test-reconstruct-stripe-file.patch
>
>
> This is reported by [~tfukudom]: ErasureCodingWorker may fail with the following exception when recovering a non-full internal block.
> {code}
> 2016-01-06 11:14:44,740 WARN  datanode.DataNode (ErasureCodingWorker.java:run(467)) - Failed to recover striped block: BP-987302662-172.29.4.13-1450757377698:blk_-9223372036854322288_29751
> java.io.IOException: Transfer failed for all targets.
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:455)
> {code}
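
For context on the "non-full internal block" in the description above, here is a minimal sketch of how data internal block lengths can differ within one striped block group. It assumes a round-robin cell layout with 6 data blocks and 64 KiB cells; the class and method names (InternalBlockLengths, dataBlockLengths) and those parameter values are hypothetical and chosen for illustration only, not taken from the patch or from the Hadoop API.

```java
import java.util.Arrays;

public class InternalBlockLengths {

    /**
     * Returns the length of each data internal block for a block group that
     * holds dataSize bytes, written round-robin in cells of cellSize bytes
     * across numDataBlocks data blocks. Hypothetical helper for illustration.
     */
    public static long[] dataBlockLengths(long dataSize, int cellSize, int numDataBlocks) {
        long stripeSize = (long) cellSize * numDataBlocks;
        long fullStripes = dataSize / stripeSize;   // stripes where every data block gets a full cell
        long remainder = dataSize % stripeSize;     // bytes in the last, partial stripe
        long[] lengths = new long[numDataBlocks];
        for (int i = 0; i < numDataBlocks; i++) {
            long cellStart = (long) i * cellSize;   // offset of block i's cell within the last stripe
            // Bytes of the partial stripe that fall into block i's cell, clamped to [0, cellSize].
            long tail = Math.max(0L, Math.min((long) cellSize, remainder - cellStart));
            lengths[i] = fullStripes * cellSize + tail;
        }
        return lengths;
    }

    public static void main(String[] args) {
        int cellSize = 64 * 1024;    // assumed cell size
        int numDataBlocks = 6;       // assumed number of data blocks
        long dataSize = cellSize + 100;
        // Block 0 holds a full cell, block 1 holds 100 bytes, the rest are empty.
        System.out.println(Arrays.toString(dataBlockLengths(dataSize, cellSize, numDataBlocks)));
    }
}
```

With these assumed values, the second data block is much shorter than the first internal block, which is the kind of block the issue title refers to.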


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)