Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Wed, 20 Jan 2016 02:33:39 +0000 (UTC)
From: "Jing Zhao (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12929984.1452725071000.153850.1453257219984@Atlassian.JIRA>
In-Reply-To: <JIRA.12929984.1452725071000@Atlassian.JIRA>
References: <JIRA.12929984.1452725071000@Atlassian.JIRA>
 <JIRA.12929984.1452725071200@arcas>
Subject: [jira] [Updated] (HDFS-9646) ErasureCodingWorker may fail when
 recovering data blocks with length less than the first internal block
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HDFS-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-9646:
----------------------------
    Attachment: HDFS-9646.004.patch

Updated the patch to address Kai's comments about LOG.isDebugEnabled. The patch also updates {{TestRecoveryStripedFile}} so that it corrupts the file only when generating failures for data blocks (i.e., {{DataOnly}}). The current test checks that every error can be fixed, including both missing and corrupted blocks. If a parity block (e.g., blk_8) is corrupted, the ErasureCodingWorker may not detect the corruption, because it does not need to read blk_8 for recovery.
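
To illustrate why a corrupted parity block can go unnoticed, here is a minimal sketch. It is not the ErasureCodingWorker code. It assumes an RS(6,3) layout (six data blocks, three parity blocks) and a hypothetical chooseSources helper that reads only as many sources as decoding needs, taking the lowest-indexed blocks reported as present. The class name, the helper, and the selection policy are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names and selection policy are assumptions,
// not the actual ErasureCodingWorker implementation.
final class SourceSelectionSketch {

    // Hypothetical helper: return the first `needed` indices whose block is
    // reported as present. A silently corrupted block still counts as present.
    static List<Integer> chooseSources(boolean[] present, int needed) {
        List<Integer> sources = new ArrayList<>();
        for (int i = 0; i < present.length && sources.size() < needed; i++) {
            if (present[i]) {
                sources.add(i);
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        int dataBlocks = 6;   // assumed RS(6,3) schema
        int parityBlocks = 3;
        boolean[] present = new boolean[dataBlocks + parityBlocks];
        Arrays.fill(present, true);
        present[0] = false;   // blk_0 is missing and must be recovered

        List<Integer> sources = chooseSources(present, dataBlocks);
        System.out.println("sources read: " + sources);
        // blk_8 is never read, so corruption in it cannot be detected here.
        System.out.println("blk_8 read: " + sources.contains(8));
    }
}
```

Under these assumptions, recovering blk_0 reads six other blocks and never touches blk_8, which is why the test corrupts only data blocks.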

> ErasureCodingWorker may fail when recovering data blocks with length less than the first internal block
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9646
>                 URL: https://issues.apache.org/jira/browse/HDFS-9646
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0
>            Reporter: Takuya Fukudome
>            Assignee: Jing Zhao
>            Priority: Critical
>         Attachments: HDFS-9646.000.patch, HDFS-9646.001.patch, HDFS-9646.002.patch, HDFS-9646.003.patch, HDFS-9646.004.patch, test-reconstruct-stripe-file.patch
>
>
> This is reported by [~tfukudom]: ErasureCodingWorker may fail with the following exception when recovering a non-full internal block.
> {code}
> 2016-01-06 11:14:44,740 WARN  datanode.DataNode (ErasureCodingWorker.java:run(467)) - Failed to recover striped block: BP-987302662-172.29.4.13-1450757377698:blk_-9223372036854322288_29751
> java.io.IOException: Transfer failed for all targets.
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:455)
> {code}
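
For context on how an internal block can be shorter than the first one, the following is a rough sketch of how internal block lengths fall out of a striped layout. It is not Hadoop's implementation. The class and method names are hypothetical, and it assumes that data is written cell by cell across the data blocks and that each parity block is as long as the first data block.

```java
// Illustrative sketch only; names are assumptions, not Hadoop APIs.
final class InternalBlockLengthSketch {

    // Length of internal block `index` in a block group holding `groupSize`
    // bytes of user data, striped in cells of `cellSize` over `dataBlocks`.
    static long internalBlockLength(long groupSize, int cellSize,
                                    int dataBlocks, int index) {
        long stripeSize = (long) cellSize * dataBlocks;
        long lastStripeLen = groupSize % stripeSize;
        if (lastStripeLen == 0) {
            // Every stripe is full, so all internal blocks are equal.
            return groupSize / dataBlocks;
        }
        long fullStripes = groupSize / stripeSize;
        long lastCell;
        if (index < dataBlocks) {
            // Data block: whatever is left of the partial stripe for this cell.
            long remaining = lastStripeLen - (long) index * cellSize;
            lastCell = Math.max(0, Math.min(remaining, cellSize));
        } else {
            // Parity block: assumed to match the first data block's last cell.
            lastCell = Math.min(cellSize, lastStripeLen);
        }
        return fullStripes * cellSize + lastCell;
    }

    public static void main(String[] args) {
        // Toy numbers: 14 bytes, 4-byte cells, 3 data blocks, 2 parity blocks.
        for (int i = 0; i < 5; i++) {
            System.out.println("internal block " + i + ": "
                + internalBlockLength(14, 4, 3, i) + " bytes");
        }
    }
}
```

With a partial last stripe, later data blocks end up shorter than the first internal block, which is the case the exception above was reported for.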

