Date: Thu, 18 Feb 2016 23:20:18 +0000 (UTC)
From: "Jing Zhao (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-9818) Correctly handle EC reconstruction work caused by not enough racks

    [ https://issues.apache.org/jira/browse/HDFS-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-9818:
----------------------------
    Attachment: HDFS-9818.002.patch

Thanks for the review, Nicholas! Updated the patch to address your comments and fix the unit tests.

bq. Should we check all targets instead of the first target in validateReconstructionWork(..)?

When we call {{isInNewRack}}, the block already has enough replicas/internal blocks but not enough racks, so only one additional target is scheduled.

> Correctly handle EC reconstruction work caused by not enough racks
> ------------------------------------------------------------------
>
>                 Key: HDFS-9818
>                 URL: https://issues.apache.org/jira/browse/HDFS-9818
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: Takuya Fukudome
>            Assignee: Jing Zhao
>         Attachments: HDFS-9818.000.patch, HDFS-9818.001.patch, HDFS-9818.002.patch
>
>
> This is reported by [~tfukudom]:
> In a system test where 1 of 7 datanode racks was stopped, {{HadoopIllegalArgumentException}} was seen on the DataNode side while reconstructing missing EC blocks:
> {code}
> 2016-02-16 11:09:06,672 WARN datanode.DataNode (ErasureCodingWorker.java:run(482)) - Failed to recover striped block: BP-480558282-172.29.4.13-1453805190696:blk_-9223372036850962784_278270
> org.apache.hadoop.HadoopIllegalArgumentException: Inputs not fully corresponding to erasedIndexes in null places. erasedOrNotToReadIndexes: [1, 2, 6], erasedIndexes: [3]
>     at org.apache.hadoop.io.erasurecode.rawcoder.RSRawDecoder.doDecode(RSRawDecoder.java:166)
>     at org.apache.hadoop.io.erasurecode.rawcoder.AbstractRawErasureDecoder.decode(AbstractRawErasureDecoder.java:84)
>     at org.apache.hadoop.io.erasurecode.rawcoder.RSRawDecoder.decode(RSRawDecoder.java:89)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.recoverTargets(ErasureCodingWorker.java:683)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:465)
> {code}


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
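
To illustrate the point in the comment above, here is a minimal sketch of why the rack check only needs to look at a single target: when the block group already has enough internal blocks but they span too few racks, exactly one extra target is scheduled, so validation only has to confirm that this one target lands on a new rack. Class and method names ({{RackCheckSketch}}, {{Storage}}, the simplified {{isInNewRack}}) are illustrative assumptions, not the actual NameNode code.
{code}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the scenario described above: a striped block group
 * already has enough internal blocks, but they span too few racks, so exactly
 * one additional target is scheduled and the validation step only needs to
 * check that single target.
 */
public class RackCheckSketch {

  /** Minimal stand-in for a storage location: a datanode plus the rack it sits on. */
  static final class Storage {
    final String node;
    final String rack;
    Storage(String node, String rack) {
      this.node = node;
      this.rack = rack;
    }
  }

  /** True if the candidate target's rack is not already used by any source storage. */
  static boolean isInNewRack(List<Storage> sources, Storage target) {
    for (Storage s : sources) {
      if (s.rack.equals(target.rack)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // RS(6,3): 9 internal blocks, all live, but spread over only 2 racks.
    List<Storage> sources = new ArrayList<>();
    for (int i = 0; i < 9; i++) {
      sources.add(new Storage("dn" + i, i % 2 == 0 ? "/rack1" : "/rack2"));
    }

    // Replicas are sufficient, racks are not, so one additional target is scheduled.
    Storage target = new Storage("dn9", "/rack3");

    // Because only this one target exists, checking it alone is enough.
    System.out.println("target on a new rack? " + isInNewRack(sources, target));
  }
}
{code}
The sketch only mirrors the rack-counting idea behind the check; it does not reflect the actual NameNode data structures used by {{validateReconstructionWork(..)}}.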