hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9818) Correctly handle EC reconstruction work caused by not enough racks
Date Thu, 18 Feb 2016 23:20:18 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-9818:
----------------------------
    Attachment: HDFS-9818.002.patch

Thanks for the review, Nicholas! Updated the patch to address your comments and fix unit tests.

bq. Should we check all targets instead of the first target in validateReconstructionWork(..)?

When we call {{isInNewRack}}, the block already has enough replicas/internal blocks but not enough
racks, so only one additional target is scheduled. Checking the first target is therefore sufficient.
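
To illustrate the reasoning, here is a minimal, self-contained sketch of a rack check of this kind. It is a simplified model and not the actual BlockManager code: the class name RackCheckSketch, the method signature, and the use of plain rack-path strings such as "/rack1" are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of the rack check; not the Hadoop implementation. */
final class RackCheckSketch {

  /**
   * Returns true when the target's rack differs from the rack of every source.
   * Racks are modeled as plain strings such as "/rack1" (an assumption).
   */
  static boolean isInNewRack(List<String> sourceRacks, String targetRack) {
    for (String rack : sourceRacks) {
      if (rack.equals(targetRack)) {
        // The target shares a rack with an existing replica/internal block.
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> sourceRacks = Arrays.asList("/rack1", "/rack2", "/rack2");
    // In the "enough blocks, not enough racks" case only one extra target is
    // scheduled, so one check against that target decides whether the work helps.
    System.out.println(isInNewRack(sourceRacks, "/rack3"));
    System.out.println(isInNewRack(sourceRacks, "/rack2"));
  }
}
```

With a single scheduled target, the validation reduces to one call like the ones in main: the work is only worth keeping if that target adds a rack.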

> Correctly handle EC reconstruction work caused by not enough racks
> ------------------------------------------------------------------
>
>                 Key: HDFS-9818
>                 URL: https://issues.apache.org/jira/browse/HDFS-9818
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: Takuya Fukudome
>            Assignee: Jing Zhao
>         Attachments: HDFS-9818.000.patch, HDFS-9818.001.patch, HDFS-9818.002.patch
>
>
> This is reported by [~tfukudom]:
> In a system test where 1 of 7 datanode racks was stopped, {{HadoopIllegalArgumentException}} was seen on the DataNode side while reconstructing missing EC blocks:
> {code}
> 2016-02-16 11:09:06,672 WARN  datanode.DataNode (ErasureCodingWorker.java:run(482)) - Failed to recover striped block: BP-480558282-172.29.4.13-1453805190696:blk_-9223372036850962784_278270
> org.apache.hadoop.HadoopIllegalArgumentException: Inputs not fully corresponding to erasedIndexes in null places. erasedOrNotToReadIndexes: [1, 2, 6], erasedIndexes: [3]
> 	at org.apache.hadoop.io.erasurecode.rawcoder.RSRawDecoder.doDecode(RSRawDecoder.java:166)
> 	at org.apache.hadoop.io.erasurecode.rawcoder.AbstractRawErasureDecoder.decode(AbstractRawErasureDecoder.java:84)
> 	at org.apache.hadoop.io.erasurecode.rawcoder.RSRawDecoder.decode(RSRawDecoder.java:89)
> 	at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.recoverTargets(ErasureCodingWorker.java:683)
> 	at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:465)
> {code}
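
The exception message in the trace suggests that the decoder expects every index listed in erasedIndexes to be a null position in its inputs. A minimal sketch of such a consistency check follows. It is a hypothetical model, not the RSRawDecoder code: the class name ErasedIndexCheckSketch, the method names, the 9-unit input array, and the use of IllegalArgumentException are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a decoder input consistency check; not the Hadoop implementation. */
final class ErasedIndexCheckSketch {

  /** Collects the positions whose input buffer is null (erased or not to be read). */
  static List<Integer> nullPositions(byte[][] inputs) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < inputs.length; i++) {
      if (inputs[i] == null) {
        positions.add(i);
      }
    }
    return positions;
  }

  /** Throws if any index requested for recovery is not a null position in inputs. */
  static void validate(byte[][] inputs, int[] erasedIndexes) {
    List<Integer> nulls = nullPositions(inputs);
    for (int erased : erasedIndexes) {
      if (!nulls.contains(erased)) {
        throw new IllegalArgumentException(
            "erased index " + erased + " is not a null input; null positions: " + nulls);
      }
    }
  }

  public static void main(String[] args) {
    // Assumed layout: 9 units, with units 1, 2 and 6 unavailable as in the log.
    byte[][] inputs = new byte[9][];
    for (int i = 0; i < inputs.length; i++) {
      inputs[i] = new byte[1];
    }
    inputs[1] = null;
    inputs[2] = null;
    inputs[6] = null;
    try {
      // Asking to rebuild unit 3, which is still present, trips the check.
      validate(inputs, new int[] {3});
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

In this model, the combination from the log (null positions 1, 2, 6 and a requested index of 3) is rejected, while a request for any subset of the null positions passes.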



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
