hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9646) ErasureCodingWorker may fail when recovering data blocks with length less than the first internal block
Date Wed, 20 Jan 2016 02:29:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107850#comment-15107850 ]

Jing Zhao commented on HDFS-9646:
---------------------------------

Thanks for the review, Kai!

bq. I wonder whether recovery can, or should, also be triggered by the corruption case (where
the DN is live and not stopped).

Yes, recovery will also be triggered for corrupted blocks. However, for this test some process
needs to detect the corruption first: either a client reading the data or a datanode
recovering missing blocks. Here I want to make sure the DataNode can correctly detect and
report the corruption during recovery, so we first need to generate at least one missing
block by shutting down a DN.

bq. I wonder whether we could share the following utility between the client and the datanode

Yes, I planned to do so but could not find a good way to share such a small piece of logic.
Maybe we can move this into a separate JIRA?

> ErasureCodingWorker may fail when recovering data blocks with length less than the first internal block
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9646
>                 URL: https://issues.apache.org/jira/browse/HDFS-9646
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0
>            Reporter: Takuya Fukudome
>            Assignee: Jing Zhao
>            Priority: Critical
>         Attachments: HDFS-9646.000.patch, HDFS-9646.001.patch, HDFS-9646.002.patch, HDFS-9646.003.patch, test-reconstruct-stripe-file.patch
>
>
> This is reported by [~tfukudom]: ErasureCodingWorker may fail with the following exception when recovering a non-full internal block.
> {code}
> 2016-01-06 11:14:44,740 WARN  datanode.DataNode (ErasureCodingWorker.java:run(467)) - Failed to recover striped block: BP-987302662-172.29.4.13-1450757377698:blk_-9223372036854322288_29751
> java.io.IOException: Transfer failed for all targets.
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:455)
> {code}
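
To make the "non-full internal block" case concrete, here is a minimal sketch of how the lengths of the data blocks in a striped block group can differ. It is not the ErasureCodingWorker code: the class and method names are hypothetical, and the round-robin cell layout, the 6 data blocks, and the 64 KB cell size are assumptions chosen only for illustration.

```java
public class InternalBlockLengths {
    /**
     * Hypothetical helper: length in bytes of data block idx (0-based) in a
     * block group holding groupSize bytes, assuming data is written
     * round-robin across dataBlocks blocks in cells of cellSize bytes.
     */
    public static long dataBlockLength(long groupSize, int cellSize,
                                       int dataBlocks, int idx) {
        long stripeSize = (long) cellSize * dataBlocks;
        long fullStripes = groupSize / stripeSize;
        // Bytes in the last, partially filled stripe (0 if none).
        long remainder = groupSize % stripeSize;
        // Share of that partial stripe that lands in this block.
        long tail = remainder - (long) idx * cellSize;
        tail = Math.max(0L, Math.min((long) cellSize, tail));
        return fullStripes * cellSize + tail;
    }

    public static void main(String[] args) {
        int cellSize = 64 * 1024;              // assumed cell size
        int dataBlocks = 6;                    // assumed number of data blocks
        long groupSize = 2L * cellSize + 100;  // two full cells plus 100 bytes
        for (int i = 0; i < dataBlocks; i++) {
            System.out.println("data block " + i + ": "
                + dataBlockLength(groupSize, cellSize, dataBlocks, i) + " bytes");
        }
    }
}
```

With these assumed values, data blocks 0 and 1 are 65536 bytes each, data block 2 is only 100 bytes, and the remaining data blocks are empty. Data block 2 is therefore shorter than the first internal block, which is the situation described in the issue title.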



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)