hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9646) ErasureCodingWorker may fail when recovering data blocks with length less than the first internal block
Date Fri, 15 Jan 2016 02:19:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101065#comment-15101065
] 

Kai Zheng commented on HDFS-9646:
---------------------------------

Using {{random.nextInt}} to generate the datanode dead indexes is good for the fix here. On
the other hand, The randomness may still miss some boundary cases some time but not on other
chances, thus the test may be successful in most time but fail in other time. To help troubleshooting
in case the test fails in future, suggest the patch output the dead datanode list.

> ErasureCodingWorker may fail when recovering data blocks with length less than the first
internal block
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9646
>                 URL: https://issues.apache.org/jira/browse/HDFS-9646
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0
>            Reporter: Takuya Fukudome
>            Assignee: Jing Zhao
>            Priority: Critical
>         Attachments: HDFS-9646.000.patch, HDFS-9646.001.patch, test-reconstruct-stripe-file.patch
>
>
> This is reported by [~tfukudom]: ErasureCodingWorker may fail with the following exception
when recovering a non-full internal block.
> {code}
> 2016-01-06 11:14:44,740 WARN  datanode.DataNode (ErasureCodingWorker.java:run(467)) -
Failed to recover striped block: BP-987302662-172.29.4.13-1450757377698:blk_-92233720368
> 54322288_29751
> java.io.IOException: Transfer failed for all targets.
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:455)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message