Date: Wed, 20 Jan 2016 02:29:39 +0000 (UTC)
From: "Jing Zhao (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-9646) ErasureCodingWorker may fail when recovering data blocks with length less than the first internal block

[ https://issues.apache.org/jira/browse/HDFS-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107850#comment-15107850 ]

Jing Zhao commented on HDFS-9646:
---------------------------------

Thanks for the review, Kai!

bq. Wonder if it is or should, recovering can also be triggered by corrupt case (the DN is live or not stopped).

Yes, recovery will also be triggered for corrupted blocks.
However, for this test we first need a process that detects the corruption. This can be either a client reading the data or a datanode recovering missing blocks. Here I want to make sure the DataNode correctly detects and reports the corruption during recovery, so we first need to generate at least one missing block by shutting down a DN.

bq. Wonder if we could share the following utility between client and datanode

Yes, I planned to do so but could not find a good way to share this small piece of logic. Maybe we can handle this in a separate JIRA?

> ErasureCodingWorker may fail when recovering data blocks with length less than the first internal block
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9646
>                 URL: https://issues.apache.org/jira/browse/HDFS-9646
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0
>            Reporter: Takuya Fukudome
>            Assignee: Jing Zhao
>            Priority: Critical
>         Attachments: HDFS-9646.000.patch, HDFS-9646.001.patch, HDFS-9646.002.patch, HDFS-9646.003.patch, test-reconstruct-stripe-file.patch
>
>
> This is reported by [~tfukudom]: ErasureCodingWorker may fail with the following exception when recovering a non-full internal block.
> {code}
> 2016-01-06 11:14:44,740 WARN  datanode.DataNode (ErasureCodingWorker.java:run(467)) - Failed to recover striped block: BP-987302662-172.29.4.13-1450757377698:blk_-9223372036854322288_29751
> java.io.IOException: Transfer failed for all targets.
> 	at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:455)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
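
For readers unfamiliar with what a "non-full internal block" looks like, the following is a minimal sketch, not Hadoop's actual code, of how internal block lengths fall out of a striped layout. The class name `StripedLengthSketch` and the method `internalBlockLength` are hypothetical, and the RS(6,3) schema with 64 KB cells is an assumption chosen only for illustration. It shows how a data block later in the group can end up shorter than the first internal block, which is the situation named in the issue title.

```java
public class StripedLengthSketch {

    /**
     * Hypothetical helper: length of internal block i in a block group that
     * holds groupSize bytes of user data, striped in cells of cellSize bytes
     * across numDataBlocks data blocks. Indices >= numDataBlocks are treated
     * as parity blocks, which are assumed to be as long as data block 0.
     */
    static long internalBlockLength(long groupSize, int cellSize,
                                    int numDataBlocks, int i) {
        long stripeSize = (long) cellSize * numDataBlocks;
        long lastStripeLen = groupSize % stripeSize;
        if (lastStripeLen == 0) {
            // Only full stripes: every internal block has the same length.
            return groupSize / numDataBlocks;
        }
        long fullStripes = groupSize / stripeSize;
        // Parity cells cover the widest cell of the last stripe (cell 0).
        int cellIndex = i < numDataBlocks ? i : 0;
        // Bytes of the partial last stripe that land in this block's cell.
        long lastCell = lastStripeLen - (long) cellIndex * cellSize;
        lastCell = Math.max(0L, Math.min((long) cellSize, lastCell));
        return fullStripes * cellSize + lastCell;
    }

    public static void main(String[] args) {
        int cellSize = 64 * 1024;   // assumed cell size
        int dataBlocks = 6;         // assumed RS(6,3): 6 data + 3 parity
        int parityBlocks = 3;
        // Two full stripes, one full cell, and 100 extra bytes.
        long groupSize = 2L * cellSize * dataBlocks + cellSize + 100;
        for (int i = 0; i < dataBlocks + parityBlocks; i++) {
            System.out.println("internal block " + i + ": "
                + internalBlockLength(groupSize, cellSize, dataBlocks, i));
        }
    }
}
```

With these assumed numbers, internal block 0 and the parity blocks are 196608 bytes long, block 1 is 131172 bytes, and blocks 2 through 5 are 131072 bytes, so recovering any of blocks 1 through 5 means reconstructing a block shorter than the first internal block.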