Date: Sun, 5 Mar 2017 20:12:33 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11499) Decommissioning stuck because of failing recovery

    [ https://issues.apache.org/jira/browse/HDFS-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896532#comment-15896532 ]

ASF GitHub Bot commented on HDFS-11499:
---------------------------------------

GitHub user lukmajercak opened a pull request:

    https://github.com/apache/hadoop/pull/199

    HDFS-11499 Decommissioning stuck because of failing recovery

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lukmajercak/hadoop HDFS-11499

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/199.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #199

----
commit 3609b1353e64a24dee4746b8fa23ed7547768d68
Author: Lukas Majercak
Date:   2017-03-05T20:04:06Z

    HDFS-11499 add TestDecommission.testDecommissionWithOpenFileAndDatanodeFailing for testing recovery

commit 3f97d89f75d8a20f878da8c438141f9b6adf7da0
Author: Lukas Majercak
Date:   2017-03-05T20:05:08Z

    HDFS-11499 count decommissioning replicas when completing last block in BlockManager.commitOrCompleteLastBlock

----

> Decommissioning stuck because of failing recovery
> -------------------------------------------------
>
>                 Key: HDFS-11499
>                 URL: https://issues.apache.org/jira/browse/HDFS-11499
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>    Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha2
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>
> Block recovery fails to finalize the file if the locations of the last, incomplete block are being decommissioned. Conversely, decommissioning gets stuck waiting for the last block to be completed.
> {code:xml}
> org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Failed to finalize INodeFile testRecoveryFile since blocks[255] is non-complete, where blocks=[blk_1073741825_1001, blk_1073741826_1002...
> {code}
> The fix is to count replicas on decommissioning nodes when completing the last block in BlockManager.commitOrCompleteLastBlock, as we know that the DecommissionManager will not decommission a node that has UC blocks.
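
The idea behind the fix described in the issue can be sketched with a toy model. This is not the actual patch and does not use Hadoop classes: LastBlockCompletionSketch, NodeState, usableReplicas, canCompleteLastBlock and the minReplication parameter are all hypothetical names chosen for illustration. The sketch only shows why a last block whose replicas all sit on decommissioning nodes can never be completed if those replicas are ignored, and how counting them removes the deadlock.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: every type and method here is hypothetical, not a Hadoop API.
class LastBlockCompletionSketch {

    // Simplified admin state of the datanode that holds a replica.
    enum NodeState { LIVE, DECOMMISSIONING, DECOMMISSIONED }

    // Counts the replicas that are allowed to satisfy the minimum replication check.
    // Replicas on DECOMMISSIONED nodes are never counted in this sketch.
    static int usableReplicas(List<NodeState> replicaNodes, boolean countDecommissioning) {
        int usable = 0;
        for (NodeState state : replicaNodes) {
            if (state == NodeState.LIVE
                    || (countDecommissioning && state == NodeState.DECOMMISSIONING)) {
                usable++;
            }
        }
        return usable;
    }

    // The last block can be completed once enough usable replicas exist.
    static boolean canCompleteLastBlock(List<NodeState> replicaNodes,
                                        int minReplication,
                                        boolean countDecommissioning) {
        return usableReplicas(replicaNodes, countDecommissioning) >= minReplication;
    }

    public static void main(String[] args) {
        // All three locations of the last block are being decommissioned.
        List<NodeState> replicas = Arrays.asList(
                NodeState.DECOMMISSIONING,
                NodeState.DECOMMISSIONING,
                NodeState.DECOMMISSIONING);

        // Ignoring decommissioning replicas: the block stays incomplete.
        System.out.println("live only: " + canCompleteLastBlock(replicas, 1, false));
        // Counting them, as the fix proposes: the block can be completed.
        System.out.println("live + decommissioning: " + canCompleteLastBlock(replicas, 1, true));
    }
}
```

With countDecommissioning set to false the check fails for the all-decommissioning case, which mirrors the stuck state in the report; with it set to true the block completes and decommissioning is free to proceed.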