Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id A5D3C200C3F for ; Wed, 8 Mar 2017 02:15:44 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id A4679160B74; Wed, 8 Mar 2017 01:15:44 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id ED0AA160B68 for ; Wed, 8 Mar 2017 02:15:43 +0100 (CET) Received: (qmail 788 invoked by uid 500); 8 Mar 2017 01:15:43 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 777 invoked by uid 99); 8 Mar 2017 01:15:43 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 08 Mar 2017 01:15:43 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id AC0E1C141F for ; Wed, 8 Mar 2017 01:15:42 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.651 X-Spam-Level: X-Spam-Status: No, score=0.651 tagged_above=-999 required=6.31 tests=[RP_MATCHES_RCVD=-0.001, SPF_NEUTRAL=0.652] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id Pq-K1Fj12P4b for ; Wed, 8 Mar 2017 01:15:42 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id D27855FB5B for ; Wed, 8 Mar 2017 01:15:41 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 9972FE05AE for ; Wed, 8 Mar 2017 01:15:40 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id CEA2A24188 for ; Wed, 8 Mar 2017 01:15:38 +0000 (UTC) Date: Wed, 8 Mar 2017 01:15:38 +0000 (UTC) From: "Lukas Majercak (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HDFS-11499) Decommissioning stuck because of failing recovery MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Wed, 08 Mar 2017 01:15:44 -0000 [ https://issues.apache.org/jira/browse/HDFS-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900519#comment-15900519 ] Lukas Majercak commented on HDFS-11499: --------------------------------------- Looks like the issue is in the configuration, I have been running this test on 2.7.1 with no problems, and just found that trunk is missing some configurations, specifically : conf.setInt(DFSConfigKeys.DFS_NAMENODE_REPLICATION_PENDING_TIMEOUT_SEC_KEY, 4). The test times out because of a block being in PendingReconstructionBlocks. > Decommissioning stuck because of failing recovery > ------------------------------------------------- > > Key: HDFS-11499 > URL: https://issues.apache.org/jira/browse/HDFS-11499 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode > Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha2 > Reporter: Lukas Majercak > Assignee: Lukas Majercak > Labels: blockmanagement, decommission, recovery > Fix For: 3.0.0-alpha3 > > Attachments: HDFS-11499.02.patch, HDFS-11499.03.patch, HDFS-11499.patch > > > Block recovery will fail to finalize the file if the locations of the last, incomplete block are being decommissioned. Vice versa, the decommissioning will be stuck, waiting for the last block to be completed. > {code:xml} > org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Failed to finalize INodeFile testRecoveryFile since blocks[255] is non-complete, where blocks=[blk_1073741825_1001, blk_1073741826_1002... > {code} > The fix is to count replicas on decommissioning nodes when completing last block in BlockManager.commitOrCompleteLastBlock, as we know that the DecommissionManager will not decommission a node that has UC blocks. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org