Return-Path:
X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 6A40811206 for ; Fri, 18 Jul 2014 05:21:11 +0000 (UTC)
Received: (qmail 31495 invoked by uid 500); 18 Jul 2014 05:21:07 -0000
Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org
Received: (qmail 31015 invoked by uid 500); 18 Jul 2014 05:21:06 -0000
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: hdfs-issues@hadoop.apache.org
Delivered-To: mailing list hdfs-issues@hadoop.apache.org
Received: (qmail 30855 invoked by uid 99); 18 Jul 2014 05:21:06 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 18 Jul 2014 05:21:06 +0000
Date: Fri, 18 Jul 2014 05:21:06 +0000 (UTC)
From: "Allen Wittenauer (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Resolved] (HDFS-134) premature end-of-decommission of datanodes
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


     [ https://issues.apache.org/jira/browse/HDFS-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HDFS-134.
-----------------------------------
    Resolution: Fixed

This has probably been fixed. Gonna close this as stale.

> premature end-of-decommission of datanodes
> ------------------------------------------
>
>                 Key: HDFS-134
>                 URL: https://issues.apache.org/jira/browse/HDFS-134
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: dhruba borthakur
>
> Decommissioning requires that the nodes be listed in the exclude file named by dfs.hosts.exclude. The administrator then runs the "dfsadmin -refreshNodes" command.
The decommissioning process starts. Suppose that one of the datanodes being decommissioned has to re-register with the namenode. This can occur if the namenode restarts or if the datanode restarts while decommissioning is in progress. Now the namenode refuses to talk to this datanode because it is in the exclude list! This is a premature end of the decommissioning process.
> One way to fix this bug is to make the namenode always accept registration requests, even from datanodes that are in the exclude list. The namenode should, however, set the "being decommissioned" flag for these datanodes and then restart the decommissioning process for them. When decommissioning is complete, the namenode will ask the datanodes to shut down.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
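
For readers of the archive, the fix proposed in the quoted description can be sketched roughly as below. This is only a minimal illustration, assuming a simplified in-memory model: the class DecommissionSketch, the AdminState enum, and the methods register, finishDecommission, and stateOf are all hypothetical names invented here and are not taken from the HDFS code base or from whatever change actually resolved the issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the proposal in HDFS-134.
// None of these names come from the real namenode implementation.
class DecommissionSketch {
    enum AdminState { NORMAL, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

    // Hosts listed in the exclude file (assumed to be loaded elsewhere).
    private final Set<String> excludedHosts;
    // Per-datanode state as tracked by this toy "namenode".
    private final Map<String, AdminState> nodes = new HashMap<>();

    DecommissionSketch(Set<String> excludedHosts) {
        this.excludedHosts = new HashSet<>(excludedHosts);
    }

    // Always accept the registration. An excluded host is not rejected;
    // it is flagged as being decommissioned, which (re)starts the process.
    AdminState register(String host) {
        AdminState state = excludedHosts.contains(host)
                ? AdminState.DECOMMISSION_IN_PROGRESS
                : AdminState.NORMAL;
        nodes.put(host, state);
        return state;
    }

    // Called once decommissioning of the host is complete. Returns true
    // when the datanode should now be asked to shut down.
    boolean finishDecommission(String host) {
        if (nodes.get(host) != AdminState.DECOMMISSION_IN_PROGRESS) {
            return false;
        }
        nodes.put(host, AdminState.DECOMMISSIONED);
        return true;
    }

    // Current state of a host, or null if it never registered.
    AdminState stateOf(String host) {
        return nodes.get(host);
    }
}
```

In this sketch a re-registering datanode from the exclude list ends up in DECOMMISSION_IN_PROGRESS again instead of being refused, which is the behavior the description asks for. How completion is detected is left out entirely.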