Date: Mon, 11 May 2015 17:47:01 +0000 (UTC)
From: "Greg Hill (JIRA)"
To: dev@ambari.apache.org
Reply-To: dev@ambari.apache.org
Subject: [jira] [Resolved] (AMBARI-9902) Decommission DATANODE silently fails if in maintenance mode

     [ https://issues.apache.org/jira/browse/AMBARI-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Hill resolved AMBARI-9902.
-------------------------------
    Resolution: Invalid

I needed to set the operation_level differently for this to work properly.
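For anyone retrying this, the request body from the quoted issue can be assembled with the operation level as an explicit argument, so a different value is easy to try. This is only a minimal sketch: build_decommission_request is a hypothetical helper, the field names are copied from the request quoted in the issue, and this thread does not say which level actually resolved the problem, so any value other than CLUSTER is an assumption to verify against your Ambari version.

```python
import json


def build_decommission_request(cluster_name, hosts, level="CLUSTER"):
    """Build the request body quoted in the issue (hypothetical helper).

    `level` defaults to the value from the original report; the value that
    makes the request work is not stated in this thread.
    """
    return {
        "RequestInfo": {
            "command": "DECOMMISSION",
            "context": "Decommission DataNode",
            "parameters": {
                "slave_type": "DATANODE",
                # Ambari receives the hosts as one comma-separated string.
                "excluded_hosts": ",".join(hosts),
            },
            "operation_level": {
                "level": level,
                "cluster_name": cluster_name,
            },
        },
        "Requests/resource_filters": [
            {"service_name": "HDFS", "component_name": "NAMENODE"}
        ],
    }


if __name__ == "__main__":
    # "mycluster" is a placeholder cluster name.
    body = build_decommission_request(
        "mycluster", ["slave-3.local", "slave-1.local"]
    )
    print(json.dumps(body, indent=2))
```

Keeping the level as a parameter avoids hard-coding the one setting that turned out to matter here.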
> Decommission DATANODE silently fails if in maintenance mode
> -----------------------------------------------------------
>
>                 Key: AMBARI-9902
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9902
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-agent
>    Affects Versions: 1.7.0
>            Reporter: Greg Hill
>
> If you set maintenance mode on multiple hosts, then attempt to decommission the DATANODE on those hosts, it says that it succeeded but it did not actually decommission any nodes in HDFS.  This can lead to data loss as the customer might assume that it's safe to remove those hosts from the pool.
> The request looks like:
> {noformat}
>         "RequestInfo": {
>                 "command": "DECOMMISSION",
>                 "context": "Decommission DataNode",
>                 "parameters": {"slave_type": "DATANODE", "excluded_hosts": "slave-3.local,slave-1.local"},
>                 "operation_level": {
>                     "level": "CLUSTER",
>                     "cluster_name": cluster_name
>                 },
>         },
>         "Requests/resource_filters": [{
>                 "service_name": "HDFS",
>                 "component_name": "NAMENODE",
>         }],
> {noformat}
> The task output appears to work:
> {noformat}
> File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
> Execute[''] {'user': 'hdfs'}
> ExecuteHadoop['dfsadmin -refreshNodes'] {'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'conf_dir': '/etc/hadoop/conf', 'kinit_override': True, 'user': 'hdfs'}
> Execute['hadoop --config /etc/hadoop/conf dfsadmin -refreshNodes'] {'logoutput': False, 'path': ['/usr/hdp/current/hadoop-client/bin'], 'tries': 1, 'user': 'hdfs', 'try_sleep': 0}
> {noformat}
> But it didn't actually write any contents to the file.  If it had, this line would have been in there:
> {noformat}
> Writing File['/etc/hadoop/conf/dfs.exclude'] because contents don't match
> {noformat}
> The command json file for the task has the right hosts list as a parameter:
> {noformat}
>     "commandParams": {
>         "service_package_folder": "HDP/2.0.6/services/HDFS/package",
>         "update_exclude_file_only": "false",
>         "script": "scripts/namenode.py",
>         "hooks_folder": "HDP/2.0.6/hooks",
>         "excluded_hosts": "slave-3.local,slave-1.local",
>         "command_timeout": "600",
>         "slave_type": "DATANODE",
>         "script_type": "PYTHON"
>     },
> {noformat}
> So something is filtering the list external to that.
> If maintenance mode was not set, everything works as expected.  I don't believe there's a legitimate reason to disallow decommissioning nodes in maintenance mode, as that seems to be the expected course of action (set maintenance, decommission, remove) for dealing with a problematic host.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
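
The symptom in the report, a task that reports success while dfs.exclude receives no hosts, can be checked without relying on the task log. The sketch below compares the comma-separated excluded_hosts parameter with the text of the exclude file. The helper name missing_from_exclude_file is hypothetical, and it assumes the exclude file holds whitespace-separated host names with no comments; adjust if your file differs.

```python
def missing_from_exclude_file(excluded_hosts, exclude_file_text):
    """Return requested hosts that do not appear in the exclude file text.

    excluded_hosts: comma-separated string, as in the command parameters.
    exclude_file_text: full contents of the exclude file (assumed to be
    whitespace-separated host names).
    """
    requested = [h.strip() for h in excluded_hosts.split(",") if h.strip()]
    present = set(exclude_file_text.split())
    return [h for h in requested if h not in present]


if __name__ == "__main__":
    # An empty file reproduces the situation described in the report.
    missing = missing_from_exclude_file("slave-3.local,slave-1.local", "")
    if missing:
        print("not written to exclude file: " + ", ".join(missing))
```

In practice the second argument would be the contents of /etc/hadoop/conf/dfs.exclude read on the NameNode host; a non-empty result means the hosts are not yet safe to remove.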