From: Greg Hill
Subject: Re: Ambari 2.0 DECOMMISSION
Date: Thu, 14 May 2015 18:46:42 GMT
Some further testing results:

1. Turning on maintenance mode beforehand didn't seem to affect it.
2. The DataNodes do go to 'decommissioning' briefly before they go back to live, so it is at
least trying to decommission them.  Shouldn't they go to 'decommissioned' once it finishes?
3. Some operation I'm doing (either stopping or deleting host components) is causing
Ambari to automatically issue a request like this for each node that's been decommissioned:
Remove host slave-6.local from exclude file
Once that request runs, the NameNode marks the node "dead".
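To see exactly what state the NameNode ends up in (dead but never decommissioned, versus properly decommissioned), the output of `hdfs dfsadmin -report` can be checked directly instead of the Ambari UI. Below is a minimal sketch, assuming the Hadoop 2.x report layout with "Live datanodes (N):" / "Dead datanodes (N):" section headers followed by per-node "Hostname:" and "Decommission Status :" lines. If your version's report lacks the Hostname line, adapt the regex to the Name line. The sample text is hand-written for illustration, not output from this cluster.

```python
import re
import subprocess
from typing import Dict, List, Optional, Tuple

# Section headers such as "Live datanodes (3):" or "Dead datanodes (2):".
SECTION_RE = re.compile(r"^(Live|Dead) datanodes \(\d+\):")
HOST_RE = re.compile(r"^Hostname:\s*(\S+)")
STATUS_RE = re.compile(r"^Decommission Status\s*:\s*(.+?)\s*$")


def fetch_report() -> str:
    """Run `hdfs dfsadmin -report` (needs HDFS superuser rights, e.g. the hdfs user)."""
    result = subprocess.run(["hdfs", "dfsadmin", "-report"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def decommission_states(report: str) -> Dict[str, Tuple[str, str]]:
    """Map each DataNode hostname to (section, decommission status)."""
    states: Dict[str, Tuple[str, str]] = {}
    section: Optional[str] = None
    host: Optional[str] = None
    for raw in report.splitlines():
        line = raw.strip()
        match = SECTION_RE.match(line)
        if match:
            section, host = match.group(1), None
            continue
        match = HOST_RE.match(line)
        if match:
            host = match.group(1)
            continue
        match = STATUS_RE.match(line)
        if match and section and host:
            states[host] = (section, match.group(1))
            host = None
    return states


def dead_but_not_decommissioned(states: Dict[str, Tuple[str, str]]) -> List[str]:
    """Hosts listed as dead whose status never reached 'Decommissioned'."""
    return sorted(h for h, (section, status) in states.items()
                  if section == "Dead" and status != "Decommissioned")


# Hand-written sample in the report's layout (not real output from this cluster).
SAMPLE = """\
Live datanodes (1):

Name: 10.0.0.1:50010 (slave-1.local)
Hostname: slave-1.local
Decommission Status : Normal

Dead datanodes (1):

Name: 10.0.0.2:50010 (slave-2.local)
Hostname: slave-2.local
Decommission Status : Normal
"""

if __name__ == "__main__":
    print(dead_but_not_decommissioned(decommission_states(SAMPLE)))
```

On a real cluster you would pass `fetch_report()` instead of `SAMPLE`. A node listed under "Dead" with status "Normal" matches the symptom described above.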

This worked fine in Ambari 1.7, so I'm guessing the new "remove host from exclude file" step
is what's breaking it.  Is there some way to disable that?  Can someone explain the rationale
behind it?  I'd like to be able to remove nodes without having to restart the


Did anything change with DECOMMISSION in the 2.0 release?  The process appears to decommission
fine (the request completes and says it updated the dfs.exclude file), but the DataNodes aren't
decommissioned; HDFS now says they're dead and I need to restart the NameNode.  For YARN,
the NodeManagers appear to have decommissioned OK and are in decommissioned status, but Ambari
says I need to restart the ResourceManager (this wasn't the case in 1.7.0).

The only difference is that I don't set maintenance mode on the DataNodes until after the
decommission completes, because that wasn't working for me at one point (it turns out hitting
the API slightly differently would have made it work).  Could that be the cause?  Is restarting
the master services now required after a decommission?
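For comparison when calling the API directly, here is a minimal sketch that only builds the JSON request bodies; it makes no network calls. The field names are assumed to follow the decommission and maintenance-mode examples on the Ambari wiki (a DECOMMISSION command POSTed to `/api/v1/clusters/<cluster>/requests` and targeted at the master component, and a `maintenance_state` PUT on a host component); verify them against your Ambari 2.0 install. Requests also need an `X-Requested-By` header. The cluster name `c1` is a placeholder.

```python
import json
from typing import Dict, List, Tuple

# slave_type -> (service, master component) that receives the DECOMMISSION command.
MASTERS: Dict[str, Tuple[str, str]] = {
    "DATANODE": ("HDFS", "NAMENODE"),
    "NODEMANAGER": ("YARN", "RESOURCEMANAGER"),
}


def decommission_body(cluster: str, hosts: List[str], slave_type: str = "DATANODE") -> dict:
    """Assumed body for POST /api/v1/clusters/<cluster>/requests."""
    service, master = MASTERS[slave_type]
    return {
        "RequestInfo": {
            "context": f"Decommission {slave_type}s",
            "command": "DECOMMISSION",
            "parameters": {"slave_type": slave_type, "excluded_hosts": ",".join(hosts)},
            "operation_level": {"level": "HOST_COMPONENT", "cluster_name": cluster},
        },
        "Requests/resource_filters": [{"service_name": service, "component_name": master}],
    }


def maintenance_body(state: str = "ON") -> dict:
    """Assumed body for PUT /api/v1/clusters/<cluster>/hosts/<host>/host_components/<COMPONENT>."""
    return {
        "RequestInfo": {"context": f"Turn {state} Maintenance Mode"},
        "Body": {"HostRoles": {"maintenance_state": state}},
    }


if __name__ == "__main__":
    # "c1" is a placeholder cluster name.
    print(json.dumps(decommission_body("c1", ["slave-2.local", "slave-4.local"]), indent=2))
    print(json.dumps(maintenance_body("ON")))
```

Keeping the bodies as plain dicts makes it easy to diff what your client sends against what the Ambari UI sends (visible in the browser's network tab).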

Task output:

DataNode Decommission: slave-2.local,slave-4.local

2015-05-14 14:45:48,439 - u"File['/etc/hadoop/conf/dfs.exclude']" {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
2015-05-14 14:45:48,670 - Writing u"File['/etc/hadoop/conf/dfs.exclude']" because contents don't match
2015-05-14 14:45:48,864 - u"Execute['']" {'user': 'hdfs'}
2015-05-14 14:45:48,968 - u"ExecuteHadoop['dfsadmin -refreshNodes']" {'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'conf_dir': '/etc/hadoop/conf', 'kinit_override': True, 'user': 'hdfs'}
2015-05-14 14:45:49,011 - u"Execute['hadoop --config /etc/hadoop/conf dfsadmin -refreshNodes']" {'logoutput': None, 'try_sleep': 0, 'environment': {}, 'tries': 1, 'user': 'hdfs', 'path':

DataNodes Status: 3 live / 2 dead / 0 decommissioning
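The task log shows the DataNode decommission doing two things: rewriting `/etc/hadoop/conf/dfs.exclude` from the `exclude_hosts_list.j2` template (only when the contents differ) and then running `dfsadmin -refreshNodes`. Below is a minimal sketch of those two steps, assuming the template simply writes one hostname per line. The function names are hypothetical, the refresh command is copied from the log, and the sketch does not switch to the hdfs user as Ambari does.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Sequence, Tuple


def render_exclude(hosts: Sequence[str]) -> str:
    """One hostname per line (assumed equivalent to exclude_hosts_list.j2)."""
    return "".join(f"{host}\n" for host in hosts)


def refresh_command(conf_dir: str = "/etc/hadoop/conf") -> List[str]:
    """The refresh command as it appears in the task log."""
    return ["hadoop", "--config", conf_dir, "dfsadmin", "-refreshNodes"]


def decommission(hosts: Sequence[str], exclude_path: str,
                 conf_dir: str = "/etc/hadoop/conf",
                 dry_run: bool = True) -> Tuple[bool, List[str]]:
    """Rewrite the exclude file only if its contents differ, then refresh the NameNode.

    Returns (file_changed, command). With dry_run=True the command is not executed.
    """
    path = Path(exclude_path)
    content = render_exclude(hosts)
    changed = not path.exists() or path.read_text() != content
    if changed:
        path.write_text(content)
    cmd = refresh_command(conf_dir)
    if not dry_run:
        # Ambari runs this as the hdfs user; this sketch runs as the current user.
        subprocess.run(cmd, check=True)
    return changed, cmd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exclude = str(Path(tmp) / "dfs.exclude")
        print(decommission(["slave-2.local", "slave-4.local"], exclude))
        print(decommission(["slave-2.local", "slave-4.local"], exclude))  # unchanged
```

Calling `decommission` again with a host dropped from the list mimics the automatic "Remove host ... from exclude file" request: the host disappears from the exclude file, so after the next refresh the NameNode no longer treats it as excluded. That would be consistent with a stopped DataNode then showing up as dead rather than decommissioned.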

NodeManager Decommission: slave-2.local,slave-4.local

2015-05-14 14:47:16,491 - u"File['/etc/hadoop/conf/yarn.exclude']" {'owner': 'yarn', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
2015-05-14 14:47:16,866 - Writing u"File['/etc/hadoop/conf/yarn.exclude']" because contents don't match
2015-05-14 14:47:17,057 - u"Execute[' yarn --config /etc/hadoop/conf rmadmin -refreshNodes']" {'environment': {'PATH': '/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/usr/sbin:/bin:/usr/bin:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin:/usr/hdp/current/hadoop-yarn-resourcemanager/bin'}, 'user': 'yarn'}

NodeManagers Status: 3 active / 0 lost / 0 unhealthy / 0 rebooted / 2 decommissioned
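On the YARN side the ResourceManager did record the nodes as decommissioned. To confirm that from the command line rather than the Ambari UI, here is a minimal sketch that parses `yarn node -list -all` output. It assumes the Hadoop 2.x layout, where each data row starts with `<host>:<port>` followed by the node state, and the 2.6-era set of node states (newer releases add more, such as DECOMMISSIONING). The sample listing is hand-written for illustration.

```python
from typing import Dict

# YARN NodeState values in Hadoop 2.6-era releases (assumption; newer versions add more).
NODE_STATES = {"NEW", "RUNNING", "UNHEALTHY", "DECOMMISSIONED", "LOST", "REBOOTED"}


def node_states(listing: str) -> Dict[str, str]:
    """Map NodeManager host -> state from `yarn node -list -all` output."""
    states: Dict[str, str] = {}
    for line in listing.splitlines():
        fields = line.split()
        # Data rows look like "<host>:<port> <STATE> ..."; the header and totals line are skipped.
        if len(fields) >= 2 and fields[1] in NODE_STATES:
            host = fields[0].rsplit(":", 1)[0]
            states[host] = fields[1]
    return states


# Hand-written sample in the listing's layout (not real output from this cluster).
SAMPLE = """\
Total Nodes:2
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
slave-1.local:45454             RUNNING slave-1.local:8042                             0
slave-2.local:45454      DECOMMISSIONED slave-2.local:8042                             0
"""

if __name__ == "__main__":
    print(node_states(SAMPLE))
```

Matching on a known set of states, rather than on column positions, keeps the header and the "Total Nodes" line out of the result without special-casing them.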

