ambari-user mailing list archives

From Sean Roberts <srobe...@hortonworks.com>
Subject Re: decommission multiple nodes issue
Date Mon, 02 Mar 2015 19:22:38 GMT
Racker Greg - I’m not familiar with the decommissioning API, but if it’s consistent with
the rest of Ambari, you’ll need to change from this:

"excluded_hosts": “slave-1.local,slave-2.local"

To this:

"excluded_hosts" : [ "slave-1.local","slave-2.local" ]


--
Hortonworks - We do Hadoop

Sean Roberts
Partner Solutions Engineer - EMEA
@seano

From: Greg Hill <greg.hill@rackspace.com>
Reply: user@ambari.apache.org
Date: March 2, 2015 at 19:08:13
To: user@ambari.apache.org
Subject: decommission multiple nodes issue

I have some code for decommissioning datanodes prior to removal. It seems to work fine with
a single node, but it fails with multiple nodes. When passing multiple hosts, I put the names
in a comma-separated string, as seems to be the custom with other Ambari API commands. I
attempted to send it as a JSON array, but the server complained about that. Let me know if
that is the wrong format. The decommission request completes successfully; it just never
writes the excludes file, so no nodes are decommissioned.
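
For context, our code submits the bodies below to the standard Ambari requests endpoint,
roughly like this (a sketch, not our exact code; the server URL and credentials are
placeholders):

    import json
    import requests

    AMBARI = "http://ambari.local:8080"  # placeholder server URL
    cluster_name = "mycluster"           # placeholder cluster name

    def submit_request(body):
        # POST a custom command (e.g. DECOMMISSION) to the requests endpoint;
        # Ambari requires the X-Requested-By header on write operations.
        resp = requests.post(
            "%s/api/v1/clusters/%s/requests" % (AMBARI, cluster_name),
            auth=("admin", "admin"),  # placeholder credentials
            headers={"X-Requested-By": "ambari"},
            data=json.dumps(body),
        )
        resp.raise_for_status()
        return resp.json()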

This fails for multiple nodes:

"RequestInfo": {
                "command": "DECOMMISSION",
                "context": "Decommission DataNode”),
                "parameters": {"slave_type": “DATANODE", "excluded_hosts": “slave-1.local,slave-2.local"},
                "operation_level": {
“level”: “CLUSTER”,
“cluster_name”: cluster_name
},
            },
            "Requests/resource_filters": [{
                "service_name": “HDFS",
                "component_name": “NAMENODE",
            }],

But this works for a single node:

"RequestInfo": {
                "command": "DECOMMISSION",
                "context": "Decommission DataNode”),
                "parameters": {"slave_type": “DATANODE", "excluded_hosts": “slave-1.local"},
                "operation_level": {
“level”: “HOST_COMPONENT”,
“cluster_name”: cluster_name,
“host_name”: “slave-1.local”,
“service_name”: “HDFS”
},
            },
            "Requests/resource_filters": [{
                "service_name": “HDFS",
                "component_name": “NAMENODE",
            }],

Looking at the actual node, it’s obvious from the command output that the file isn’t being
written:

(multiple hosts, notice there is no ‘Writing File’ line)
File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'),
'group': 'hadoop'}
Execute[''] {'user': 'hdfs'}
ExecuteHadoop['dfsadmin -refreshNodes'] {'bin_dir': '/usr/hdp/current/hadoop-client/bin',
'conf_dir': '/etc/hadoop/conf', 'kinit_override': True, 'user': 'hdfs'}
Execute['hadoop --config /etc/hadoop/conf dfsadmin -refreshNodes'] {'logoutput': False, 'path':
['/usr/hdp/current/hadoop-client/bin'], 'tries': 1, 'user': 'hdfs', 'try_sleep': 0}

(single host, it writes the exclude file)
File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'),
'group': 'hadoop'}
Writing File['/etc/hadoop/conf/dfs.exclude'] because contents don't match
Execute[''] {'user': 'hdfs'}
ExecuteHadoop['dfsadmin -refreshNodes'] {'bin_dir': '/usr/hdp/current/hadoop-client/bin',
'conf_dir': '/etc/hadoop/conf', 'kinit_override': True, 'user': 'hdfs'}
Execute['hadoop --config /etc/hadoop/conf dfsadmin -refreshNodes'] {'logoutput': False, 'path':
['/usr/hdp/current/hadoop-client/bin'], 'tries': 1, 'user': 'hdfs', 'try_sleep': 0}

The only notable difference in the command.json is the commandParams/excluded_hosts param,
so it’s not as if the request is passing the information along incorrectly. I’m going to play
around with the format I use to pass it in and take some wild guesses, like it expecting
double-encoded JSON, as I’ve seen that in other places. But if someone knows the answer
offhand and can help out, that would be appreciated. If it turns out to be a bug in Ambari,
I’ll open a JIRA and rewrite our code to issue the decommission call independently for each
host.
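
For reference, the two things I plan to try look roughly like this (sketches only;
send_decommission() is a hypothetical stand-in for whatever submits the request body, not a
real Ambari client call):

    import json

    hosts = ["slave-1.local", "slave-2.local"]

    # Wild guess: double-encoded JSON, i.e. the host list serialized to a
    # string before the request body itself is serialized.
    double_encoded = json.dumps(hosts)  # yields '["slave-1.local", "slave-2.local"]'

    # Fallback: issue one DECOMMISSION call per host, reusing the
    # single-node body that already works.
    for host in hosts:
        send_decommission(excluded_hosts=host)  # hypothetical helper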

Greg