Subject: Re: decommission multiple nodes issue
From: Yusaku Sako <yusaku@hortonworks.com>
To: "user@ambari.apache.org" <user@ambari.apache.org>, Sean Roberts <sroberts@hortonworks.com>
Date: Wed, 4 Mar 2015 04:49:41 +0000
BTW, I've started a new Wiki on decommissioning DataNodes: https://cwiki.apache.org/confluence/display/AMBARI/API+to+decommission+DataNodes

Yusaku

From: Yusaku Sako <yusaku@hortonworks.com>
Reply-To: "user@ambari.apache.org" <user@ambari.apache.org>
Date: Tuesday, March 3, 2015 7:41 PM
To: "user@ambari.apache.org" <user@ambari.apache.org>, Sean Roberts <sroberts@hortonworks.com>
Subject: Re: decommission multiple nodes issue

Hi Greg,

This is actually by design.
If you want to decommission all DataNodes regardless of their host maintenance mode, you need to change "RequestInfo/level" from "CLUSTER" to "HOST_COMPONENT".
When you set the "level" to "CLUSTER", bulk operations (in this case decommission) are skipped on the matching target resources when the host(s) are in maintenance mode.
If you set it to "HOST_COMPONENT", it ignores any host-level maintenance mode.
This is a really mysterious, undocumented part of Ambari, unfortunately.

Yusaku

From: Greg Hill <greg.hill@RACKSPACE.COM>
Reply-To: "user@ambari.apache.org" <user@ambari.apache.org>
Date: Tuesday, March 3, 2015 9:32 AM
To: Sean Roberts <sroberts@hortonworks.com>, "user@ambari.apache.org" <user@ambari.apache.org>
Subject: Re: decommission multiple nodes issue

I have verified that if maintenance mode is set on a host, then it is ignored by the decommission process, but only if you try to decommission multiple hosts at the same time. I'll open a bug.

Greg

From: Sean Roberts <sroberts@hortonworks.com>
Date: Monday, March 2, 2015 at 1:34 PM
To: Greg <greg.hill@rackspace.com>, "user@ambari.apache.org" <user@ambari.apache.org>
Subject: Re: decommission multiple nodes issue

Greg - Same here on submitting JSON. Although they are JSON documents, you have to submit them as plain form. This is true across all of Ambari. I opened a bug for it a month back.


-- 
Hortonworks - We do Hadoop

Sean Roberts
Partner Solutions Engineer - EMEA
@seano

From: Greg Hill <greg.hill@rackspace.com>
Date: March 2, 2015 at 19:32:34
To: Sean Roberts <sroberts@hortonworks.com>, user@ambari.apache.org <user@ambari.apache.org>
Subject: Re: decommission multiple nodes issue

That causes a server error. I've yet to see any part of the API that accepts JSON arrays like that as input; it's almost always, if not always, a comma-separated string like I posted. Many methods even return double-encoded JSON values (i.e. "key": "[\"value1\",\"value2\"]"). It's kind of annoying and inconsistent, honestly, and not documented anywhere. You just have to have your client code choke on it and then go add another data[key] = json.loads(data[key]) in the client to account for it.
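For what it's worth, the workaround described above can be made defensive instead of reactive. A small sketch (the helper name is mine, not from any Ambari client):

```python
import json

def maybe_decode(value):
    """Return json.loads(value) when a string field holds embedded JSON
    (some responses double-encode, e.g. "key": "[\"value1\",\"value2\"]");
    otherwise return the value unchanged."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except ValueError:  # plain string, not embedded JSON
            return value
    return value

# Double-encoded list comes back as a real list; plain strings pass through.
print(maybe_decode('["value1","value2"]'))
print(maybe_decode('slave-1.local'))
```

Applying this to every value in a response dict avoids sprinkling one-off json.loads calls per key.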

I am starting to think it's because I set the nodes into maintenance mode first, as doing the decommission command manually from the client works fine when the nodes aren't in maintenance mode. I'll keep digging, I guess, but it is weird that the exact same command worked this time (the commandArgs are identical to the one that did nothing).

Greg

From: Sean Roberts <sroberts@hortonworks.com>
Date: Monday, March 2, 2015 at 1:22 PM
To: Greg <greg.hill@rackspace.com>, "user@ambari.apache.org" <user@ambari.apache.org>
Subject: Re: decommission multiple nodes issue

Racker Greg - I'm not familiar with the decommissioning API, but if it's consistent with the rest of Ambari, you'll need to change from this:

"excluded_hosts": =93slave-1.local,slave-2.local"

To this:

"excluded_hosts" : [ "slave-1.local","sla= ve-2.local" ]



-- 
Hortonworks - We do Hadoop

Sean Roberts
Partner Solutions Engineer - EMEA
@seano

From: Greg Hill <greg.hill@rackspace.com>
Reply: user@ambari.apache.org <user@ambari.apache.org>
Date: March 2, 2015 at 19:08:13
To: user@ambari.apache.org <user@ambari.apache.org>
Subject: decommission multiple nodes issue

I have some code for decommissioning datanodes prior to removal. It seems to work fine with a single node, but with multiple nodes it fails. When passing multiple hosts, I am putting the names in a comma-separated string, as seems to be the custom with other Ambari API commands. I attempted to send it as a JSON array, but the server complained about that. Let me know if that is the wrong format. The decommission request completes successfully, it just never writes the excludes file, so no nodes are decommissioned.

This fails for multiple nodes:

"RequestInfo": {
                "co= mmand": "DECOMMISSION",
                "co= ntext": "Decommission DataNode=94),
                "pa= rameters": {"slave_type": =93DATANODE", "excluded_= hosts": =93slave-1.local,slave-2.local"},
                "op= eration_level": {
=93level=94: =93CLUSTER=94,
=93cluster_name=94: cluster_name
},
            },
            "Requests/resourc= e_filters": [{
                "se= rvice_name": =93HDFS",
                "co= mponent_name": =93NAMENODE",
            }],

But this works for a single node:

"RequestInfo": {
                "co= mmand": "DECOMMISSION",
                "co= ntext": "Decommission DataNode=94),
                "pa= rameters": {"slave_type": =93DATANODE", "excluded_= hosts": =93slave-1.local"},
                "op= eration_level": {
=93level=94: =93HOST_COMPONENT=94,
=93cluster_name=94: cluster_name,
=93host_name=94: =93slave-1.local=94,
=93service_name=94: =93HDFS=94
},
            },
            "Requests/resourc= e_filters": [{
                "se= rvice_name": =93HDFS",
                "co= mponent_name": =93NAMENODE",
            }],

Looking on the actual node, it's obvious from the command output that the file isn't being written:

(multiple hosts, notice there is no 'Writing File' line)
File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
Execute[''] {'user': 'hdfs'}
ExecuteHadoop['dfsadmin -refreshNodes'] {'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'conf_dir': '/etc/hadoop/conf', 'kinit_override': True, 'user': 'hdfs'}
Execute['hadoop --config /etc/hadoop/conf dfsadmin -refreshNodes'] {'logoutput': False, 'path': ['/usr/hdp/current/hadoop-client/bin'], 'tries': 1, 'user': 'hdfs', 'try_sleep': 0}

(single host, it writes the exclude file)
File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
Writing File['/etc/hadoop/conf/dfs.exclude'] because contents don't match
Execute[''] {'user': 'hdfs'}
ExecuteHadoop['dfsadmin -refreshNodes'] {'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'conf_dir': '/etc/hadoop/conf', 'kinit_override': True, 'user': 'hdfs'}
Execute['hadoop --config /etc/hadoop/conf dfsadmin -refreshNodes'] {'logoutput': False, 'path': ['/usr/hdp/current/hadoop-client/bin'], 'tries': 1, 'user': 'hdfs', 'try_sleep': 0}

The only notable difference in the command.json is the commandParams/excluded_hosts param, so it's not like the request is passing the information along incorrectly. I'm going to play around with the format I use to pass it in and take some wild guesses, like it's expecting double-encoded JSON, as I've seen that in other places, but if someone knows the answer offhand and can help out, that would be appreciated. If it turns out to be a bug in Ambari, I'll open a JIRA and rewrite our code to issue the decommission call independently for each host.
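If it does turn out to be a bug, the per-host fallback could look roughly like this (send_request stands in for whatever HTTP helper the client already has; all names here are illustrative):

```python
def decommission_each(cluster_name, hosts, send_request):
    """Fallback: issue one DECOMMISSION request per host at
    HOST_COMPONENT level, instead of a single multi-host request."""
    for host in hosts:
        send_request({
            "RequestInfo": {
                "command": "DECOMMISSION",
                "context": "Decommission DataNode",
                "parameters": {"slave_type": "DATANODE",
                               "excluded_hosts": host},
                "operation_level": {
                    "level": "HOST_COMPONENT",
                    "cluster_name": cluster_name,
                    "host_name": host,
                    "service_name": "HDFS",
                },
            },
            "Requests/resource_filters": [
                {"service_name": "HDFS", "component_name": "NAMENODE"}
            ],
        })

# Example: collect the payloads instead of actually sending them.
sent = []
decommission_each("c1", ["slave-1.local", "slave-2.local"], sent.append)
print(len(sent))
```

This trades one bulk request for N small ones, but each matches the single-host shape that is confirmed to write the exclude file.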

Greg