Date: Wed, 7 Sep 2016 22:05:22 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@ambari.apache.org
Reply-To: dev@ambari.apache.org
Subject: [jira] [Commented] (AMBARI-18191) "Restart all required" services operation failed at Metrics Collector since HDFS was not yet up

    [ https://issues.apache.org/jira/browse/AMBARI-18191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471931#comment-15471931 ]

Hudson commented on AMBARI-18191:
---------------------------------

FAILURE: Integrated in Jenkins build
Ambari-branch-2.5 #4 (See [https://builds.apache.org/job/Ambari-branch-2.5/4/])
AMBARI-18191. Restart all required services operation failed at Metrics (swagle: [http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=6ec16941565debd031eefffad8c8160a7b4d7e59])
* (edit) ambari-server/src/main/java/org/apache/ambari/server/metadata/RoleCommandOrder.java
* (edit) ambari-server/src/test/java/org/apache/ambari/server/metadata/RoleCommandOrderTest.java


> "Restart all required" services operation failed at Metrics Collector since HDFS was not yet up
> -----------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-18191
>                 URL: https://issues.apache.org/jira/browse/AMBARI-18191
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-metrics
>    Affects Versions: 2.4.0
>            Reporter: Sunitha
>            Assignee: Siddharth Wagle
>            Priority: Blocker
>             Fix For: trunk
>
>         Attachments: AMBARI-18191.patch
>
>
> ambari-server --hash
> 4017036da951a10f519a578de934308cf866ba50
> *Steps*
> # Deploy HDP-2.3.6 cluster with Ambari 2.2.2.0 (AMS is configured in distributed mode)
> # Upgrade Ambari to 2.4.0.0 and let it complete
> # Open Ambari web UI and hit "Restart all required" under Actions menu
> *Result*
> The operation fails while trying to restart Metrics Collector as it tried to make a WebHDFS call while HDFS was not started:
> {code}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 148, in <module>
>     AmsCollector().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
>     method(env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 725, in restart
>     self.start(env)
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 46, in start
>     self.configure(env, action = 'start') # for security
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 41, in configure
>     hbase('master', action)
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/hbase.py", line 213, in hbase
>     dfs_type=params.dfs_type
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 459, in action_create_on_execute
>     self.action_delayed("create")
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 456, in action_delayed
>     self.get_hdfs_resource_executor().action_delayed(action_name, self)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 256, in action_delayed
>     self._set_mode(self.target_status)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 363, in _set_mode
>     self.util.run_command(self.main_resource.resource.target, 'SETPERMISSION', method='PUT', permission=self.mode, assertable_result=False)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 179, in run_command
>     _, out, err = get_user_call_output(cmd, user=self.run_user, logoutput=self.logoutput, quiet=False)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
>     raise Fail(err_msg)
> resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --negotiate -u : 'http://vsharma-eu-mt-5.openstacklocal:50070/webhdfs/v1/user/ams/hbase?op=SETPERMISSION&user.name=hdfs&permission=775' 1>/tmp/tmp8twcZt 2>/tmp/tmpLPih9a' returned 7. curl: (7) couldn't connect to host
> 401
> {code}
> Afterwards, restarted HDFS individually first and then hit "Restart all Required" - the operation was successful
> Looks like the issue is because the order of restart is incorrect across the hosts, hence the dependent services don't come up upfront



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
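
The ordering problem described in the report (Metrics Collector restarting before HDFS is up) can be illustrated with a small dependency-ordering sketch. This is not Ambari's RoleCommandOrder implementation; the `RESTART_DEPENDENCIES` map and the `restart_order` function are hypothetical names used only for illustration, and the ZOOKEEPER entry is an assumed example. Only the "Metrics Collector needs HDFS first" edge comes from the report.

```python
from collections import deque

# Hypothetical dependency map: each service lists the services that must be
# started before it. Illustrative only, not Ambari's real ordering rules.
# Assumption: every dependency also appears as a key in the map.
RESTART_DEPENDENCIES = {
    "ZOOKEEPER": [],
    "HDFS": ["ZOOKEEPER"],
    "AMBARI_METRICS": ["HDFS"],  # AMS in distributed mode writes to HDFS
}


def restart_order(dependencies):
    """Return services so that each one follows its dependencies (Kahn's algorithm)."""
    # Count unmet dependencies per service and record reverse edges.
    pending = {svc: len(deps) for svc, deps in dependencies.items()}
    dependents = {svc: [] for svc in dependencies}
    for svc, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(svc)

    # Start with services that have no dependencies at all.
    ready = deque(sorted(svc for svc, count in pending.items() if count == 0))
    order = []
    while ready:
        svc = ready.popleft()
        order.append(svc)
        for child in dependents[svc]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(dependencies):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    print(restart_order(RESTART_DEPENDENCIES))
```

With an ordering like this, the WebHDFS SETPERMISSION call made during the Metrics Collector start would only run after HDFS has been restarted, which matches the manual workaround in the report.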