Date: Thu, 13 Apr 2017 14:28:41 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@ambari.apache.org
Subject: [jira] [Commented] (AMBARI-20736) Allow Potentially Long Running Restart Commands To Have Their Own Timeout

    [ https://issues.apache.org/jira/browse/AMBARI-20736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967646#comment-15967646 ]

Hudson commented on AMBARI-20736:
---------------------------------

FAILURE: Integrated in Jenkins build Ambari-trunk-Commit #7289 (See [https://builds.apache.org/job/Ambari-trunk-Commit/7289/])
AMBARI-20736. Allow Potentially Long Running Restart Commands To Have (ncole: [http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=ac75f1daccc2c1117e175f95cd9642e85b4fd366])
* (edit) ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UpgradeResourceProviderTest.java
* (edit) ambari-common/src/main/python/resource_management/libraries/functions/decorator.py
* (edit) ambari-server/src/main/resources/stacks/HDP/2.4/upgrades/upgrade-2.5.xml
* (edit) ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/StageWrapper.java
* (edit) ambari-server/src/main/resources/stacks/HDP/2.5/upgrades/upgrade-2.6.xml
* (edit) ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.5.xml
* (edit) ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/Grouping.java
* (edit) ambari-server/src/main/resources/upgrade-pack.xsd
* (edit) ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py
* (edit) ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariCustomCommandExecutionHelper.java
* (edit) ambari-server/src/main/resources/stacks/HDP/2.4/upgrades/upgrade-2.4.xml
* (edit) ambari-server/src/main/resources/stacks/HDP/2.4/upgrades/upgrade-2.6.xml
* (edit) ambari-server/src/test/resources/stacks/HDP/2.1.1/upgrades/upgrade_test.xml
* (edit) ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/TaskWrapperBuilder.java
* (edit) ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.4.xml
* (edit) ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/Task.java
* (edit) ambari-server/src/main/resources/stacks/HDP/2.6/upgrades/upgrade-2.6.xml
* (edit) ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/TaskWrapper.java
* (edit) ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/params_linux.py
* (edit) ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java
* (edit) ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.6.xml
* (edit) ambari-server/src/main/resources/stacks/HDP/2.5/upgrades/upgrade-2.5.xml
* (edit) ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.3.xml


> Allow Potentially Long Running Restart Commands To Have Their Own Timeout
> -------------------------------------------------------------------------
>
>                 Key: AMBARI-20736
>                 URL: https://issues.apache.org/jira/browse/AMBARI-20736
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>            Reporter: Nate Cole
>            Assignee: Nate Cole
>            Priority: Critical
>             Fix For: 2.5.1
>
>         Attachments: AMBARI-20736.patch
>
>
> During an upgrade of a cluster, some commands are expected to take a very long time, depending on the size of the cluster and how much data is stored. For example, a NameNode restart with SafeMode exit may take in excess of 30 minutes, while on some clusters it could take less than 1 minute.
> Currently, the only way to adjust this timeout is to do so across the board for all commands by editing {{ambari.properties}} and setting {{agent.task.timeout}}. This solution doesn't work very well, since the majority of restarts during an upgrade are not on a master component: raising the global value for the sake of a few slow master restarts also delays the detection of a hung restart on every other component.
> There needs to be a way to instruct Ambari that a restart should be allowed to run for a relatively long period of time.
> - Both Java and Python need to be considered here. We don't want Python to give up and return a {{FAILED}} state, and we don't want Ambari Server to set the task to {{TIMEDOUT}}.
> - This can be useful in both normal restarts and upgrade scenarios.
>
> h3. Upgrade Only
> If this functionality is considered in the context of an upgrade only, then this logic could conceivably be placed inside the upgrade XML packs:
> {code}
>
>
>
>
>
> {code}
> - This would allow future mpacks to control the restart of components. Perhaps this can even be abstracted out slightly:
> {code}
>
>
>
>
>
> upgrade.parameter.slave.restart.short = 300
> upgrade.parameter.slave.restart.long = 900
> upgrade.parameter.master.restart.short = 1500
> upgrade.parameter.master.restart.long = 1800
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
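The abstracted properties in the description suggest a simple lookup. A restart task names a timeout key, and that key's value is used as the task's timeout. If the key is absent, the single global agent.task.timeout applies instead. The Python sketch below illustrates only that resolution step and is not Ambari's implementation. The function names, the idea that a task carries a property key, and the placeholder default of 900 seconds are all assumptions made for illustration. The four property values are copied from the issue.

```python
from typing import Dict, Optional

# Values copied from the issue description (seconds).
UPGRADE_PARAMETERS = """
upgrade.parameter.slave.restart.short = 300
upgrade.parameter.slave.restart.long = 900
upgrade.parameter.master.restart.short = 1500
upgrade.parameter.master.restart.long = 1800
"""

# Hypothetical placeholder standing in for agent.task.timeout from ambari.properties.
DEFAULT_AGENT_TASK_TIMEOUT = 900


def parse_properties(text: str) -> Dict[str, int]:
    """Parse 'key = value' lines into a dict of integer timeouts in seconds."""
    props: Dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = int(value.strip())
    return props


def resolve_timeout(timeout_key: Optional[str], props: Dict[str, int],
                    default: int = DEFAULT_AGENT_TASK_TIMEOUT) -> int:
    """Return the task-specific timeout if its key is configured, else the global default."""
    if timeout_key is not None and timeout_key in props:
        return props[timeout_key]
    return default


props = parse_properties(UPGRADE_PARAMETERS)
print(resolve_timeout("upgrade.parameter.master.restart.long", props))  # 1800
print(resolve_timeout("upgrade.parameter.slave.restart.short", props))  # 300
print(resolve_timeout(None, props))  # 900 (falls back to the global value)
```

Keeping the fallback explicit means that tasks which name no key behave exactly as they did with only the global setting.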
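On the agent side, the description asks that the Python script not give up and report FAILED before the server is willing to stop waiting for that command. One way to express this is to bound a polling loop by a time budget rather than a fixed retry count. The Python sketch below takes that approach under stated assumptions. The helper wait_until, the TimeoutExceeded exception, and the simulated clock are hypothetical illustrations, not the code changed in decorator.py or hdfs_namenode.py.

```python
import time
from typing import Callable


class TimeoutExceeded(Exception):
    """Raised when the condition is still false once the time budget is spent."""


def wait_until(check: Callable[[], bool], timeout_seconds: float,
               poll_interval: float = 5.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll `check` until it returns True and return the elapsed seconds.

    Raises TimeoutExceeded once `timeout_seconds` have passed, so the caller
    fails only after the full per-command budget, not after N attempts.
    """
    start = clock()
    deadline = start + timeout_seconds
    while True:
        if check():
            return clock() - start
        now = clock()
        if now >= deadline:
            raise TimeoutExceeded(f"condition not met within {timeout_seconds}s")
        # Sleep for one poll interval, but never past the deadline.
        sleep(min(poll_interval, deadline - now))


class FakeClock:
    """Simulated time so the example finishes instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# Pretend the NameNode leaves SafeMode after 1200 simulated seconds.
sim = FakeClock()
print(wait_until(lambda: sim.time() >= 1200, 1800, poll_interval=30,
                 clock=sim.time, sleep=sim.sleep))  # 1200.0

# With a short budget (300 s) the same wait is reported as a failure.
short_sim = FakeClock()
try:
    wait_until(lambda: short_sim.time() >= 1200, 300, poll_interval=30,
               clock=short_sim.time, sleep=short_sim.sleep)
except TimeoutExceeded as exc:
    print(exc)  # condition not met within 300s
```

The clock and sleep functions are injected to keep the example deterministic. A real script would keep the defaults, time.monotonic and time.sleep.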