Date: Tue, 11 Apr 2017 18:36:41 +0000 (UTC)
From: "Nate Cole (JIRA)"
To: issues@ambari.apache.org
Subject: [jira] [Created] (AMBARI-20736) Allow Potentially Long Running Restart Commands To Have Their Own Timeout

Nate Cole created AMBARI-20736:
----------------------------------

             Summary: Allow Potentially Long Running Restart Commands To Have Their Own Timeout
                 Key: AMBARI-20736
                 URL: https://issues.apache.org/jira/browse/AMBARI-20736
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
            Reporter: Nate Cole
            Assignee: Nate Cole
            Priority: Critical
             Fix For: 2.5.1

During an upgrade of a cluster, some commands are expected to take a very long time, depending on the size of the cluster and how much data is stored. For example, a NameNode restart with SafeMode exit may take in excess of 30 minutes on some clusters and less than 1 minute on others.

Currently, the only way to adjust this timeout is to do so across the board for all commands, by editing {{ambari.properties}} and setting {{agent.task.timeout}}. This solution doesn't work very well, since the majority of restarts during an upgrade are not on a master component.

There needs to be a way to instruct Ambari that a specific restart should be allowed to run for a relatively long period of time.
- Both Java and Python need to be considered here. We don't want Python to give up and return a {{FAILED}} state, and we don't want Ambari server to set the task to {{TIMEDOUT}}.
- This can be useful in both normal restarts and upgrade scenarios.

h3. Upgrade Only

If this functionality is considered in the context of an upgrade only, then the logic could conceivably be placed inside the upgrade XML packs:

{code}
{code}

- This would allow future mpacks to control the restart of components.

Perhaps this can even be slightly abstracted out:

{code}
upgrade.parameter.slave.restart.short = 300
upgrade.parameter.slave.restart.long = 900
upgrade.parameter.master.restart.short = 1500
upgrade.parameter.master.restart.long = 1800
{code}
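
As a rough illustration of how the abstracted {{upgrade.parameter.*}} properties might be consumed, the Python sketch below resolves a restart timeout by role and duration class, falling back to the global {{agent.task.timeout}} when no specific key is set. The helper names ({{parse_properties}}, {{resolve_restart_timeout}}), the fallback order, and the 900-second last-resort default are assumptions made for this example; none of them are existing Ambari APIs or confirmed behavior.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve_restart_timeout(props, role, duration, fallback_seconds=900):
    """Return the restart timeout in seconds for a role and duration class.

    Lookup order (an assumption for this sketch):
      1. upgrade.parameter.<role>.restart.<duration>
      2. agent.task.timeout (the existing global setting)
      3. fallback_seconds (hypothetical last-resort default)
    """
    specific_key = "upgrade.parameter.%s.restart.%s" % (role, duration)
    for candidate in (specific_key, "agent.task.timeout"):
        if candidate in props:
            return int(props[candidate])
    return fallback_seconds


# Values copied from the proposal, plus an illustrative global timeout.
SAMPLE = """
agent.task.timeout = 900
upgrade.parameter.slave.restart.short = 300
upgrade.parameter.slave.restart.long = 900
upgrade.parameter.master.restart.short = 1500
upgrade.parameter.master.restart.long = 1800
"""

PROPS = parse_properties(SAMPLE)

if __name__ == "__main__":
    # A long master restart (for example, NameNode) gets the largest budget.
    print(resolve_restart_timeout(PROPS, "master", "long"))
    # A role with no specific key falls back to agent.task.timeout.
    print(resolve_restart_timeout(PROPS, "client", "short"))
```

Whatever value is resolved would need to be applied on both sides, so that the Python agent does not return {{FAILED}} and the server does not mark the task {{TIMEDOUT}} before the restart has had its full allowance.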