Mailing-List: contact issues-help@aurora.apache.org; run by ezmlm
Reply-To: dev@aurora.apache.org
Date: Tue, 7 Apr 2015 00:31:12 +0000 (UTC)
From: "Bill Farner (JIRA)"
To: issues@aurora.apache.org
Subject: [jira] [Updated] (AURORA-1240) Ignore JobUpdateSettings.maxWaitToInstanceRunningMs in the scheduler

     [ https://issues.apache.org/jira/browse/AURORA-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Farner updated AURORA-1240:
--------------------------------
    Component/s:     (was: Client)
    Description:
The UpdateConfig {{restart_threshold}} \[1\] setting does not appear to deliver much user value, as it is highly sensitive to scheduling performance and may result in aborted or rolled-back job updates when set too low.

Some background: This timeout controls the task transition from {{PENDING}} to {{RUNNING}} during the job update.
In the event of a cluster capacity shortage, assigning a task to a host may take considerably longer, thus expiring the timeout and, depending on the failure settings, causing an unnecessary job update abort or rollback. It was meant to give users some protection against unsatisfiable resource/constraint requirements. In reality, though, it proved to be rather an annoyance to users when an update is interrupted due to an unexpected delay in task assignment.

Consider deprecating and subsequently removing this setting.

This ticket tracks a first step to ignore this value in the scheduler updater. See linked tickets for follow-up work.

\[1\] - https://github.com/apache/aurora/blob/master/docs/configuration-reference.md#updateconfig-objects

  was:
The UpdateConfig {{restart_threshold}} \[1\] setting does not appear to deliver much user value, as it is highly sensitive to scheduling performance and may result in aborted or rolled-back job updates when set too low.

Some background: This timeout controls the task transition from {{PENDING}} to {{RUNNING}} during the job update. In the event of a cluster capacity shortage, assigning a task to a host may take considerably longer, thus expiring the timeout and, depending on the failure settings, causing an unnecessary job update abort or rollback. It was meant to give users some protection against unsatisfiable resource/constraint requirements. In reality, though, it proved to be rather an annoyance to users when an update is interrupted due to an unexpected delay in task assignment.

Consider deprecating and subsequently removing this setting.
\[1\] - https://github.com/apache/aurora/blob/master/docs/configuration-reference.md#updateconfig-objects

        Summary: Ignore JobUpdateSettings.maxWaitToInstanceRunningMs in the scheduler  (was: Deprecate UpdateConfig "restart_threshold" setting)

> Ignore JobUpdateSettings.maxWaitToInstanceRunningMs in the scheduler
> --------------------------------------------------------------------
>
>                 Key: AURORA-1240
>                 URL: https://issues.apache.org/jira/browse/AURORA-1240
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Maxim Khutornenko
>            Assignee: Bill Farner
>
> The UpdateConfig {{restart_threshold}} \[1\] setting does not appear to deliver much user value, as it is highly sensitive to scheduling performance and may result in aborted or rolled-back job updates when set too low.
> Some background: This timeout controls the task transition from {{PENDING}} to {{RUNNING}} during the job update. In the event of a cluster capacity shortage, assigning a task to a host may take considerably longer, thus expiring the timeout and, depending on the failure settings, causing an unnecessary job update abort or rollback. It was meant to give users some protection against unsatisfiable resource/constraint requirements. In reality, though, it proved to be rather an annoyance to users when an update is interrupted due to an unexpected delay in task assignment.
> Consider deprecating and subsequently removing this setting.
> This ticket tracks a first step to ignore this value in the scheduler updater. See linked tickets for follow-up work.
> \[1\] - https://github.com/apache/aurora/blob/master/docs/configuration-reference.md#updateconfig-objects



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
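
For illustration only, the timeout behavior described in the ticket can be sketched as below. This is a minimal sketch and not Aurora's updater code: `InstanceAttempt`, `Outcome`, `evaluate_instance`, and the `ignore_threshold` flag are hypothetical names introduced here, and all times are assumed to be in milliseconds. It only models the PENDING-to-RUNNING wait and leaves out every other update setting.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    RUNNING = "running"   # task reached RUNNING
    WAITING = "waiting"   # task still PENDING, keep waiting
    FAILED = "failed"     # wait exceeded the threshold


@dataclass
class InstanceAttempt:
    """Hypothetical record of one instance being updated."""
    pending_since_ms: int                # when the task entered PENDING
    running_at_ms: Optional[int] = None  # when it reached RUNNING, if ever


def evaluate_instance(attempt: InstanceAttempt,
                      now_ms: int,
                      max_wait_to_running_ms: int,
                      ignore_threshold: bool) -> Outcome:
    """Classify an instance at time now_ms (assumed semantics, see lead-in)."""
    if attempt.running_at_ms is not None:
        return Outcome.RUNNING
    waited_ms = now_ms - attempt.pending_since_ms
    # With the threshold honored, a slow assignment counts as a failure,
    # which may in turn trigger an abort or rollback of the whole update.
    if not ignore_threshold and waited_ms > max_wait_to_running_ms:
        return Outcome.FAILED
    # With the threshold ignored, a PENDING task simply keeps waiting.
    return Outcome.WAITING


# A task that has been PENDING for 90 s against a 60 s threshold.
slow = InstanceAttempt(pending_since_ms=0)
honored = evaluate_instance(slow, 90_000, 60_000, ignore_threshold=False)
ignored = evaluate_instance(slow, 90_000, 60_000, ignore_threshold=True)
```

Under these assumptions, the same slow assignment is a failure when the threshold is honored and merely a wait when it is ignored, which is the change in behavior this ticket proposes for the scheduler updater.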