Date: Fri, 5 Feb 2016 15:07:14 -0800
Subject: Re: [PROPOSAL] Disallow instance removal in job update
From: Maxim Khutornenko
To: dev@aurora.apache.org

We have tried to safeguard the client updater command with a
"dangerous change" warning before, but it did not get good feedback.
Besides, automated tools/scripts just ignored it.

An alternative could be what George suggested on the scaling API thread
mentioned earlier: automatically bump up the instance count to the job's
active task count. I'd say this could be an implementation of the
proposal above rather than a safeguard, as it accomplishes the exact
same goal.

Bill, do you have any ideas of what that safeguard could be?

On Fri, Feb 5, 2016 at 2:56 PM, Bill Farner wrote:
>>
>> the outdated instance count problem will only get worse as automated
>> scaling tools will quickly render existing .aurora config value obsolete
>
>
> This is not a compelling reason to remove functionality. Sounds like a
> safeguard is needed instead.
>
> On Fri, Feb 5, 2016 at 2:43 PM, Maxim Khutornenko wrote:
>
>> This is mostly a survey rather than a proposal. How would people think
>> about limiting updater to only adding/updating instances and let
>> killTasks take care of instance removals?
>>
>> We have all heard stories (or happen to create some ourselves) when an
>> outdated instance count value in .aurora config caused unexpected
>> instance removals. Granted, there are plenty of other values in the
>> config that can cause service-wide outage but instance count seems to
>> be the worst in that sense.
>>
>> After the recent refactoring of addInstances and killTasks to act as
>> scaleOut/scaleIn APIs [1], the outdated instance count problem will
>> only get worse as automated scaling tools will quickly render existing
>> .aurora config value obsolete. With that in mind, should we block
>> instance removal in the updater and let an explicit killTasks call be
>> the only acceptable action to reduce instance count? Is there any
>> value (aside from arguable convenience factor) in having
>> startJobUpdate ever killing instances?
>>
>> Thanks,
>> Maxim
>>
>> [1] - http://markmail.org/message/2smaej5n5e54li3g
>>
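
For illustration only, the idea discussed in this thread (an update that
adds or updates instances but never removes them, which is also what
bumping the instance count up to the active task count would achieve)
could be sketched roughly as below. This is a hypothetical Python sketch
and not Aurora's API: `plan_update`, its parameters, and the keys of the
returned dict are invented names, and it assumes instance IDs are small
integers starting at 0.

```python
from typing import Dict, List, Set


def plan_update(configured_count: int, active_ids: Set[int]) -> Dict[str, List[int]]:
    """Hypothetical planner: may add or update instances, never removes them."""
    # IDs the (possibly stale) config asks for; assumes IDs run 0..count-1.
    desired_ids = set(range(configured_count))
    return {
        # Instances named by the config that are not running yet.
        "add": sorted(desired_ids - active_ids),
        # Every running instance gets the new config, including those
        # beyond the configured count.
        "update": sorted(active_ids),
        # Instances a stale count would have killed; they are kept and
        # left for an explicit kill call.
        "kept": sorted(active_ids - desired_ids),
    }


# A stale config says 3 instances, but the job was scaled out to 5.
stale = plan_update(3, {0, 1, 2, 3, 4})
print(stale)
```

With a stale count of 3 and five active instances, nothing is added, all
five instances are updated, and instances 3 and 4 are reported as kept
rather than removed. Shrinking the job would then require a separate,
explicit kill request.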