aurora-dev mailing list archives

From John Sirois <j...@conductant.com>
Subject Re: [PROPOSAL] Disallow instance removal in job update
Date Fri, 05 Feb 2016 23:17:11 GMT
On Fri, Feb 5, 2016 at 4:07 PM, Maxim Khutornenko <maxim@apache.org> wrote:

> We have had attempts to safeguard client updater command with a
> "dangerous change" warning before but it did not get good feedback.
> Besides, automated tools/scripts just ignored it.
>
> An alternative could be what George suggested on the scaling API thread
> mentioned earlier: automatically bump up instance count to the job
> active task count. I'd say this could be an implementation of the
> proposal above rather than a safeguard as it accomplishes the exact
> same goal.
>
> Bill, do you have any ideas of what that safeguard could be?
>

I'd recommend that an API call that reduces instance count require a
`confirm_instance_reduction=true` parameter - this could be plumbed back
to a flag in the official Aurora client.
That said, since Aurora immediately forgets jobs and splits things into
tasks, I'm not sure this is even sanely possible today.
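
A minimal sketch of such a guard, assuming the scheduler can compare the
job's active task count against the instance count requested by the update.
The function and exception names below are hypothetical and are not part of
the existing Aurora API; only the `confirm_instance_reduction` parameter name
comes from the suggestion above.

```python
class InstanceReductionError(Exception):
    """Raised when an update would remove instances without confirmation."""


def check_instance_reduction(active_count, requested_count,
                             confirm_instance_reduction=False):
    """Return how many instances the update would remove.

    Hypothetical helper: refuses any reduction unless the caller has
    explicitly passed confirm_instance_reduction=True.
    """
    removed = max(active_count - requested_count, 0)
    if removed and not confirm_instance_reduction:
        raise InstanceReductionError(
            "update would remove %d instance(s); pass "
            "confirm_instance_reduction=True to proceed" % removed)
    return removed
```

With this shape, an update that adds instances or keeps the count unchanged
passes through untouched, and only a shrinking update needs the extra
parameter.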

Assuming it is possible, any human that turns that flag on by default with
a shell alias or an rc file can take responsibility for their own problem.
If a tool passes the boolean, again - that's the tool's problem.  Hopefully
it's a carefully developed and vetted auto-scaling tool.


> On Fri, Feb 5, 2016 at 2:56 PM, Bill Farner <wfarner@apache.org> wrote:
> >>
> >> the outdated instance count problem will only get worse as automated
> >> scaling tools will quickly render existing .aurora config value obsolete
> >
> >
> > This is not a compelling reason to remove functionality.  Sounds like a
> > safeguard is needed instead.
> >
> > On Fri, Feb 5, 2016 at 2:43 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> >
> >> This is mostly a survey rather than a proposal. What would people think
> >> about limiting the updater to only adding/updating instances and letting
> >> killTasks take care of instance removals?
> >>
> >> We have all heard stories (or happened to create some ourselves) where an
> >> outdated instance count value in .aurora config caused unexpected
> >> instance removals. Granted, there are plenty of other values in the
> >> config that can cause service-wide outage but instance count seems to
> >> be the worst in that sense.
> >>
> >> After the recent refactoring of addInstances and killTasks to act as
> >> scaleOut/scaleIn APIs [1], the outdated instance count problem will
> >> only get worse as automated scaling tools will quickly render existing
> >> .aurora config value obsolete. With that in mind, should we block
> >> instance removal in the updater and let an explicit killTasks call be
> >> the only acceptable action to reduce instance count? Is there any
> >> value (aside from arguable convenience factor) in having
> >> startJobUpdate ever killing instances?
> >>
> >> Thanks,
> >> Maxim
> >>
> >> [1] - http://markmail.org/message/2smaej5n5e54li3g
> >>
>



-- 
John Sirois
303-512-3301
