aurora-dev mailing list archives

From John Sirois <j...@conductant.com>
Subject Re: [PROPOSAL] Disallow instance removal in job update
Date Fri, 05 Feb 2016 23:32:48 GMT
On Fri, Feb 5, 2016 at 4:31 PM, Bill Farner <wfarner@apache.org> wrote:

> Or without any persistence at all.  The client could refuse to adjust the
> instance count on a job unless there's an additional command line argument.
> The same arguments about responsibility could be made here for users of old
> clients or custom clients.
>

I guess that's true.  I concur.
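
A minimal sketch of what such a client-side check might look like, assuming
the client can learn the job's active instance count before submitting the
update. The flag name --confirm-instance-reduction and the helper names below
are hypothetical, chosen to mirror the parameter suggested further down the
thread; none of this is existing Aurora client code.

```python
import argparse


class InstanceReductionError(Exception):
    """Raised when an update would shrink a job without explicit confirmation."""


def check_instance_count(active_count, requested_count, confirm_reduction=False):
    """Return requested_count, refusing an unconfirmed reduction.

    active_count: instances currently running (hypothetical input; a real
        client would have to query the scheduler for it).
    requested_count: instance count taken from the .aurora config.
    confirm_reduction: True only when the user passed the (hypothetical) flag.
    """
    if requested_count < active_count and not confirm_reduction:
        raise InstanceReductionError(
            "update would reduce instances from %d to %d; "
            "pass --confirm-instance-reduction to proceed"
            % (active_count, requested_count))
    return requested_count


def build_parser():
    """Build a parser exposing only the hypothetical confirmation flag."""
    parser = argparse.ArgumentParser(prog="update-guard-sketch")
    parser.add_argument("--confirm-instance-reduction", action="store_true",
                        help="allow the update to remove instances")
    return parser
```

The check is stateless: nothing is persisted, and the decision rests only on
the two counts and the flag, so an old or custom client that skips it is
unaffected.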


> On Fri, Feb 5, 2016 at 3:17 PM, John Sirois <john@conductant.com> wrote:
>
> > On Fri, Feb 5, 2016 at 4:07 PM, Maxim Khutornenko <maxim@apache.org>
> > wrote:
> >
> > > We have had attempts to safeguard client updater command with a
> > > "dangerous change" warning before but it did not get good feedback.
> > > Besides, automated tools/scripts just ignored it.
> > >
> > > An alternative could be what George suggested on the scaling API thread
> > > mentioned earlier: automatically bump up instance count to the job
> > > active task count. I'd say this could be an implementation of the
> > > proposal above rather than a safeguard as it accomplishes the exact
> > > same goal.
> > >
> > > Bill, do you have any ideas of what that safeguard could be?
> > >
> >
> > I'd recommend that an API call that reduced instance count require a
> > `confirm_instance_reduction=true` parameter - this could be plumbed back
> > to a flag in the official Aurora client.
> > That said, since Aurora immediately forgets jobs and splits things into
> > tasks, I'm not sure this is even sanely possible today.
> >
> > Assuming it is possible, any human that turns that flag on by default with
> > a shell alias or an rc file can take responsibility for their own problem.
> > If a tool passes the boolean, again - that's the tool's problem. Hopefully
> > it's a carefully developed and vetted auto-scaling tool.
> >
> >
> > > On Fri, Feb 5, 2016 at 2:56 PM, Bill Farner <wfarner@apache.org> wrote:
> > > >>
> > > >> the outdated instance count problem will only get worse as automated
> > > >> scaling tools will quickly render existing .aurora config value obsolete
> > > >
> > > >
> > > > This is not a compelling reason to remove functionality.  Sounds like a
> > > > safeguard is needed instead.
> > > >
> > > > On Fri, Feb 5, 2016 at 2:43 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> > > >
> > > >> This is mostly a survey rather than a proposal. What would people think
> > > >> about limiting the updater to only adding/updating instances and letting
> > > >> killTasks take care of instance removals?
> > > >>
> > > >> We have all heard stories (or happened to create some ourselves) where an
> > > >> outdated instance count value in .aurora config caused unexpected
> > > >> instance removals. Granted, there are plenty of other values in the
> > > >> config that can cause a service-wide outage but instance count seems to
> > > >> be the worst in that sense.
> > > >>
> > > >> After the recent refactoring of addInstances and killTasks to act as
> > > >> scaleOut/scaleIn APIs [1], the outdated instance count problem will
> > > >> only get worse as automated scaling tools will quickly render existing
> > > >> .aurora config value obsolete. With that in mind, should we block
> > > >> instance removal in the updater and let an explicit killTasks call be
> > > >> the only acceptable action to reduce instance count? Is there any
> > > >> value (aside from arguable convenience factor) in having
> > > >> startJobUpdate ever kill instances?
> > > >>
> > > >> Thanks,
> > > >> Maxim
> > > >>
> > > >> [1] - http://markmail.org/message/2smaej5n5e54li3g
> > > >>
> > >
> >
> >
> >
> > --
> > John Sirois
> > 303-512-3301
> >
>



-- 
John Sirois
303-512-3301
