aurora-dev mailing list archives

From Steve Niemitz <sniem...@apache.org>
Subject Re: [PROPOSAL] Job instance scaling APIs
Date Thu, 14 Jan 2016 20:35:31 GMT
As some background, we handle scale up / down purely from the client side,
using the update API for both directions.  I'd be concerned that any
scaling API powerful enough to fit all (or most) use cases would just
end up looking like the update API.

For example, when scaling down we don't just kill the last N instances, we
actually look at the least loaded hosts (globally) and kill tasks from
those.
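
A minimal sketch of that selection step, assuming the client already has a
map of instance id to host and some per-host load figure. The function name,
the input shapes, and the tie-breaking rule are illustrative assumptions, not
the actual client code:

```python
from typing import Dict, List


def pick_instances_to_kill(
    instance_hosts: Dict[int, str],
    host_load: Dict[str, float],
    count: int,
) -> List[int]:
    """Pick `count` instance ids, preferring those on the least loaded hosts.

    Hypothetical helper: `instance_hosts` maps instance id -> hostname and
    `host_load` maps hostname -> a load figure (lower means less loaded).
    Hosts with no load figure are treated as most loaded, so they go last.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    ranked = sorted(
        instance_hosts,
        # Least loaded host first; on ties, prefer the higher instance id.
        key=lambda i: (host_load.get(instance_hosts[i], float("inf")), -i),
    )
    return sorted(ranked[:count])
```

The resulting instance list would then be handed to whatever removes the
instances (in our case, the update API).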


On Thu, Jan 14, 2016 at 3:28 PM, Maxim Khutornenko <maxim@apache.org> wrote:

> "How is scaling down different from killing instances?"
>
> I found the 'killTasks' syntax too different and far too powerful to
> be used for scaling in. The TaskQuery allows killing instances across
> jobs/roles, whereas 'scaleIn' is narrowed down to just a single job.
> Additional benefit: it can be ACLed independently by allowing an external
> process to kill tasks only within a given job. We may also add rate
> limiting or backoff to it later.
>
> As for Joshua's question, I feel it should be an operator's
> responsibility to diff a job with its aurora config before applying an
> update. That said, if there is enough demand we can definitely
> consider adding something similar to what George suggested or
> resurrecting the 'large change' warning message we used to have in the
> client updater.
>
> On Thu, Jan 14, 2016 at 12:06 PM, George Sirois <george@tellapart.com>
> wrote:
> > As a point of reference, we solved this problem by adding a binding
> > helper that queries the scheduler for the current number of instances and
> > uses that number instead of a hardcoded config:
> >
> >    instances='{{scaling_instances[60]}}'
> >
> > In this example, instances will be set to the currently running number
> > (unless there are none, in which case 60 instances will be created).
> >
> > On Thu, Jan 14, 2016 at 2:44 PM, Joshua Cohen <jcohen@apache.org> wrote:
> >
> >> What happens if a job has been scaled out, but the underlying config is
> >> not updated to take that scaling into account? Would the next update on
> >> that job revert the number of instances (presumably, because what else
> >> could we do)? Is there anything we can do, tooling-wise, to improve upon
> >> this?
> >>
> >> On Thu, Jan 14, 2016 at 1:40 PM, Maxim Khutornenko <maxim@apache.org>
> >> wrote:
> >>
> >> > Our rolling update APIs can be quite inconvenient to work with when it
> >> > comes to instance scaling [1]. It's especially frustrating when
> >> > adding/removing instances has to be done in an automated fashion (e.g.,
> >> > by an external autoscaling process), as it requires holding on to the
> >> > original aurora config at all times.
> >> >
> >> > I propose we add simple instance scaling APIs to address the above.
> >> > Since an Aurora job may have instances at different configs at any
> >> > moment, I propose we accept an InstanceKey as a reference point when
> >> > scaling out. For example:
> >> >
> >> >     /** Scales out a given job by adding more instances with the task
> >> >         config of the templateKey. */
> >> >     Response scaleOut(1: InstanceKey templateKey, 2: i32 incrementCount)
> >> >
> >> >     /** Scales in a given job by removing existing instances. */
> >> >     Response scaleIn(1: JobKey job, 2: i32 decrementCount)
> >> >
> >> > A corresponding client command could then look like:
> >> >
> >> >     aurora job scale-out devcluster/vagrant/test/hello/1 10
> >> >
> >> > For the above command, the scheduler would take the task config of
> >> > instance 1 of the 'hello' job and replicate it 10 more times, thus
> >> > adding 10 additional instances to the job.
> >> >
> >> > There are, of course, some details to work out, like making sure no
> >> > active update is in flight, that scale-out does not violate quota, etc.
> >> > I intend to address those during the implementation as things progress.
> >> >
> >> > Does the above make sense? Any concerns/suggestions?
> >> >
> >> > Thanks,
> >> > Maxim
> >> >
> >> > [1] - https://issues.apache.org/jira/browse/AURORA-1258
> >> >
> >>
>
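
For readers following the quoted proposal, here is a toy in-memory model of
the scaleOut/scaleIn semantics it describes. This is only a sketch under
stated assumptions, not scheduler code: the class and method names are
hypothetical, new instance ids are assumed to continue after the highest
existing id, scaleIn is assumed to drop the highest-numbered instances, and
quota checks are left out entirely.

```python
from typing import Dict


class ToyJob:
    """Hypothetical stand-in for a job: instance id -> task config label."""

    def __init__(self, configs: Dict[int, str]) -> None:
        self.configs = dict(configs)
        self.update_in_flight = False  # the proposal wants scaling blocked during updates

    def _check_no_update(self) -> None:
        if self.update_in_flight:
            raise RuntimeError("an update is in flight; refusing to scale")

    def scale_out(self, template_id: int, increment_count: int) -> None:
        """Add `increment_count` instances that copy the template instance's config."""
        self._check_no_update()
        if increment_count <= 0:
            raise ValueError("increment_count must be positive")
        template = self.configs[template_id]  # KeyError if the template does not exist
        first_new = max(self.configs) + 1  # assumption: ids continue after the highest
        for instance_id in range(first_new, first_new + increment_count):
            self.configs[instance_id] = template

    def scale_in(self, decrement_count: int) -> None:
        """Remove `decrement_count` instances (assumption: highest ids first)."""
        self._check_no_update()
        if not 0 < decrement_count <= len(self.configs):
            raise ValueError("decrement_count out of range")
        for instance_id in sorted(self.configs, reverse=True)[:decrement_count]:
            del self.configs[instance_id]
```

In this model, `ToyJob({0: "v1", 1: "v2"}).scale_out(1, 10)` mirrors the
`scale-out .../hello/1 10` example: ten more instances appear, all carrying
the config of instance 1.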
