Mailing-List: contact dev-help@aurora.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@aurora.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAFWq12Xm_XfxvxgJdauPuvACR+t5CQNBqVbMh=hEYaLJov4dmg@mail.gmail.com>
References: 
 <CAOTkfX4yg=B-NK4mMBsB9t=CGHCeGrmQHi_FPve3XHsuOnVNpg@mail.gmail.com>
	<CAMnduq-jKovRt3EBj=DWFRgg1e4D=6cOhAiBCvC8Xw9ekj8zpg@mail.gmail.com>
	<CADLzSrFTBnL3d0RyfjkVExdiErwXsRjdRYCVK=EHn3HxpnhGhQ@mail.gmail.com>
	<CAOTkfX6XUdWxZBjQXn2YgBO7pO+btA9PMh20Hkz7pPQ1WzkPZQ@mail.gmail.com>
	<CAJ4qn4nGM6TGPqNL=EEBgNpc7fcBfyae_dHw2QVQ7hzyWkD6Gg@mail.gmail.com>
	<CAOTkfX6mSiKxnVBftdWdi3DDB40=UAW15aSFUn3ua_GgqXC2qg@mail.gmail.com>
	<1452848997962.73259@blue-yonder.com>
	<CAOTkfX4qCB7UVM0j-uTH84J2y8C4Zn_UR8d4Y96fahdzWABm3w@mail.gmail.com>
	<1453050051951.10303@blue-yonder.com>
	<CAOTkfX7OEXCSNYzTdoQPkeyJ3YP2Yc+0msYNn6u6ouBRQ5A5Jw@mail.gmail.com>
	<CAFWq12WEOFvdKOr8nsx1X3LkxjFSS7tHtVm9xYEHuXHnkq=gpw@mail.gmail.com>
	<CALtK5B=hQBt=sjz-PVP9D6BecajxfDcTPqkEFzGNN8RxHXY9+A@mail.gmail.com>
	<CAOTkfX6ko2THQXEigemqfsoEJ41T1v+2v0111dBm8vt=E7_=MQ@mail.gmail.com>
	<CAFWq12VNiSMLR-vVV46H6Ui6XfuHRVhx9UtkxEGh7+st3kFKXw@mail.gmail.com>
	<CAOTkfX4ReTQe6KVBFRDGvM31KBUqoYSoMvVjvL4PO44MzZyWWg@mail.gmail.com>
	<CAFWq12Xm_XfxvxgJdauPuvACR+t5CQNBqVbMh=hEYaLJov4dmg@mail.gmail.com>
Date: Tue, 19 Jan 2016 17:06:53 -0800
Message-ID: 
 <CAOTkfX45h5F9=DWdxvxyoA-epfmqTparp4DJJmdBB+vK74sJ6w@mail.gmail.com>
Subject: Re: [PROPOSAL] Job instance scaling APIs
From: Maxim Khutornenko <maxim@apache.org>
To: dev@aurora.apache.org
Content-Type: text/plain; charset=UTF-8

Hmm, I have no idea where I got that old version from :) Sounds good to me.

On Tue, Jan 19, 2016 at 5:02 PM, Bill Farner <wfarner@apache.org> wrote:
> Nit: your examples above include SessionKey, which no longer exists.
>
> Otherwise LGTM.
>
> On Tue, Jan 19, 2016 at 4:58 PM, Maxim Khutornenko <maxim@apache.org> wrote:
>
>> I think the behavior is spot on.
>>
>> So the plan is to move the current:
>>
>> Response addInstances(
>>     1: AddInstancesConfig config,
>>     2: Lock lock,
>>     3: SessionKey session)
>>
>> to:
>>
>> Response addInstances(
>>     1: AddInstancesConfig config,
>>     2: Lock lock,
>>     3: SessionKey session,
>>     4: InstanceKey instanceKey,
>>     5: int32 instanceCount)
>>
>> In the current release we will fork the behavior depending on whether
>> the InstanceKey is present. In the next release, we drop 1,2 and 3 and
>> end up with only 4 and 5.
>>
>> Any concerns?
>>
>> On Tue, Jan 19, 2016 at 3:57 PM, Bill Farner <wfarner@apache.org> wrote:
>> > Unless the behavior differs significantly, repurposing sounds like the
>> > right thing to do.  We don't *need* a separate RPC to accomplish this, we
>> > can just add arguments or struct fields to accomplish dual use.
>> >
>> > On Tue, Jan 19, 2016 at 3:53 PM, Maxim Khutornenko <maxim@apache.org>
>> wrote:
>> >
>> >> That thought crossed my mind but due to addInstances() being reserved
>> >> for an existing RPC I filed AURORA-1581 to handle its graceful
>> >> deprecation first.
>> >>
>> >> Bill, are you suggesting re-purposing the existing 'addInstances()'
>> >> into a 'scaleOut()' equivalent? I am mostly +1 on the change but if we
>> >> do it within the same release we will not be following our deprecation
>> >> guidelines. Perhaps we can name it something like 'addTaskInstances()'
>> >> instead and let addInstances() go away naturally?
>> >>
>> >> On Tue, Jan 19, 2016 at 3:06 PM, Tony Dong <tdong@twitter.com.invalid>
>> >> wrote:
>> >> > +1 to addInstances
>> >> >
>> >> > On Tue, Jan 19, 2016 at 3:00 PM, Bill Farner <wfarner@apache.org>
>> wrote:
>> >> >
>> >> >> At risk of devolving the discussion, is it worth calling the method
>> >> >> addInstances as opposed to scaleOut?  I find the former more
>> >> descriptive.
>> >> >>
>> >> >> On Tue, Jan 19, 2016 at 11:12 AM, Maxim Khutornenko <
>> maxim@apache.org>
>> >> >> wrote:
>> >> >>
>> >> >> > "Of course, the scaler could manually health check that all
>> instances
>> >> >> > have come up and are being used as expected, but I guess that is
>> what
>> >> >> > Aurora is for."
>> >> >> >
>> >> >> > I'd argue the updater "watch_secs" health checking isn't enough to
>> >> >> > ensure graceful rollout as instances may start flapping right after
>> >> >> > the updater signs off. Instances outside of update window may also
>> >> >> > flap (e.g. due to backend pressure) and updater will not be able to
>> >> >> > catch that. That's why a robust autoscaler has to rely on external
>> >> >> > monitoring tools and overall job health instead.
>> >> >> >
>> >> >> > A very basic approach, as you mentioned above, could be querying
>> job
>> >> >> > status repeatedly and count the ratio of tasks in RUNNING vs active
>> >> >> > (ASSIGNED, PENDING, THROTTLED, STARTING, etc.) states in order to
>> make
>> >> >> > a scaleOut decision. The more reliable approach though would also
>> rely
>> >> >> > on external monitoring stats exposed by user processes. That would
>> be
>> >> >> > a much higher fidelity signal than a decision based on task status
>> >> >> > alone. Scheduler does not (and should not for scalability reasons)
>> >> >> > have visibility into those stats, so the autoscaler would be in a
>> much
>> >> >> > better position to make an executive decision there.
>> >> >> >
>> >> >> > On Sun, Jan 17, 2016 at 9:00 AM, Erb, Stephan
>> >> >> > <Stephan.Erb@blue-yonder.com> wrote:
>> >> >> > > I believe the operation is not that simple when you look at the
>> >> >> > end-to-end scenario.
>> >> >> > >
>> >> >> > > For example, the implementation of an auto-scaler  using the new
>> >> >> > scaleOut() API could look like:
>> >> >> > >
>> >> >> > > 1) check some KPI
>> >> >> > > 2) Infer an action based on this KPI such as scaleUp() or
>> >> scaleDown()
>> >> >> > > 3) wait until the effects of the adjusted instance count is
>> >> reflected
>> >> >> in
>> >> >> > the KPI. Go to  1 and repeat.
>> >> >> > >
>> >> >> > > The health checking capabilities of the existing updater (in
>> >> particular
>> >> >> > together with [1]) would be really helpful here. Still, the
>> simplified
>> >> >> > scaleOut() API would offer the great benefit that the auto-scaler
>> >> would
>> >> >> not
>> >> >> > need to know about the used aurora configuration.
>> >> >> > >
>> >> >> > > We even had an incident with a sub-optimal implementation of step
>> >> 3):
>> >> >> An
>> >> >> > overloaded package backend lead to slow service startups. The
>> service
>> >> >> > startup took longer than the grace-period of our auto-scaler. It
>> >> >> therefore
>> >> >> > decided to add more and more instances, because the KPI wasn't
>> >> improving
>> >> >> as
>> >> >> > expected. It had no way of knowing that these instances were not
>> even
>> >> >> > 'running'. The additionally added instances aggravated the overload
>> >> >> > situation of the package backend.  Of course, the scaler could
>> >> manually
>> >> >> > health check that all instances have come up and are being used as
>> >> >> > expected, but I guess that is what Aurora is for.
>> >> >> > >
>> >> >> > > [1]
>> >> >> >
>> >> >>
>> >>
>> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit?pref=2&pli=1#heading=h.n0kb37aiy8ua
>> >> >> > >
>> >> >> > > Best Regards,
>> >> >> > > Stephan
>> >> >> > > ________________________________________
>> >> >> > > From: Maxim Khutornenko <maxim@apache.org>
>> >> >> > > Sent: Friday, January 15, 2016 7:06 PM
>> >> >> > > To: dev@aurora.apache.org
>> >> >> > > Subject: Re: [PROPOSAL] Job instance scaling APIs
>> >> >> > >
>> >> >> > > I wasn't planning on using the rolling updater functionality
>> given
>> >> the
>> >> >> > > simplicity of the operation. I'd second Steve's earlier concerns
>> >> about
>> >> >> > > scaleOut() looking more like startJobUpdate() if we keep adding
>> >> >> > > features. If health watching, throttling (batch_size) or
>> rollback on
>> >> >> > > failure is required then I believe the startJobUpdate() should be
>> >> used
>> >> >> > > instead of scaleOut().
>> >> >> > >
>> >> >> > > On Fri, Jan 15, 2016 at 1:09 AM, Erb, Stephan
>> >> >> > > <Stephan.Erb@blue-yonder.com> wrote:
>> >> >> > >> I really like the proposal. The gain in simplicity on the
>> >> client-side
>> >> >> > by not having to provide an aurora config is quite significant.
>> >> >> > >>
>> >> >> > >> The implementation on the scheduler side is probably rather
>> >> straight
>> >> >> > forward as the update can be reused. That would also provide us
>> with
>> >> the
>> >> >> > update UI, which has shown to be quite useful when tracing
>> autoscaler
>> >> >> > events.
>> >> >> > >>
>> >> >> > >> Regards,
>> >> >> > >> Stephan
>> >> >> > >> ________________________________________
>> >> >> > >> From: Maxim Khutornenko <maxim@apache.org>
>> >> >> > >> Sent: Thursday, January 14, 2016 9:50 PM
>> >> >> > >> To: dev@aurora.apache.org
>> >> >> > >> Subject: Re: [PROPOSAL] Job instance scaling APIs
>> >> >> > >>
>> >> >> > >> "I'd be concerned that any
>> >> >> > >> scaling API to be powerful enough to fit all (most) use cases
>> would
>> >> >> just
>> >> >> > >> end up looking like the update API."
>> >> >> > >>
>> >> >> > >> There is a big difference between scaleOut and startJobUpdate
>> APIs
>> >> >> > >> that justifies the inclusion of the former. Namely, scaleOut may
>> >> only
>> >> >> > >> replicate the existing instances without changing/introducing
>> any
>> >> new
>> >> >> > >> scheduling requirements or performing instance
>> rollout/rollback. I
>> >> >> > >> don't see scaleOut ever becoming more powerful to threaten
>> >> >> > >> startJobUpdate. At the same time, the absence of aurora config
>> >> >> > >> requirement is a huge boost to autoscaling client
>> simplification.
>> >> >> > >>
>> >> >> > >> "For example, when scaling down we don't just kill the last N
>> >> >> > instances, we
>> >> >> > >> actually look at the least loaded hosts (globally) and kill
>> tasks
>> >> from
>> >> >> > >> those."
>> >> >> > >>
>> >> >> > >> I don't quite see why the same wouldn't be possible with a
>> scaleIn
>> >> >> > >> API. Isn't it always external process responsibility to pay due
>> >> >> > >> diligence before killing instances?
>> >> >> > >>
>> >> >> > >>
>> >> >> > >> On Thu, Jan 14, 2016 at 12:35 PM, Steve Niemitz <
>> >> sniemitz@apache.org>
>> >> >> > wrote:
>> >> >> > >>> As some background, we handle scale up / down purely from the
>> >> client
>> >> >> > side,
>> >> >> > >>> using the update API for both directions.  I'd be concerned
>> that
>> >> any
>> >> >> > >>> scaling API to be powerful enough to fit all (most) use cases
>> >> would
>> >> >> > just
>> >> >> > >>> end up looking like the update API.
>> >> >> > >>>
>> >> >> > >>> For example, when scaling down we don't just kill the last N
>> >> >> > instances, we
>> >> >> > >>> actually look at the least loaded hosts (globally) and kill
>> tasks
>> >> >> from
>> >> >> > >>> those.
>> >> >> > >>>
>> >> >> > >>>
>> >> >> > >>> On Thu, Jan 14, 2016 at 3:28 PM, Maxim Khutornenko <
>> >> maxim@apache.org
>> >> >> >
>> >> >> > wrote:
>> >> >> > >>>
>> >> >> > >>>> "How is scaling down different from killing instances?"
>> >> >> > >>>>
>> >> >> > >>>> I found 'killTasks' syntax too different and way much more
>> >> powerful
>> >> >> to
>> >> >> > >>>> be used for scaling in. The TaskQuery allows killing instances
>> >> >> across
>> >> >> > >>>> jobs/roles, whereas 'scaleIn' is narrowed down to just a
>> single
>> >> job.
>> >> >> > >>>> Additional benefit: it can be ACLed independently by allowing
>> >> >> external
>> >> >> > >>>> process kill tasks only within a given job. We may also add
>> rate
>> >> >> > >>>> limiting or backoff to it later.
>> >> >> > >>>>
>> >> >> > >>>> As for Joshua's question, I feel it should be an operator's
>> >> >> > >>>> responsibility to diff a job with its aurora config before
>> >> applying
>> >> >> an
>> >> >> > >>>> update. That said, if there is enough demand we can definitely
>> >> >> > >>>> consider adding something similar to what George suggested or
>> >> >> > >>>> resurrecting a 'large change' warning message we used to have
>> in
>> >> >> > >>>> client updater.
>> >> >> > >>>>
>> >> >> > >>>> On Thu, Jan 14, 2016 at 12:06 PM, George Sirois <
>> >> >> george@tellapart.com
>> >> >> > >
>> >> >> > >>>> wrote:
>> >> >> > >>>> > As a point of reference, we solved this problem by adding a
>> >> >> binding
>> >> >> > >>>> helper
>> >> >> > >>>> > that queries the scheduler for the current number of
>> instances
>> >> and
>> >> >> > uses
>> >> >> > >>>> > that number instead of a hardcoded config:
>> >> >> > >>>> >
>> >> >> > >>>> >    instances='{{scaling_instances[60]}}'
>> >> >> > >>>> >
>> >> >> > >>>> > In this example, instances will be set to the currently
>> running
>> >> >> > number
>> >> >> > >>>> > (unless there are none, in which case 60 instances will be
>> >> >> created).
>> >> >> > >>>> >
>> >> >> > >>>> > On Thu, Jan 14, 2016 at 2:44 PM, Joshua Cohen <
>> >> jcohen@apache.org>
>> >> >> > wrote:
>> >> >> > >>>> >
>> >> >> > >>>> >> What happens if a job has been scaled out, but the
>> underlying
>> >> >> > config is
>> >> >> > >>>> not
>> >> >> > >>>> >> updated to take that scaling into account? Would the next
>> >> update
>> >> >> > on that
>> >> >> > >>>> >> job revert the number of instances (presumably, because
>> what
>> >> else
>> >> >> > could
>> >> >> > >>>> we
>> >> >> > >>>> >> do)? Is there anything we can do, tooling-wise, to improve
>> >> upon
>> >> >> > this?
>> >> >> > >>>> >>
>> >> >> > >>>> >> On Thu, Jan 14, 2016 at 1:40 PM, Maxim Khutornenko <
>> >> >> > maxim@apache.org>
>> >> >> > >>>> >> wrote:
>> >> >> > >>>> >>
>> >> >> > >>>> >> > Our rolling update APIs can be quite inconvenient to work
>> >> with
>> >> >> > when it
>> >> >> > >>>> >> > comes to instance scaling [1]. It's especially
>> frustrating
>> >> when
>> >> >> > >>>> >> > adding/removing instances has to be done in an automated
>> >> >> fashion
>> >> >> > >>>> (e.g.:
>> >> >> > >>>> >> by
>> >> >> > >>>> >> > an external autoscaling process) as it requires holding
>> on
>> >> to
>> >> >> the
>> >> >> > >>>> >> original
>> >> >> > >>>> >> > aurora config at all times.
>> >> >> > >>>> >> >
>> >> >> > >>>> >> > I propose we add simple instance scaling APIs to address
>> the
>> >> >> > above.
>> >> >> > >>>> Since
>> >> >> > >>>> >> > Aurora job may have instances at different configs at any
>> >> >> > moment, I
>> >> >> > >>>> >> propose
>> >> >> > >>>> >> > we accept an InstanceKey as a reference point when
>> scaling
>> >> out.
>> >> >> > For
>> >> >> > >>>> >> > example:
>> >> >> > >>>> >> >
>> >> >> > >>>> >> >     /** Scales out a given job by adding more instances
>> with
>> >> >> the
>> >> >> > task
>> >> >> > >>>> >> > config of the templateKey. */
>> >> >> > >>>> >> >     Response scaleOut(1: InstanceKey templateKey, 2: i32
>> >> >> > >>>> incrementCount)
>> >> >> > >>>> >> >
>> >> >> > >>>> >> >     /** Scales in a given job by removing existing
>> >> instances.
>> >> >> */
>> >> >> > >>>> >> >     Response scaleIn(1: JobKey job, 2: i32
>> decrementCount)
>> >> >> > >>>> >> >
>> >> >> > >>>> >> > A correspondent client command could then look like:
>> >> >> > >>>> >> >
>> >> >> > >>>> >> >     aurora job scale-out devcluster/vagrant/test/hello/1
>> 10
>> >> >> > >>>> >> >
>> >> >> > >>>> >> > For the above command, a scheduler would take task
>> config of
>> >> >> > instance
>> >> >> > >>>> 1
>> >> >> > >>>> >> of
>> >> >> > >>>> >> > the 'hello' job and replicate it 10 more times thus
>> adding
>> >> 10
>> >> >> > >>>> additional
>> >> >> > >>>> >> > instances to the job.
>> >> >> > >>>> >> >
>> >> >> > >>>> >> > There are, of course, some details to work out like
>> making
>> >> sure
>> >> >> > no
>> >> >> > >>>> active
>> >> >> > >>>> >> > update is in flight, scale out does not violate quota and
>> >> etc.
>> >> >> I
>> >> >> > >>>> intend
>> >> >> > >>>> >> to
>> >> >> > >>>> >> > address those during the implementation as things
>> progress.
>> >> >> > >>>> >> >
>> >> >> > >>>> >> > Does the above make sense? Any concerns/suggestions?
>> >> >> > >>>> >> >
>> >> >> > >>>> >> > Thanks,
>> >> >> > >>>> >> > Maxim
>> >> >> > >>>> >> >
>> >> >> > >>>> >> > [1] - https://issues.apache.org/jira/browse/AURORA-1258
>> >> >> > >>>> >> >
>> >> >> > >>>> >>
>> >> >> > >>>>
>> >> >> >
>> >> >>
>> >>
>>