aurora-dev mailing list archives

From David McLaughlin <dmclaugh...@apache.org>
Subject Re: Dynamic Reservations
Date Thu, 09 Mar 2017 02:44:40 GMT
A ticket for a replace-task primitive already exists:
https://issues.apache.org/jira/browse/MESOS-1280

On Wed, Mar 8, 2017 at 6:34 PM, David McLaughlin <dmclaughlin@apache.org>
wrote:

> Spoke with Zameer offline and he asked me to post additional thoughts
> here.
>
> My motivation for solving this without dynamic reservations is just the
> sheer number of questions I have after reading the RFC and current design
> doc. And most of them are not about the current proposal and goals or the
> MVP but more about how this feature will scale into persistent storage.
>
> I think best-effort dynamic reservations are a very different problem from
> the reservations that would be needed to support persistent storage. My
> primary concern is around things like quota. For the current proposal and
> the small best-effort feature we're adding, it makes no sense to get into
> the complexities of separate quota for reserved resources vs preferred
> resources, but the reality of exposing such a concept to a large
> organisation where we can't automatically reclaim anything reserved means
> we'd almost definitely want that. The issue with the iterative approach is
> decisions we take here could have a huge impact on those tasks later, once
> we expose the reserved tier into the open. That means more upfront design
> and planning, which so far has blocked a super useful feature that I feel
> all of us want.
>
> My gut feeling is we went about this all wrong. We started with dynamic
> reservations and thought about how we could speed up task scheduling with
> them. If we took the current problem brief and started from first
> principles then I think we'd naturally look for something like a
> replaceTask(offerId, taskInfo) type API from Mesos.
>
> I'll bring this up within our team and see if we can put resources on
> adding such an API. Any feedback on this approach in the meantime is
> welcome.
>
> On Wed, Mar 8, 2017 at 5:30 PM, David McLaughlin <dmclaughlin@apache.org>
> wrote:
>
>> You don't have to store anything with my proposal. Preemption doesn't
>> store anything either. The whole point is that it's just best-effort, and if the
>> Scheduler restarts the worst that would happen is part of the current batch
>> would have to go through the current Scheduling loop that users tolerate
>> and deal with today.
>>
>>
>>
>> On Wed, Mar 8, 2017 at 5:08 PM, Zameer Manji <zmanji@apache.org> wrote:
>>
>>> David,
>>>
>>> I have two concerns with that idea. First, it would require persisting the
>>> relationship of <Hostname, Resources> to <Task> for every task. I'm not
>>> sure if adding more storage and storage operations is the ideal way of
>>> solving this problem. Second, in a multi-framework environment, a framework
>>> needs to use dynamic reservations; otherwise the resources might be taken by
>>> another framework.
>>>
>>> On Wed, Mar 8, 2017 at 5:01 PM, David McLaughlin <dmclaughlin@apache.org>
>>> wrote:
>>>
>>> > So I read the docs again and I have one major question - do we even need
>>> > dynamic reservations for the current proposal?
>>> >
>>> > The current goal of the proposed work is to keep an offer on a host and
>>> > prevent some other pending task from taking it before the next scheduling
>>> > round. This exact problem is solved in preemption and we could use a
>>> > similar technique for reserving offers after killing tasks when going
>>> > through the update loop. We wouldn't need to add tiers or reconciliation or
>>> > solve any of these other concerns. Reusing an offer skips so much of the
>>> > expensive stuff in the Scheduler that it would be a no-brainer for the
>>> > operator to turn it on for every single task in the cluster.
>>> >
>>> >
>>> > On Thu, Mar 2, 2017 at 7:52 AM, Steve Niemitz <sniemitz@apache.org>
>>> > wrote:
>>> >
>>> > > I read over the docs, and it looks like a good start. Personally I don't see
>>> > > much of a benefit for dynamically reserved cpu/mem, but I'm excited about
>>> > > the possibility of building off this for dynamically reserved persistent
>>> > > volumes.
>>> > >
>>> > > I would like to see more detail on how a reservation "times out", and the
>>> > > configuration options per job around that, as I feel like it's the most
>>> > > complicated part of all of this. Ideally there would also be hooks into
>>> > > the host maintenance APIs here.
>>> > >
>>> > > I also didn't see any mention of it, but I believe Mesos requires the
>>> > > framework to reserve resources with a role. By default Aurora runs as the
>>> > > special "*" role; does this mean Aurora will need to have a role specified
>>> > > now for this to work? Or does Mesos allow reserving resources without a
>>> > > role?
>>> > >
>>> > > On Thu, Mar 2, 2017 at 8:35 AM, Erb, Stephan <Stephan.Erb@blue-yonder.com>
>>> > > wrote:
>>> > >
>>> > > > Hi everyone,
>>> > > >
>>> > > > There have been two documents on Dynamic Reservations as a first step
>>> > > > towards persistent services:
>>> > > >
>>> > > > ·         RFC: https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A/edit#heading=h.hcsc8tda08vy
>>> > > >
>>> > > > ·         Technical Design Doc: https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE/edit#heading=h.klg3urfbnq3v
>>> > > >
>>> > > > For a couple of days now, there have also been two patches online for an
>>> > > > MVP by Dmitriy:
>>> > > >
>>> > > > ·         https://reviews.apache.org/r/56690/
>>> > > >
>>> > > > ·         https://reviews.apache.org/r/56691/
>>> > > >
>>> > > > From reading the documents, I am under the impression that there is a
>>> > > > rough consensus on the following points:
>>> > > >
>>> > > > ·         We want dynamic reservations. Our general goal is to enable the
>>> > > > re-scheduling of tasks on the same host they used in a previous run.
>>> > > >
>>> > > > ·         Dynamic reservations are a best-effort feature. If in doubt, a
>>> > > > task will be scheduled somewhere else.
>>> > > >
>>> > > > ·         Jobs opt into reserved resources using an appropriate tier
>>> > > > config.
>>> > > >
>>> > > > ·         The tier config is supposed to be neither preemptible nor
>>> > > > revocable. Reserving resources therefore requires appropriate quota.
>>> > > >
>>> > > > ·         Aurora will tag reserved Mesos resources by adding the unique
>>> > > > instance key of the reserving task instance as a label. Only this task
>>> > > > instance will be allowed to use those tagged resources.
>>> > > >
>>> > > > I am unclear on the following general questions as there is contradictory
>>> > > > content:
>>> > > >
>>> > > > a)       How does the user interact with reservations? There are several
>>> > > > proposals in the documents to auto-reserve on `aurora job create` or
>>> > > > `aurora cron schedule` and to automatically un-reserve on the appropriate
>>> > > > reverse actions. But will we also allow a user further control over the
>>> > > > reservations so that they can manage those independently of the task/job
>>> > > > lifecycle? For example, how does Borg handle this?
>>> > > >
>>> > > > b)       The implementation proposal and patches include an
>>> > > > OfferReconciler, so this implies we don’t want to offer any control for the
>>> > > > user. The only control mechanism will be the cluster-wide offer wait time
>>> > > > limiting the number of seconds unused reserved resources can linger before
>>> > > > they are un-reserved.
>>> > > >
>>> > > > c)       Will we allow ad hoc/cron jobs to reserve resources? Does it even
>>> > > > matter if we don’t give control to users and just rely on the
>>> > > > OfferReconciler?
>>> > > >
>>> > > >
>>> > > > I have a couple of questions on the MVP and some implementation details. I
>>> > > > will follow up with those in a separate mail.
>>> > > >
>>> > > > Thanks and best regards,
>>> > > > Stephan
>>> > > >
>>> > >
>>> >
>>> > --
>>> > Zameer Manji
>>> >
>>>
>>
>>
>
