aurora-dev mailing list archives

From David McLaughlin <dmclaugh...@apache.org>
Subject Re: Dynamic Reservations
Date Thu, 09 Mar 2017 01:01:17 GMT
So I read the docs again and I have one major question - do we even need
dynamic reservations for the current proposal?

The current goal of the proposed work is to keep an offer on a host and
prevent some other pending task from taking it before the next scheduling
round. This exact problem is solved in preemption and we could use a
similar technique for reserving offers after killing tasks when going
through the update loop. We wouldn't need to add tiers or reconciliation or
solve any of these other concerns. Reusing an offer skips so much of the
expensive stuff in the Scheduler that it would be a no-brainer for the
operator to turn it on for every single task in the cluster.
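
A minimal sketch of that idea, assuming a hypothetical in-memory table of short-lived holds keyed by host. The class name, method names, instance-key format, and hold duration are all illustrative and are not Aurora's actual API. When the updater kills a task it records a hold on that host for the instance being updated, and other pending tasks skip the host until the hold expires or is released.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Hold:
    instance_key: str   # hypothetical format, e.g. "role/env/job/0"
    expires_at: float   # clock value after which the hold is ignored


class HostHolds:
    """Hypothetical in-memory holds; nothing is reserved in Mesos itself."""

    def __init__(self, hold_secs: float,
                 clock: Callable[[], float] = time.monotonic):
        self._hold_secs = hold_secs
        self._clock = clock
        self._holds: Dict[str, Hold] = {}

    def hold(self, host: str, instance_key: str) -> None:
        # Called right after the updater kills the old task on `host`.
        expires_at = self._clock() + self._hold_secs
        self._holds[host] = Hold(instance_key, expires_at)

    def holder(self, host: str) -> Optional[str]:
        # Returns the instance holding `host`, dropping expired holds lazily.
        current = self._holds.get(host)
        if current is None:
            return None
        if self._clock() >= current.expires_at:
            del self._holds[host]
            return None
        return current.instance_key

    def may_use(self, host: str, instance_key: str) -> bool:
        # Other pending tasks are turned away while a hold is active.
        holder = self.holder(host)
        return holder is None or holder == instance_key

    def release(self, host: str, instance_key: str) -> None:
        # Called once the held instance has been launched on the host.
        if self.holder(host) == instance_key:
            del self._holds[host]
```

Because the hold lives only in scheduler memory and simply expires, this sketch needs no tier, role, or reconciliation step. The trade-off is that a hold is lost on scheduler failover, which fits a best-effort feature.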


On Thu, Mar 2, 2017 at 7:52 AM, Steve Niemitz <sniemitz@apache.org> wrote:

> I read over the docs, and it looks like a good start.  Personally I don't see
> much of a benefit for dynamically reserved cpu/mem, but I'm excited about
> the possibility of building off this for dynamically reserved persistent
> volumes.
>
> I would like to see more detail on how a reservation "times out", and the
> configuration options per job around that, as I feel like it's the most
> complicated part of all of this.  Ideally there would also be hooks into
> the host maintenance APIs here.
>
> I also didn't see any mention of it, but I believe Mesos requires the
> framework to reserve resources with a role.  By default Aurora runs as the
> special "*" role.  Does this mean Aurora will need to have a role specified
> now for this to work?  Or does Mesos allow reserving resources without a
> role?
>
> On Thu, Mar 2, 2017 at 8:35 AM, Erb, Stephan <Stephan.Erb@blue-yonder.com>
> wrote:
>
> > Hi everyone,
> >
> > There have been two documents on Dynamic Reservations as a first step
> > towards persistent services:
> >
> > ·         RFC: https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A/edit#heading=h.hcsc8tda08vy
> >
> > ·         Technical Design Doc: https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE/edit#heading=h.klg3urfbnq3v
> >
> > For the past couple of days there have also been two patches online for
> > an MVP by Dmitriy:
> >
> > ·         https://reviews.apache.org/r/56690/
> >
> > ·         https://reviews.apache.org/r/56691/
> >
> > From reading the documents, I am under the impression that there is a
> > rough consensus on the following points:
> >
> > ·         We want dynamic reservations. Our general goal is to enable the
> > re-scheduling of tasks on the same host they used in a previous run.
> >
> > ·         Dynamic reservations are a best-effort feature. If in doubt, a
> > task will be scheduled somewhere else.
> >
> > ·         Jobs opt into reserved resources using an appropriate tier
> > config.
> >
> > ·         The tier config is supposed to be neither preemptible nor
> > revocable. Reserving resources therefore requires appropriate quota.
> >
> > ·         Aurora will tag reserved Mesos resources by adding the unique
> > instance key of the reserving task instance as a label. Only this task
> > instance will be allowed to use those tagged resources.
> >
> > I am unclear on the following general questions, as the documents
> > contain contradictory content:
> >
> > a)       How does the user interact with reservations?  There are several
> > proposals in the documents to auto-reserve on `aurora job create` or
> > `aurora cron schedule` and to automatically un-reserve on the appropriate
> > reverse actions. But will we also allow a user further control over the
> > reservations so that they can manage those independent of the task/job
> > lifecycle? For example, how does Borg handle this?
> >
> > b)       The implementation proposal and patches include an
> > OfferReconciler, so this implies we don’t want to offer any control for
> > the user. The only control mechanism will be the cluster-wide offer wait
> > time limiting the number of seconds unused reserved resources can linger
> > before they are un-reserved.
> >
> > c)       Will we allow adhoc/cron jobs to reserve resources? Does it even
> > matter if we don’t give control to users and just rely on the
> > OfferReconciler?
> >
> >
> > I have a couple of questions on the MVP and some implementation details.
> > I will follow up with those in a separate mail.
> >
> > Thanks and best regards,
> > Stephan
> >
>
