aurora-dev mailing list archives

From David McLaughlin <dmclaugh...@apache.org>
Subject Re: Dynamic Reservations
Date Thu, 09 Mar 2017 01:30:55 GMT
You don't have to store anything with my proposal. Preemption doesn't store
anything either. The whole point is that it's just best-effort: if the
Scheduler restarts, the worst that would happen is that part of the current
batch would have to go through the current Scheduling loop that users already
tolerate and deal with today.
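
A minimal sketch of the best-effort idea above, in Python for brevity. The
class name, the `ttl_secs` hold time, and the task-key format are all
hypothetical and are not taken from the Aurora code base. The hold lives only
in memory, so a restart simply forgets it and the affected tasks fall back to
the normal scheduling loop.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class BestEffortReservations:
    """In-memory map of host -> (task key, expiry). Nothing is persisted."""

    ttl_secs: float = 60.0  # hypothetical hold time, not an Aurora setting
    clock: Callable[[], float] = time.monotonic
    _held: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def reserve(self, host: str, task_key: str) -> None:
        # Hold the host for one task until the TTL runs out.
        self._held[host] = (task_key, self.clock() + self.ttl_secs)

    def holder(self, host: str) -> Optional[str]:
        # Return the task currently holding the host, dropping expired holds.
        entry = self._held.get(host)
        if entry is None:
            return None
        task_key, expires_at = entry
        if self.clock() >= expires_at:
            del self._held[host]
            return None
        return task_key

    def may_use(self, host: str, task_key: str) -> bool:
        # Any task may use an unheld host; a held host only serves its holder.
        owner = self.holder(host)
        return owner is None or owner == task_key
```

A fresh instance after a restart holds nothing, so every pending task is
treated the same way it is today.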



On Wed, Mar 8, 2017 at 5:08 PM, Zameer Manji <zmanji@apache.org> wrote:

> David,
>
> I have two concerns with that idea. First, it would require persisting the
> relationship of <Hostname, Resources> to <Task> for every task. I'm not
> sure if adding more storage and storage operations is the ideal way of
> solving this problem. Second, in a multi-framework environment, a framework
> needs to use dynamic reservations; otherwise the resources might be taken by
> another framework.
>
> On Wed, Mar 8, 2017 at 5:01 PM, David McLaughlin <dmclaughlin@apache.org>
> wrote:
>
> > So I read the docs again and I have one major question - do we even need
> > dynamic reservations for the current proposal?
> >
> > The current goal of the proposed work is to keep an offer on a host and
> > prevent some other pending task from taking it before the next scheduling
> > round. This exact problem is solved in preemption and we could use a
> > similar technique for reserving offers after killing tasks when going
> > through the update loop. We wouldn't need to add tiers or reconciliation or
> > solve any of these other concerns. Reusing an offer skips so much of the
> > expensive stuff in the Scheduler that it would be a no-brainer for the
> > operator to turn it on for every single task in the cluster.
> >
> >
> > On Thu, Mar 2, 2017 at 7:52 AM, Steve Niemitz <sniemitz@apache.org>
> > wrote:
> >
> > > I read over the docs; it looks like a good start.  Personally I don't see
> > > much of a benefit for dynamically reserved cpu/mem, but I'm excited about
> > > the possibility of building off this for dynamically reserved persistent
> > > volumes.
> > >
> > > I would like to see more detail on how a reservation "times out", and the
> > > configuration options per job around that, as I feel like it's the most
> > > complicated part of all of this.  Ideally there would also be hooks into
> > > the host maintenance APIs here.
> > >
> > > I also didn't see any mention of it, but I believe Mesos requires the
> > > framework to reserve resources with a role.  By default Aurora runs as the
> > > special "*" role; does this mean Aurora will need to have a role specified
> > > now for this to work?  Or does Mesos allow reserving resources without a
> > > role?
> > >
> > > On Thu, Mar 2, 2017 at 8:35 AM, Erb, Stephan <Stephan.Erb@blue-yonder.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > There have been two documents on Dynamic Reservations as a first step
> > > > towards persistent services:
> > > >
> > > > ·         RFC:
> > > > https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A/edit#heading=h.hcsc8tda08vy
> > > >
> > > > ·         Technical Design Doc:
> > > > https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE/edit#heading=h.klg3urfbnq3v
> > > >
> > > > For a couple of days now, there have also been two patches online for an
> > > > MVP by Dmitriy:
> > > >
> > > > ·         https://reviews.apache.org/r/56690/
> > > >
> > > > ·         https://reviews.apache.org/r/56691/
> > > >
> > > > From reading the documents, I am under the impression that there is a
> > > > rough consensus on the following points:
> > > >
> > > > ·         We want dynamic reservations. Our general goal is to enable the
> > > > re-scheduling of tasks on the same host they used in a previous run.
> > > >
> > > > ·         Dynamic reservations are a best-effort feature. If in doubt, a
> > > > task will be scheduled somewhere else.
> > > >
> > > > ·         Jobs opt into reserved resources using an appropriate tier
> > > > config.
> > > >
> > > > ·         The tier config is supposed to be neither preemptible nor
> > > > revocable. Reserving resources therefore requires appropriate quota.
> > > >
> > > > ·         Aurora will tag reserved Mesos resources by adding the unique
> > > > instance key of the reserving task instance as a label. Only this task
> > > > instance will be allowed to use those tagged resources.
> > > >
> > > > I am unclear on the following general questions, as the documents contain
> > > > contradictory content:
> > > >
> > > > a)       How does the user interact with reservations?  There are several
> > > > proposals in the documents to auto-reserve on `aurora job create` or
> > > > `aurora cron schedule` and to automatically un-reserve on the appropriate
> > > > reverse actions. But will we also allow a user further control over the
> > > > reservations so that they can manage those independent of the task/job
> > > > lifecycle? For example, how does Borg handle this?
> > > >
> > > > b)       The implementation proposal and patches include an
> > > > OfferReconciler, so this implies we don’t want to offer any control for the
> > > > user. The only control mechanism will be the cluster-wide offer wait time
> > > > limiting the number of seconds unused reserved resources can linger before
> > > > they are un-reserved.
> > > >
> > > > c)       Will we allow ad hoc/cron jobs to reserve resources? Does it even
> > > > matter if we don’t give control to users and just rely on the
> > > > OfferReconciler?
> > > >
> > > >
> > > > I have a couple of questions on the MVP and some implementation details. I
> > > > will follow up with those in a separate mail.
> > > >
> > > > Thanks and best regards,
> > > > Stephan
> > > >
> > >
> >
> > --
> > Zameer Manji
> >
>
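
For readers following the quoted thread, here is a rough Python sketch of two
of the consensus points from Stephan's mail: reserved resources are tagged
with the instance key of the reserving task, and unused reservations are
released after a cluster-wide offer wait time. The label name, the
`ReservedOffer` type, and both function names are hypothetical illustrations;
they are not the Mesos or Aurora API and not the OfferReconciler from the
patches.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical label name, chosen only for this illustration.
INSTANCE_KEY_LABEL = "aurora_instance_key"


@dataclass
class ReservedOffer:
    host: str
    labels: Dict[str, str]  # labels attached to the reserved resources
    idle_secs: float        # how long the offer has gone unused


def usable_by(offer: ReservedOffer, instance_key: str) -> bool:
    # Untagged offers are open to anyone; tagged offers only match their owner.
    tag = offer.labels.get(INSTANCE_KEY_LABEL)
    return tag is None or tag == instance_key


def to_unreserve(offers: List[ReservedOffer],
                 offer_wait_secs: float) -> List[ReservedOffer]:
    # Tagged offers that lingered past the wait time should be un-reserved.
    return [
        o for o in offers
        if INSTANCE_KEY_LABEL in o.labels and o.idle_secs > offer_wait_secs
    ]
```

The wait time here is a single cluster-wide value, matching point b) of the
quoted mail; per-job timeouts, as Steve asks about, would need an extra
parameter.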
