Mailing-List: contact dev-help@aurora.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@aurora.apache.org
MIME-Version: 1.0
In-Reply-To: <CAOOJoEyFSjqdELvZzCCip5TYsccOaiMauZq==8WxLuxiFGmJTA@mail.gmail.com>
References: <F63EA30D-4759-49A2-865D-40D8600E15B8@blue-yonder.com>
 <CAJ4qn4kVeMpQTKCOnWe0QisZNfcjH+atdo6FRuDN074vjUue8g@mail.gmail.com>
 <CAOOJoEzm1uqohivpz04-KXBUi7ts5ijZLONC+u49jatwdVS_OA@mail.gmail.com>
 <CAM+cpfc16cedkEzHyfz=WNES37R4++C5He+zS8c7ERK229_O=A@mail.gmail.com>
 <CAOOJoEzAgJbm980n5L_fJnBHy-5sP22HbZ3Q9RvLDKv7sqBWpw@mail.gmail.com>
 <CAOOJoEznZuRahhk=1=8iSAD1zfTYEdKEqtbj=Ycdy+boN03BEA@mail.gmail.com> <CAOOJoEyFSjqdELvZzCCip5TYsccOaiMauZq==8WxLuxiFGmJTA@mail.gmail.com>
From: Joshua Cohen <jcohen@apache.org>
Date: Mon, 13 Mar 2017 16:33:22 -0500
Message-ID: <CAMnduq-4KC2ARH1XXG+zD8rbuHbxznYjojiVHjPVNh_cm5AiyA@mail.gmail.com>
Subject: Re: Dynamic Reservations
To: dev@aurora.apache.org
Content-Type: multipart/alternative; boundary=001a1147a3f2feb184054aa3759f
archived-at: Mon, 13 Mar 2017 21:33:26 -0000

--001a1147a3f2feb184054aa3759f
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Dmitriy,

There's a fair number of comments both here and on the doc. Will you have
time to respond to these so we can find a path forward?

Cheers,

Joshua

On Wed, Mar 8, 2017 at 8:44 PM, David McLaughlin <dmclaughlin@apache.org>
wrote:

> Ticket for replace task primitive already exists:
> https://issues.apache.org/jira/browse/MESOS-1280
>
> On Wed, Mar 8, 2017 at 6:34 PM, David McLaughlin <dmclaughlin@apache.org>
> wrote:
>
> > Spoke with Zameer offline and he asked me to post additional thoughts
> > here.
> >
> > My motivation for solving this without dynamic reservations is just the
> > sheer number of questions I have after reading the RFC and current design
> > doc. And most of them are not about the current proposal and goals or the
> > MVP but more about how this feature will scale into persistent storage.
> >
> > I think best-effort dynamic reservations are such a different problem than
> > the reservations that would be needed to support persistent storage. My
> > primary concern is around things like quota. For the current proposal and
> > the small best-effort feature we're adding, it makes no sense to get into
> > the complexities of separate quota for reserved resources vs preferred
> > resources, but the reality of exposing such a concept to a large
> > organisation where we can't automatically reclaim anything reserved means
> > we'd almost definitely want that. The issue with the iterative approach is
> > decisions we take here could have a huge impact on those tasks later, once
> > we expose the reserved tier into the open. That means more upfront design
> > and planning, which so far has blocked a super useful feature that I feel
> > all of us want.
> >
> > My gut feeling is we went about this all wrong. We started with dynamic
> > reservations and thought about how we could speed up task scheduling with
> > them. If we took the current problem brief and started from first
> > principles then I think we'd naturally look for something like a
> > replaceTask(offerId, taskInfo) type API from Mesos.
> >
> > I'll bring this up within our team and see if we can put resources on
> > adding such an API. Any feedback on this approach in the meantime is
> > welcome.
> >
> > On Wed, Mar 8, 2017 at 5:30 PM, David McLaughlin <dmclaughlin@apache.org>
> > wrote:
> >
> >> You don't have to store anything with my proposal. Preemption doesn't
> >> store anything either. The whole point is that it's just best-effort, and
> >> if the Scheduler restarts, the worst that would happen is that part of the
> >> current batch would have to go through the current Scheduling loop that
> >> users tolerate and deal with today.
> >>
> >>
> >>
> >> On Wed, Mar 8, 2017 at 5:08 PM, Zameer Manji <zmanji@apache.org> wrote:
> >>
> >>> David,
> >>>
> >>> I have two concerns with that idea. First, it would require persisting
> >>> the relationship of <Hostname, Resources> to <Task> for every task. I'm not
> >>> sure if adding more storage and storage operations is the ideal way of
> >>> solving this problem. Second, in a multi-framework environment, a framework
> >>> needs to use dynamic reservations; otherwise the resources might be taken
> >>> by another framework.
> >>>
> >>> On Wed, Mar 8, 2017 at 5:01 PM, David McLaughlin <dmclaughlin@apache.org>
> >>> wrote:
> >>>
> >>> > So I read the docs again and I have one major question - do we even need
> >>> > dynamic reservations for the current proposal?
> >>> >
> >>> > The current goal of the proposed work is to keep an offer on a host and
> >>> > prevent some other pending task from taking it before the next scheduling
> >>> > round. This exact problem is solved in preemption and we could use a
> >>> > similar technique for reserving offers after killing tasks when going
> >>> > through the update loop. We wouldn't need to add tiers or reconciliation or
> >>> > solve any of these other concerns. Reusing an offer skips so much of the
> >>> > expensive stuff in the Scheduler that it would be a no-brainer for the
> >>> > operator to turn it on for every single task in the cluster.
> >>> >
> >>> >
> >>> > On Thu, Mar 2, 2017 at 7:52 AM, Steve Niemitz <sniemitz@apache.org> wrote:
> >>> >
> >>> > > I read over the docs; it looks like a good start. Personally I don't see
> >>> > > much of a benefit for dynamically reserved cpu/mem, but I'm excited about
> >>> > > the possibility of building off this for dynamically reserved persistent
> >>> > > volumes.
> >>> > >
> >>> > > I would like to see more detail on how a reservation "times out", and the
> >>> > > configuration options per job around that, as I feel like it's the most
> >>> > > complicated part of all of this. Ideally there would also be hooks into
> >>> > > the host maintenance APIs here.
> >>> > >
> >>> > > I also didn't see any mention of it, but I believe Mesos requires the
> >>> > > framework to reserve resources with a role. By default Aurora runs as the
> >>> > > special "*" role; does this mean Aurora will need to have a role specified
> >>> > > now for this to work? Or does Mesos allow reserving resources without a
> >>> > > role?
> >>> > >
> >>> > > On Thu, Mar 2, 2017 at 8:35 AM, Erb, Stephan <Stephan.Erb@blue-yonder.com>
> >>> > > wrote:
> >>> > >
> >>> > > > Hi everyone,
> >>> > > >
> >>> > > > There have been two documents on Dynamic Reservations as a first step
> >>> > > > towards persistent services:
> >>> > > >
> >>> > > > · RFC: https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A/edit#heading=h.hcsc8tda08vy
> >>> > > >
> >>> > > > · Technical Design Doc: https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE/edit#heading=h.klg3urfbnq3v
> >>> > > >
> >>> > > > For the past couple of days there have also been two patches online for
> >>> > > > an MVP by Dmitriy:
> >>> > > >
> >>> > > > · https://reviews.apache.org/r/56690/
> >>> > > >
> >>> > > > · https://reviews.apache.org/r/56691/
> >>> > > >
> >>> > > > From reading the documents, I am under the impression that there is a
> >>> > > > rough consensus on the following points:
> >>> > > >
> >>> > > > · We want dynamic reservations. Our general goal is to enable the
> >>> > > > re-scheduling of tasks on the same host they used in a previous run.
> >>> > > >
> >>> > > > · Dynamic reservations are a best-effort feature. If in doubt, a
> >>> > > > task will be scheduled somewhere else.
> >>> > > >
> >>> > > > · Jobs opt into reserved resources using an appropriate tier
> >>> > > > config.
> >>> > > >
> >>> > > > · The tier config is supposed to be neither preemptible nor
> >>> > > > revocable. Reserving resources therefore requires appropriate quota.
> >>> > > >
> >>> > > > · Aurora will tag reserved Mesos resources by adding the unique
> >>> > > > instance key of the reserving task instance as a label. Only this task
> >>> > > > instance will be allowed to use those tagged resources.
> >>> > > >
> >>> > > > I am unclear on the following general questions as there is contradictory
> >>> > > > content:
> >>> > > >
> >>> > > > a) How does the user interact with reservations? There are several
> >>> > > > proposals in the documents to auto-reserve on `aurora job create` or
> >>> > > > `aurora cron schedule` and to automatically un-reserve on the appropriate
> >>> > > > reverse actions. But will we also allow a user further control over the
> >>> > > > reservations so that they can manage those independent of the task/job
> >>> > > > lifecycle? For example, how does Borg handle this?
> >>> > > >
> >>> > > > b) The implementation proposal and patches include an
> >>> > > > OfferReconciler, so this implies we don’t want to offer any control for
> >>> > > > the user. The only control mechanism will be the cluster-wide offer wait
> >>> > > > time limiting the number of seconds unused reserved resources can linger
> >>> > > > before they are un-reserved.
> >>> > > >
> >>> > > > c) Will we allow ad hoc/cron jobs to reserve resources? Does it even
> >>> > > > matter if we don’t give control to users and just rely on the
> >>> > > > OfferReconciler?
> >>> > > >
> >>> > > >
> >>> > > > I have a couple of questions on the MVP and some implementation details.
> >>> > > > I will follow up with those in a separate mail.
> >>> > > >
> >>> > > > Thanks and best regards,
> >>> > > > Stephan
> >>> > > >
> >>> > >
> >>> >
> >>> > --
> >>> > Zameer Manji
> >>> >
> >>>
> >>
> >>
> >
>

--001a1147a3f2feb184054aa3759f--