From: Joshua Cohen
Date: Mon, 13 Mar 2017 16:33:22 -0500
Subject: Re: Dynamic Reservations
To: dev@aurora.apache.org

Dmitriy,

There's a fair number of comments both here and on the doc. Will you have
time to respond to these so we can find a path forward?

Cheers,

Joshua

On Wed, Mar 8, 2017 at 8:44 PM, David McLaughlin wrote:

> Ticket for replace task primitive already exists:
> https://issues.apache.org/jira/browse/MESOS-1280
>
> On Wed, Mar 8, 2017 at 6:34 PM, David McLaughlin wrote:
>
> > Spoke with Zameer offline and he asked me to post additional thoughts
> > here.
> >
> > My motivation for solving this without dynamic reservations is just the
> > sheer number of questions I have after reading the RFC and current design
> > doc. And most of them are not about the current proposal and goals or the
> > MVP but more about how this feature will scale into persistent storage.
> >
> > I think best-effort dynamic reservations are such a different problem than
> > the reservations that would be needed to support persistent storage. My
> > primary concern is around things like quota. For the current proposal and
> > the small best-effort feature we're adding, it makes no sense to get into
> > the complexities of separate quota for reserved resources vs preferred
> > resources, but the reality of exposing such a concept to a large
> > organisation where we can't automatically reclaim anything reserved means
> > we'd almost definitely want that. The issue with the iterative approach is
> > decisions we take here could have a huge impact on those tasks later, once
> > we expose the reserved tier into the open. That means more upfront design
> > and planning, which so far has blocked a super useful feature that I feel
> > all of us want.
> >
> > My gut feeling is we went about this all wrong.
> > We started with dynamic reservations and thought about how we could
> > speed up task scheduling with them. If we took the current problem brief
> > and started from first principles then I think we'd naturally look for
> > something like a replaceTask(offerId, taskInfo) type API from Mesos.
> >
> > I'll bring this up within our team and see if we can put resources on
> > adding such an API. Any feedback on this approach in the meantime is
> > welcome.
> >
> > On Wed, Mar 8, 2017 at 5:30 PM, David McLaughlin wrote:
> >
> >> You don't have to store anything with my proposal. Preemption doesn't
> >> store anything either. The whole thing is it's just best-effort, and if
> >> the Scheduler restarts the worst that would happen is part of the current
> >> batch would have to go through the current Scheduling loop that users
> >> tolerate and deal with today.
> >>
> >> On Wed, Mar 8, 2017 at 5:08 PM, Zameer Manji wrote:
> >>
> >>> David,
> >>>
> >>> I have two concerns with that idea. First, it would require persisting
> >>> the relationship of to for every task. I'm not
> >>> sure if adding more storage and storage operations is the ideal way of
> >>> solving this problem. Second, in a multi framework environment, a
> >>> framework needs to use dynamic reservations otherwise the resources
> >>> might be taken by another framework.
> >>>
> >>> On Wed, Mar 8, 2017 at 5:01 PM, David McLaughlin <dmclaughlin@apache.org>
> >>> wrote:
> >>>
> >>> > So I read the docs again and I have one major question - do we even
> >>> > need dynamic reservations for the current proposal?
> >>> >
> >>> > The current goal of the proposed work is to keep an offer on a host
> >>> > and prevent some other pending task from taking it before the next
> >>> > scheduling round.
> >>> > This exact problem is solved in preemption and we could use a
> >>> > similar technique for reserving offers after killing tasks when going
> >>> > through the update loop. We wouldn't need to add tiers or
> >>> > reconciliation or solve any of these other concerns. Reusing an offer
> >>> > skips so much of the expensive stuff in the Scheduler that it would be
> >>> > a no-brainer for the operator to turn it on for every single task in
> >>> > the cluster.
> >>> >
> >>> > On Thu, Mar 2, 2017 at 7:52 AM, Steve Niemitz wrote:
> >>> >
> >>> > > I read over the docs, it looks like a good start. Personally I
> >>> > > don't see much of a benefit for dynamically reserved cpu/mem, but
> >>> > > I'm excited about the possibility of building off this for
> >>> > > dynamically reserved persistent volumes.
> >>> > >
> >>> > > I would like to see more detail on how a reservation "times out",
> >>> > > and the configuration options per job around that, as I feel like
> >>> > > it's the most complicated part of all of this. Ideally there would
> >>> > > also be hooks into the host maintenance APIs here.
> >>> > >
> >>> > > I also didn't see any mention of it, but I believe Mesos requires
> >>> > > the framework to reserve resources with a role. By default Aurora
> >>> > > runs as the special "*" role, does this mean Aurora will need to
> >>> > > have a role specified now for this to work? Or does Mesos allow
> >>> > > reserving resources without a role?
> >>> > >
> >>> > > On Thu, Mar 2, 2017 at 8:35 AM, Erb, Stephan <Stephan.Erb@blue-yonder.com>
> >>> > > wrote:
> >>> > >
> >>> > > > Hi everyone,
> >>> > > >
> >>> > > > There have been two documents on Dynamic Reservations as a first
> >>> > > > step towards persistent services:
> >>> > > >
> >>> > > > · RFC: https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A/edit#heading=h.hcsc8tda08vy
> >>> > > >
> >>> > > > · Technical Design Doc: https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE/edit#heading=h.klg3urfbnq3v
> >>> > > >
> >>> > > > For a couple of days now, there have also been two patches online
> >>> > > > for an MVP by Dmitriy:
> >>> > > >
> >>> > > > · https://reviews.apache.org/r/56690/
> >>> > > >
> >>> > > > · https://reviews.apache.org/r/56691/
> >>> > > >
> >>> > > > From reading the documents, I am under the impression that there
> >>> > > > is a rough consensus on the following points:
> >>> > > >
> >>> > > > · We want dynamic reservations. Our general goal is to enable the
> >>> > > > re-scheduling of tasks on the same host they used in a previous
> >>> > > > run.
> >>> > > >
> >>> > > > · Dynamic reservations are a best-effort feature. If in doubt, a
> >>> > > > task will be scheduled somewhere else.
> >>> > > >
> >>> > > > · Jobs opt into reserved resources using an appropriate tier
> >>> > > > config.
> >>> > > >
> >>> > > > · The tier config is supposed to be neither preemptible nor
> >>> > > > revocable. Reserving resources therefore requires appropriate
> >>> > > > quota.
> >>> > > >
> >>> > > > · Aurora will tag reserved Mesos resources by adding the unique
> >>> > > > instance key of the reserving task instance as a label.
> >>> > > > Only this task instance will be allowed to use those tagged
> >>> > > > resources.
> >>> > > >
> >>> > > > I am unclear on the following general questions as there is
> >>> > > > contradicting content:
> >>> > > >
> >>> > > > a) How does the user interact with reservations? There are several
> >>> > > > proposals in the documents to auto-reserve on `aurora job create`
> >>> > > > or `aurora cron schedule` and to automatically un-reserve on the
> >>> > > > appropriate reverse actions. But will we also allow a user further
> >>> > > > control over the reservations so that they can manage those
> >>> > > > independent of the task/job lifecycle? For example, how does Borg
> >>> > > > handle this?
> >>> > > >
> >>> > > > b) The implementation proposal and patches include an
> >>> > > > OfferReconciler, so this implies we don’t want to offer any
> >>> > > > control for the user. The only control mechanism will be the
> >>> > > > cluster-wide offer wait time limiting the number of seconds unused
> >>> > > > reserved resources can linger before they are un-reserved.
> >>> > > >
> >>> > > > c) Will we allow adhoc/cron jobs to reserve resources? Does it
> >>> > > > even matter if we don’t give control to users and just rely on the
> >>> > > > OfferReconciler?
> >>> > > >
> >>> > > > I have a couple of questions on the MVP and some implementation
> >>> > > > details. I will follow up with those in a separate mail.
> >>> > > >
> >>> > > > Thanks and best regards,
> >>> > > > Stephan
> >>> >
> >>> > --
> >>> > Zameer Manji