Subject: Re: Non-exclusive dedicated constraint
From: Bill Farner
To: "dev@aurora.apache.org"
Date: Tue, 19 Jan 2016 20:22:48 -0800

Not
a host->attribute mapping (attribute in the Mesos sense, anyway). Rather, an
out-of-band API for marking machines as reserved. For task->offer mapping
it's just a matter of another data source. Does that make sense?

On Tuesday, January 19, 2016, Maxim Khutornenko wrote:

> > Can't this just be any old Constraint (not named "dedicated"). In other
> > words, doesn't this code already deal with non-dedicated constraints?:
> >
> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>
> Not really. There is a subtle difference here. A regular (non-dedicated)
> constraint does not prevent other tasks from landing on a given machine set,
> whereas dedicated keeps other tasks away by only allowing those matching
> the dedicated attribute. What this proposal targets is an exclusive
> machine pool that matches any job that has this new constraint while keeping
> away all other tasks that don't have it.
>
> Following an example from my original post, imagine a GPU machine pool. Any
> job (from any role) requiring GPU resource would be allowed while all other
> jobs that don't have that constraint would be vetoed.
>
> > Also, regarding dedicated constraints necessitating a slave restart - I've
> > pondered moving dedicated machine management to the scheduler for similar
> > purposes. There's not really much forcing that behavior to be managed with
> > a slave attribute.
>
> Would you mind giving a few more hints on the mechanics behind this? How
> would the scheduler know about dedicated hw without the slave attributes set?
> Are you proposing storing a hostname->attribute mapping in the scheduler
> store?
>
> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner wrote:
>
> > Joe - if you want to pursue this, I suggest you start another thread to
> > keep this thread's discussion intact. I will not be able to lead this
> > change, but can certainly shepherd!
> >
> > On Tuesday, January 19, 2016, Joe Smith wrote:
> >
> > > As an operator, that'd be a relatively simple change in tooling, and the
> > > benefits of not forcing a slave restart would be _huge_.
> > >
> > > Keeping the dedicated semantics (but adding non-exclusive) would be ideal
> > > if possible.
> > >
> > > > On Jan 19, 2016, at 19:09, Bill Farner wrote:
> > > >
> > > > Also, regarding dedicated constraints necessitating a slave restart -
> > > > I've pondered moving dedicated machine management to the scheduler for
> > > > similar purposes. There's not really much forcing that behavior to be
> > > > managed with a slave attribute.
> > > >
> > > > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois wrote:
> > > >
> > > >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <maxim@apache.org>
> > > >> wrote:
> > > >>
> > > >>> Has anyone explored the idea of having a non-exclusive (wrt job role)
> > > >>> dedicated constraint in Aurora before?
> > > >>>
> > > >>> We do have a dedicated constraint now but it assumes a 1:1
> > > >>> relationship between a job role and a slave attribute [1]. For
> > > >>> example: a 'www-data/prod/hello' job with a dedicated constraint of
> > > >>> 'dedicated': 'www-data/hello' may only be pinned to a particular set
> > > >>> of slaves if all of them have the 'www-data/hello' attribute set. No
> > > >>> other role's tasks will be able to land on those slaves unless their
> > > >>> 'role/name' pair is added into the slave attribute set.
> > > >>>
> > > >>> The above is very limiting as it prevents carving out subsets of a
> > > >>> shared pool cluster to be used by multiple roles at the same time.
> > > >>> Would it make sense to have a free-form dedicated constraint not bound
> > > >>> to a particular role? Multiple jobs could then use this type of
> > > >>> constraint dynamically without modifying the slave command line (and
> > > >>> requiring slave restart).
> > > >>
> > > >> Can't this just be any old Constraint (not named "dedicated"). In other
> > > >> words, doesn't this code already deal with non-dedicated constraints?:
> > > >>
> > > >> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
> > > >>
> > > >>> This could be quite useful for experimenting purposes (e.g. different
> > > >>> host OS) or to target a different hardware offering (e.g. GPUs). In
> > > >>> other words, only those jobs that explicitly opt-in to participate in
> > > >>> an experiment or hw offering would be landing on that slave set.
> > > >>>
> > > >>> Thanks,
> > > >>> Maxim
> > > >>>
> > > >>> [1]-
> > > >>> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
> > > >>
> > > >> --
> > > >> John Sirois
> > > >> 303-512-3301
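
For readers following the thread, here is a rough sketch of the semantics under discussion. It is not Aurora's actual filter code. The names Task, Host, role_may_use and is_vetoed are made up for illustration, and the "*/" prefix used to mark a pool as open to any role is purely an assumption of this sketch, not something proposed in the thread. It only tries to show the difference Maxim describes: a role-bound dedicated value admits one role, a non-exclusive value admits any job that opts in, and in both cases tasks without the constraint are kept off the reserved hosts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    role: str
    # Value of the task's 'dedicated' constraint, if any,
    # e.g. "www-data/hello" (role-bound) or "*/gpu" (hypothetical open pool).
    dedicated: Optional[str] = None


@dataclass(frozen=True)
class Host:
    # Values of the host's 'dedicated' attribute; empty means a shared host.
    dedicated: frozenset = frozenset()


def role_may_use(task_role: str, dedicated_value: str) -> bool:
    """May a job in task_role ask for this dedicated value?

    Role-bound values must be prefixed with the job's own role. The "*"
    owner is this sketch's assumed marker for a non-exclusive pool.
    """
    owner, _, _ = dedicated_value.partition("/")
    return owner == task_role or owner == "*"


def is_vetoed(task: Task, host: Host) -> bool:
    """Return True if the task must not be placed on the host."""
    if task.dedicated is None:
        # Tasks that did not opt in are kept off every reserved host.
        return bool(host.dedicated)
    if not role_may_use(task.role, task.dedicated):
        return True
    # Tasks that opted in may only land on hosts carrying that value.
    return task.dedicated not in host.dedicated
```

With a host marked "*/gpu", is_vetoed returns False for Task("www-data", "*/gpu") and for Task("other", "*/gpu"), and True for Task("other"), which is the GPU pool behaviour described above. Where the set of reserved hosts comes from (a slave attribute or an out-of-band scheduler API) does not change this check; it only changes how Host.dedicated gets populated.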