Subject: Re: Non-exclusive dedicated constraint
From: Maxim Khutornenko
To: dev@aurora.apache.org
Date: Tue, 19 Jan 2016 20:31:43 -0800
References: <854CEFF1-E22F-4C9A-9AD0-674CAB418311@gmail.com>

Right, that's what I thought. Yes, it sounds interesting.
My only concern is the GC burden of getting rid of hostnames that are
obsolete and no longer exist. Relying on offers to update hostname
'relevance' may not work as dedicated hosts may be fully packed and not
release any resources for a very long time. Let me explore this idea a bit
to see what it would take to implement.

On Tue, Jan 19, 2016 at 8:22 PM, Bill Farner wrote:

> Not a host->attribute mapping (attribute in the mesos sense, anyway). Rather
> an out-of-band API for marking machines as reserved. For task->offer
> mapping it's just a matter of another data source. Does that make sense?
>
> On Tuesday, January 19, 2016, Maxim Khutornenko wrote:
>
>> > Can't this just be any old Constraint (not named "dedicated"). In other
>> > words, doesn't this code already deal with non-dedicated constraints?:
>> >
>> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>>
>> Not really. There is a subtle difference here. A regular (non-dedicated)
>> constraint does not prevent other tasks from landing on a given machine set
>> whereas dedicated keeps other tasks away by only allowing those matching
>> the dedicated attribute. What this proposal targets is allowing exclusive
>> machine pool matching any job that has this new constraint while keeping
>> all other tasks that don't have that attribute away.
>>
>> Following an example from my original post, imagine a GPU machine pool. Any
>> job (from any role) requiring GPU resource would be allowed while all other
>> jobs that don't have that constraint would be vetoed.
>>
>> > Also, regarding dedicated constraints necessitating a slave restart - I've
>> > pondered moving dedicated machine management to the scheduler for similar
>> > purposes. There's not really much forcing that behavior to be managed with
>> > a slave attribute.
>>
>> Would you mind giving a few more hints on the mechanics behind this? How
>> would scheduler know about dedicated hw without the slave attributes set?
>> Are you proposing storing hostname->attribute mapping in the scheduler
>> store?
>>
>> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner wrote:
>>
>> > Joe - if you want to pursue this, I suggest you start another thread to
>> > keep this thread's discussion intact. I will not be able to lead this
>> > change, but can certainly shepherd!
>> >
>> > On Tuesday, January 19, 2016, Joe Smith wrote:
>> >
>> > > As an operator, that'd be a relatively simple change in tooling, and the
>> > > benefits of not forcing a slave restart would be _huge_.
>> > >
>> > > Keeping the dedicated semantics (but adding non-exclusive) would be ideal
>> > > if possible.
>> > >
>> > > > On Jan 19, 2016, at 19:09, Bill Farner wrote:
>> > > >
>> > > > Also, regarding dedicated constraints necessitating a slave restart - I've
>> > > > pondered moving dedicated machine management to the scheduler for similar
>> > > > purposes. There's not really much forcing that behavior to be managed with
>> > > > a slave attribute.
>> > > >
>> > > > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois wrote:
>> > > >
>> > > >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <maxim@apache.org>
>> > > >> wrote:
>> > > >>
>> > > >>> Has anyone explored an idea of having a non-exclusive (wrt job role)
>> > > >>> dedicated constraint in Aurora before?
>> > > >>>
>> > > >>> We do have a dedicated constraint now but it assumes a 1:1
>> > > >>> relationship between a job role and a slave attribute [1]. For
>> > > >>> example: a 'www-data/prod/hello' job with a dedicated constraint of
>> > > >>> 'dedicated': 'www-data/hello' may only be pinned to a particular set
>> > > >>> of slaves if all of them have 'www-data/hello' attribute set. No other
>> > > >>> role tasks will be able to land on those slaves unless their
>> > > >>> 'role/name' pair is added into the slave attribute set.
>> > > >>>
>> > > >>> The above is very limiting as it prevents carving out subsets of a
>> > > >>> shared pool cluster to be used by multiple roles at the same time.
>> > > >>> Would it make sense to have a free-form dedicated constraint not bound
>> > > >>> to a particular role? Multiple jobs could then use this type of
>> > > >>> constraint dynamically without modifying the slave command line (and
>> > > >>> requiring slave restart).
>> > > >>
>> > > >> Can't this just be any old Constraint (not named "dedicated"). In other
>> > > >> words, doesn't this code already deal with non-dedicated constraints?:
>> > > >>
>> > > >> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>> > > >>
>> > > >>> This could be quite useful for experimenting purposes (e.g. different
>> > > >>> host OS) or to target a different hardware offering (e.g. GPUs). In
>> > > >>> other words, only those jobs that explicitly opt-in to participate in
>> > > >>> an experiment or hw offering would be landing on that slave set.
>> > > >>>
>> > > >>> Thanks,
>> > > >>> Maxim
>> > > >>>
>> > > >>> [1] - https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
>> > > >>
>> > > >> --
>> > > >> John Sirois
>> > > >> 303-512-3301
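
For anyone skimming the thread, the distinction being discussed can be sketched roughly as below. This is only an illustrative model in Python, not Aurora's SchedulingFilterImpl or ConfigurationManager code; the names Task, Host, is_vetoed and is_valid_dedicated are hypothetical, and the free-form 'gpu' value is an assumed example of the proposed non-exclusive constraint. It assumes a host's dedicated values act as an allow-list, and that the only difference between today's behaviour and the proposal is whether the constraint value must be bound to the job's role.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Task:
    role: str
    # Value of the task's 'dedicated' constraint, if any (hypothetical model).
    dedicated: Optional[str] = None


@dataclass(frozen=True)
class Host:
    name: str
    # Dedicated values set on the host; empty means a shared host.
    dedicated: FrozenSet[str] = frozenset()


def is_valid_dedicated(task: Task, role_bound: bool) -> bool:
    """Config-time check. With role_bound=True the value must look like
    'role/name' and its role must equal the job's role (the 1:1 relationship
    described in the thread). With role_bound=False any free-form value is
    accepted (the proposed non-exclusive behaviour)."""
    if task.dedicated is None or not role_bound:
        return True
    return task.dedicated.split("/", 1)[0] == task.role


def is_vetoed(task: Task, host: Host) -> bool:
    """Scheduling-time check. A dedicated host only accepts tasks whose
    constraint matches one of its values; a shared host rejects tasks that
    ask for a dedicated pool it is not part of."""
    if host.dedicated:
        return task.dedicated not in host.dedicated
    return task.dedicated is not None


gpu_host = Host("gpu-1", frozenset({"gpu"}))
shared_host = Host("shared-1")

www_gpu = Task(role="www-data", dedicated="gpu")
ads_gpu = Task(role="ads", dedicated="gpu")
ads_plain = Task(role="ads")

# Jobs from any role that opt in may land on the GPU pool.
print(is_vetoed(www_gpu, gpu_host), is_vetoed(ads_gpu, gpu_host))  # False False
# Jobs that did not opt in are kept away from the pool.
print(is_vetoed(ads_plain, gpu_host))  # True
```

Under the role-bound rule, is_valid_dedicated(Task("ads", "www-data/hello"), role_bound=True) is False, which is the limitation the original post describes; with role_bound=False the same check passes for any role.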