aurora-dev mailing list archives

From Steve Niemitz <sniem...@apache.org>
Subject Re: Non-exclusive dedicated constraint
Date Wed, 20 Jan 2016 18:03:45 GMT
An arbitrary job can't target a fully dedicated role with this patch; it
will still get a "constraint not satisfied: dedicated" error.  The code in
the scheduler that matches the constraints does a simple string match, so
"*/test" will not match "role1/test" when trying to place the task; it will
only match "*/test".
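The literal match described above can be sketched roughly as follows. This is not the actual Aurora scheduler code, and the class and method names are hypothetical; it only illustrates that placement uses exact string equality while the patch [1] loosens the *validation* of which roles may declare a dedicated value:

```java
import java.util.Objects;

/**
 * Illustrative sketch (not real Aurora code) of dedicated-constraint
 * handling: placement is a plain string comparison, so "*/secure" matches
 * only hosts whose attribute is literally "*/secure", never "role1/secure".
 */
public final class DedicatedMatch {

  /** True when the task's dedicated value equals the host attribute exactly. */
  public static boolean satisfies(String taskValue, String hostAttribute) {
    return Objects.equals(taskValue, hostAttribute);
  }

  /**
   * Validation in the spirit of the tellapart patch [1]: a job may declare a
   * dedicated value when its role prefix is the job's own role or the "*"
   * wildcard. (Hypothetical helper, for illustration only.)
   */
  public static boolean mayDeclare(String jobRole, String dedicatedValue) {
    int slash = dedicatedValue.indexOf('/');
    if (slash < 0) {
      return false;  // Expected "role/name" form.
    }
    String rolePrefix = dedicatedValue.substring(0, slash);
    return rolePrefix.equals("*") || rolePrefix.equals(jobRole);
  }
}
```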

On Wed, Jan 20, 2016 at 12:24 PM, Maxim Khutornenko <maxim@apache.org>
wrote:

> Thanks for the info, Steve! Yes, it would accomplish the same goal but
> at the price of removing the exclusive dedicated constraint
> enforcement. With this patch any job could target a fully dedicated
> exclusive pool, which may be undesirable for dedicated pool owners.
>
>
>
> On Wed, Jan 20, 2016 at 7:13 AM, Steve Niemitz <sniemitz@apache.org>
> wrote:
> > We've been running a trivial patch [1] that does what I believe you're
> > talking about for a while now.  It allows a * for the role name,
> > basically allowing any role to match the constraint, so our constraints
> > look like "*/secure".
> >
> > Our use case is we have a "secure" cluster of machines, constrained in
> > what can run on it (via an external audit process), that multiple roles
> > run on.
> >
> > I believe I had talked to Bill about this a few months ago, but I don't
> > remember where it ended up.
> >
> > [1] https://github.com/tellapart/aurora/commit/76f978c76cc1377e19e602f7e0d050f7ce353562
> >
> > On Tue, Jan 19, 2016 at 11:48 PM, Maxim Khutornenko <maxim@apache.org>
> > wrote:
> >
> >> Oh, I didn't mean the memory GC pressure in the pure sense, rather a
> >> logical garbage of orphaned hosts that never leave the scheduler. It's
> >> not something to be concerned about from the performance standpoint.
> >> It's, however, something operators need to be aware of when a host
> >> from a dedicated pool gets dropped or replaced.
> >>
> >> On Tue, Jan 19, 2016 at 8:39 PM, Bill Farner <wfarner@apache.org> wrote:
> >> > What do you mean by GC burden?  What I'm proposing is effectively a
> >> > Map<String, String>.  Even with an extremely forgetful operator (even
> >> > more than Joe!), it would require a huge oversight to put a dent in
> >> > heap usage.  I'm sure there are ways we could even expose a useful
> >> > stat to flag such an oversight.
> >> >
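Bill's out-of-band Map<String, String> idea could look roughly like the sketch below. This is entirely hypothetical (no such API exists in Aurora); the orphan-count stat addresses the concern about reserved hosts that drop out of the cluster but are never unreserved:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of scheduler-side dedicated-machine management:
 * hosts are reserved through an API instead of slave attributes, so no
 * slave restart is needed. Not actual Aurora code.
 */
public final class ReservationStore {
  // Host name -> dedicated value, e.g. "host17" -> "www-data/hello".
  private final Map<String, String> reservations = new ConcurrentHashMap<>();

  public void reserve(String host, String dedicatedValue) {
    reservations.put(host, dedicatedValue);
  }

  public void release(String host) {
    reservations.remove(host);
  }

  /** Dedicated value to enforce for an incoming offer's host, if any. */
  public String dedicatedValueFor(String host) {
    return reservations.get(host);
  }

  /**
   * A stat that could flag operator oversight: reserved hosts that are no
   * longer known to the cluster (e.g. never seen in recent offers).
   */
  public long orphanCount(Set<String> knownHosts) {
    return reservations.keySet().stream()
        .filter(host -> !knownHosts.contains(host))
        .count();
  }
}
```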
> >> > On Tue, Jan 19, 2016 at 8:31 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> >> >> Right, that's what I thought. Yes, it sounds interesting. My only
> >> >> concern is the GC burden of getting rid of hostnames that are
> >> >> obsolete and no longer exist. Relying on offers to update hostname
> >> >> 'relevance' may not work as dedicated hosts may be fully packed and
> >> >> not release any resources for a very long time. Let me explore this
> >> >> idea a bit to see what it would take to implement.
> >> >>
> >> >> On Tue, Jan 19, 2016 at 8:22 PM, Bill Farner <wfarner@apache.org> wrote:
> >> >> > Not a host->attribute mapping (attribute in the mesos sense,
> >> >> > anyway).  Rather an out-of-band API for marking machines as
> >> >> > reserved.  For task->offer mapping it's just a matter of another
> >> >> > data source.  Does that make sense?
> >> >> >
> >> >> On Tuesday, January 19, 2016, Maxim Khutornenko <maxim@apache.org> wrote:
> >> >>
> >> >> >> > Can't this just be any old Constraint (not named "dedicated").
> >> >> >> > In other words, doesn't this code already deal with
> >> >> >> > non-dedicated constraints?:
> >> >> >> >
> >> >> >> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
> >> >> >>
> >> >> >> Not really. There is a subtle difference here. A regular
> >> >> >> (non-dedicated) constraint does not prevent other tasks from
> >> >> >> landing on a given machine set, whereas dedicated keeps other
> >> >> >> tasks away by only allowing those matching the dedicated
> >> >> >> attribute. What this proposal targets is allowing an exclusive
> >> >> >> machine pool to match any job that has this new constraint while
> >> >> >> keeping away all other tasks that don't have that attribute.
> >> >> >>
> >> >> >> Following an example from my original post, imagine a GPU machine
> >> >> >> pool. Any job (from any role) requiring GPU resources would be
> >> >> >> allowed while all other jobs that don't have that constraint
> >> >> >> would be vetoed.
> >> >> >>
> >> >> >> > Also, regarding dedicated constraints necessitating a slave
> >> >> >> > restart - I've pondered moving dedicated machine management to
> >> >> >> > the scheduler for similar purposes.  There's not really much
> >> >> >> > forcing that behavior to be managed with a slave attribute.
> >> >> >>
> >> >> >> Would you mind giving a few more hints on the mechanics behind
> >> >> >> this? How would the scheduler know about dedicated hardware
> >> >> >> without the slave attributes set? Are you proposing storing a
> >> >> >> hostname->attribute mapping in the scheduler store?
> >> >> >>
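The distinction Maxim draws above, where a dedicated host vetoes every task that lacks the matching attribute while a regular constraint only restricts tasks that carry it, can be sketched as below. The names are hypothetical and this is not the SchedulingFilterImpl logic, just an illustration of the two semantics:

```java
/**
 * Hypothetical sketch of the two constraint semantics discussed above;
 * not the real Aurora SchedulingFilterImpl code.
 */
public final class ConstraintSemantics {

  /**
   * Regular constraint: only tasks that declare a value for the attribute
   * are restricted; a task without it may still land on the host.
   */
  public static boolean regularAllows(String taskValue, String hostValue) {
    return taskValue == null || taskValue.equals(hostValue);
  }

  /**
   * Dedicated (exclusive) constraint: a dedicated host vetoes every task
   * whose value does not match, including tasks with no value at all.
   * A non-exclusive dedicated pool (e.g. "gpu") would keep this veto but
   * let any role declare the matching value.
   */
  public static boolean dedicatedAllows(String taskValue, String hostValue) {
    return hostValue == null || hostValue.equals(taskValue);
  }
}
```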
> >> >> >> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfarner@apache.org> wrote:
> >> >> >>
> >> >> >> > Joe - if you want to pursue this, I suggest you start another
> >> >> >> > thread to keep this thread's discussion intact.  I will not be
> >> >> >> > able to lead this change, but can certainly shepherd!
> >> >> >> >
> >> >> >> > On Tuesday, January 19, 2016, Joe Smith <yasumoto7@gmail.com> wrote:
> >> >> >> >
> >> >> >> > > As an operator, that'd be a relatively simple change in
> >> >> >> > > tooling, and the benefits of not forcing a slave restart
> >> >> >> > > would be _huge_.
> >> >> >> > >
> >> >> >> > > Keeping the dedicated semantics (but adding non-exclusive)
> >> >> >> > > would be ideal if possible.
> >> >> >> > >
> >> >> >> > > > On Jan 19, 2016, at 19:09, Bill Farner <wfarner@apache.org> wrote:
> >> >> >> > > >
> >> >> >> > > > Also, regarding dedicated constraints necessitating a slave
> >> >> >> > > > restart - I've pondered moving dedicated machine management
> >> >> >> > > > to the scheduler for similar purposes.  There's not really
> >> >> >> > > > much forcing that behavior to be managed with a slave
> >> >> >> > > > attribute.
> >> >> >> > > >
> >> >> >> > > > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <john@conductant.com> wrote:
> >> >> >> > > >
> >> >> >> > > >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> >> >> >> > > >>
> >> >> >> > > >>> Has anyone explored an idea of having a non-exclusive
> >> >> >> > > >>> (wrt job role) dedicated constraint in Aurora before?
> >> >> >> > > >>>
> >> >> >> > > >>> We do have a dedicated constraint now but it assumes a
> >> >> >> > > >>> 1:1 relationship between a job role and a slave attribute
> >> >> >> > > >>> [1]. For example: a 'www-data/prod/hello' job with a
> >> >> >> > > >>> dedicated constraint of 'dedicated': 'www-data/hello' may
> >> >> >> > > >>> only be pinned to a particular set of slaves if all of
> >> >> >> > > >>> them have the 'www-data/hello' attribute set. No other
> >> >> >> > > >>> role's tasks will be able to land on those slaves unless
> >> >> >> > > >>> their 'role/name' pair is added into the slave attribute
> >> >> >> > > >>> set.
> >> >> >> > > >>>
> >> >> >> > > >>> The above is very limiting as it prevents carving out
> >> >> >> > > >>> subsets of a shared pool cluster to be used by multiple
> >> >> >> > > >>> roles at the same time. Would it make sense to have a
> >> >> >> > > >>> free-form dedicated constraint not bound to a particular
> >> >> >> > > >>> role? Multiple jobs could then use this type of
> >> >> >> > > >>> constraint dynamically without modifying the slave
> >> >> >> > > >>> command line (and requiring a slave restart).
> >> >> >> > > >>
> >> >> >> > > >> Can't this just be any old Constraint (not named
> >> >> >> > > >> "dedicated").  In other words, doesn't this code already
> >> >> >> > > >> deal with non-dedicated constraints?:
> >> >> >> > > >>
> >> >> >> > > >> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
> >> >> >> > > >>
> >> >> >> > > >>> This could be quite useful for experimenting purposes
> >> >> >> > > >>> (e.g. a different host OS) or to target a different
> >> >> >> > > >>> hardware offering (e.g. GPUs). In other words, only those
> >> >> >> > > >>> jobs that explicitly opt in to participate in an
> >> >> >> > > >>> experiment or hardware offering would be landing on that
> >> >> >> > > >>> slave set.
> >> >> >> > > >>>
> >> >> >> > > >>> Thanks,
> >> >> >> > > >>> Maxim
> >> >> >> > > >>>
> >> >> >> > > >>> [1] https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
> >> >> >> > > >>
> >> >> >> > > >> --
> >> >> >> > > >> John Sirois
> >> >> >> > > >> 303-512-3301
