aurora-dev mailing list archives

From Maxim Khutornenko <ma...@apache.org>
Subject Re: Non-exclusive dedicated constraint
Date Wed, 20 Jan 2016 18:25:59 GMT
I should have looked more closely; you are right! This indeed addresses
both cases: a job with a named dedicated role is still allowed to get
through if its role matches the constraint, and everything else
(the non-exclusive dedicated pool) is addressed with "*".

What it does not solve, though, is the variety of non-exclusive
dedicated pools (e.g. GPU, OS, high network bandwidth, etc.). For
that we would need something similar to what Bill suggested.
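
A minimal Python sketch of the two checks discussed in this thread, for
illustration only. It is not the Aurora source (the scheduler is written in
Java), and the function names is_dedicated_value_allowed and
host_satisfies_dedicated are hypothetical. It assumes the patch only relaxes
the role check at validation time and leaves placement as an exact string
comparison, as Steve describes below.

```python
from typing import Iterable


def is_dedicated_value_allowed(job_role: str, dedicated_value: str) -> bool:
    """Validation-time check (hypothetical): may job_role use this value?

    A dedicated value has the form 'role/name'. With the patch, a role of
    '*' is accepted for any job; otherwise the role must equal the job's.
    """
    role, sep, name = dedicated_value.partition("/")
    if not sep or not name:
        return False  # malformed: expected 'role/name'
    return role == "*" or role == job_role


def host_satisfies_dedicated(dedicated_value: str,
                             host_dedicated_attrs: Iterable[str]) -> bool:
    """Placement-time check (hypothetical): plain string equality, no globbing."""
    return dedicated_value in set(host_dedicated_attrs)
```

Under these assumptions, a job in role 'other' may declare '*/secure' but not
'www-data/hello', and a task constrained to '*/test' is not placed on a host
whose dedicated attribute is 'role1/test'.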

On Wed, Jan 20, 2016 at 10:03 AM, Steve Niemitz <sniemitz@apache.org> wrote:
> An arbitrary job can't target a fully dedicated role with this patch; it
> will still get a "constraint not satisfied: dedicated" error.  The code in
> the scheduler that matches the constraints does a simple string match, so
> "*/test" will not match "role1/test" when trying to place the task; it will
> only match "*/test".
>
> On Wed, Jan 20, 2016 at 12:24 PM, Maxim Khutornenko <maxim@apache.org>
> wrote:
>
>> Thanks for the info, Steve! Yes, it would accomplish the same goal but
>> at the price of removing the exclusive dedicated constraint
>> enforcement. With this patch any job could target a fully dedicated
>> exclusive pool, which may be undesirable for dedicated pool owners.
>>
>>
>>
>> On Wed, Jan 20, 2016 at 7:13 AM, Steve Niemitz <sniemitz@apache.org>
>> wrote:
>> > We've been running a trivial patch [1] that does what I believe you're
>> > talking about for a while now.  It allows a * for the role name, basically
>> > allowing any role to match the constraint, so our constraints look like
>> > "*/secure".
>> >
>> > Our use case is that we have a "secure" cluster of machines, constrained
>> > in what can run on it (via an external audit process), that multiple roles
>> > run on.
>> >
>> > I believe I had talked to Bill about this a few months ago, but I don't
>> > remember where it ended up.
>> >
>> > [1] https://github.com/tellapart/aurora/commit/76f978c76cc1377e19e602f7e0d050f7ce353562
>> >
>> > On Tue, Jan 19, 2016 at 11:48 PM, Maxim Khutornenko <maxim@apache.org>
>> > wrote:
>> >
>> >> Oh, I didn't mean the memory GC pressure in the pure sense, rather a
>> >> logical garbage of orphaned hosts that never leave the scheduler. It's
>> >> not something to be concerned about from the performance standpoint.
>> >> It's, however, something operators need to be aware of when a host
>> >> from a dedicated pool gets dropped or replaced.
>> >>
>> >> On Tue, Jan 19, 2016 at 8:39 PM, Bill Farner <wfarner@apache.org> wrote:
>> >> > What do you mean by GC burden?  What I'm proposing is effectively
>> >> > Map<String, String>.  Even with an extremely forgetful operator (even
>> >> > more than Joe!), it would require a huge oversight to put a dent in heap
>> >> > usage.  I'm sure there are ways we could even expose a useful stat to
>> >> > flag such an oversight.
>> >> >
>> >> > On Tue, Jan 19, 2016 at 8:31 PM, Maxim Khutornenko <maxim@apache.org> wrote:
>> >> >
>> >> >> Right, that's what I thought. Yes, it sounds interesting. My only
>> >> >> concern is the GC burden of getting rid of hostnames that are obsolete
>> >> >> and no longer exist. Relying on offers to update hostname 'relevance'
>> >> >> may not work as dedicated hosts may be fully packed and not release
>> >> >> any resources for a very long time. Let me explore this idea a bit to
>> >> >> see what it would take to implement.
>> >> >>
>> >> >> On Tue, Jan 19, 2016 at 8:22 PM, Bill Farner <wfarner@apache.org> wrote:
>> >> >> > Not a host->attribute mapping (attribute in the mesos sense, anyway).
>> >> >> > Rather an out-of-band API for marking machines as reserved.  For
>> >> >> > task->offer mapping it's just a matter of another data source.  Does
>> >> >> > that make sense?
>> >> >> >
>> >> >> > On Tuesday, January 19, 2016, Maxim Khutornenko <maxim@apache.org> wrote:
>> >> >> >
>> >> >> >> > Can't this just be any old Constraint (not named "dedicated"). In other
>> >> >> >> > words, doesn't this code already deal with non-dedicated constraints?:
>> >> >> >> >
>> >> >> >> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>> >> >> >>
>> >> >> >> Not really. There is a subtle difference here. A regular (non-dedicated)
>> >> >> >> constraint does not prevent other tasks from landing on a given machine
>> >> >> >> set, whereas dedicated keeps other tasks away by only allowing those
>> >> >> >> matching the dedicated attribute. What this proposal targets is allowing
>> >> >> >> an exclusive machine pool to match any job that has this new constraint
>> >> >> >> while keeping all other tasks that don't have that attribute away.
>> >> >> >>
>> >> >> >> Following an example from my original post, imagine a GPU machine pool.
>> >> >> >> Any job (from any role) requiring GPU resource would be allowed while all
>> >> >> >> other jobs that don't have that constraint would be vetoed.
>> >> >> >>
>> >> >> >> > Also, regarding dedicated constraints necessitating a slave restart -
>> >> >> >> > I've pondered moving dedicated machine management to the scheduler for
>> >> >> >> > similar purposes.  There's not really much forcing that behavior to be
>> >> >> >> > managed with a slave attribute.
>> >> >> >>
>> >> >> >>
>> >> >> >> Would you mind giving a few more hints on the mechanics behind this? How
>> >> >> >> would scheduler know about dedicated hw without the slave attributes
>> >> >> >> set? Are you proposing storing hostname->attribute mapping in the
>> >> >> >> scheduler store?
>> >> >> >>
>> >> >> >> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfarner@apache.org> wrote:
>> >> >> >>
>> >> >> >> > Joe - if you want to pursue this, I suggest you start another thread
>> >> >> >> > to keep this thread's discussion intact.  I will not be able to lead
>> >> >> >> > this change, but can certainly shepherd!
>> >> >> >> >
>> >> >> >> > On Tuesday, January 19, 2016, Joe Smith <yasumoto7@gmail.com> wrote:
>> >> >> >> >
>> >> >> >> > > As an operator, that'd be a relatively simple change in tooling, and
>> >> >> >> > > the benefits of not forcing a slave restart would be _huge_.
>> >> >> >> > >
>> >> >> >> > > Keeping the dedicated semantics (but adding non-exclusive) would be
>> >> >> >> > > ideal if possible.
>> >> >> >> > >
>> >> >> >> > > > On Jan 19, 2016, at 19:09, Bill Farner <wfarner@apache.org> wrote:
>> >> >> >> > > >
>> >> >> >> > > > Also, regarding dedicated constraints necessitating a slave restart -
>> >> >> >> > > > I've pondered moving dedicated machine management to the scheduler for
>> >> >> >> > > > similar purposes.  There's not really much forcing that behavior to be
>> >> >> >> > > > managed with a slave attribute.
>> >> >> >> > > >
>> >> >> >> > > > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <john@conductant.com> wrote:
>> >> >> >> > > >
>> >> >> >> > > >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <maxim@apache.org> wrote:
>> >> >> >> > > >>
>> >> >> >> > > >>> Has anyone explored an idea of having a non-exclusive (wrt job role)
>> >> >> >> > > >>> dedicated constraint in Aurora before?
>> >> >> >> > > >>>
>> >> >> >> > > >>> We do have a dedicated constraint now but it assumes a 1:1
>> >> >> >> > > >>> relationship between a job role and a slave attribute [1]. For
>> >> >> >> > > >>> example: a 'www-data/prod/hello' job with a dedicated constraint of
>> >> >> >> > > >>> 'dedicated': 'www-data/hello' may only be pinned to a particular set
>> >> >> >> > > >>> of slaves if all of them have 'www-data/hello' attribute set. No other
>> >> >> >> > > >>> role tasks will be able to land on those slaves unless their
>> >> >> >> > > >>> 'role/name' pair is added into the slave attribute set.
>> >> >> >> > > >>>
>> >> >> >> > > >>> The above is very limiting as it prevents carving out subsets of a
>> >> >> >> > > >>> shared pool cluster to be used by multiple roles at the same time.
>> >> >> >> > > >>> Would it make sense to have a free-form dedicated constraint not bound
>> >> >> >> > > >>> to a particular role? Multiple jobs could then use this type of
>> >> >> >> > > >>> constraint dynamically without modifying the slave command line (and
>> >> >> >> > > >>> requiring slave restart).
>> >> >> >> > > >>
>> >> >> >> > > >> Can't this just be any old Constraint (not named "dedicated"). In other
>> >> >> >> > > >> words, doesn't this code already deal with non-dedicated constraints?:
>> >> >> >> > > >>
>> >> >> >> > > >> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>> >> >> >> > > >>
>> >> >> >> > > >>
>> >> >> >> > > >>> This could be quite useful for experimenting purposes (e.g. different
>> >> >> >> > > >>> host OS) or to target a different hardware offering (e.g. GPUs). In
>> >> >> >> > > >>> other words, only those jobs that explicitly opt-in to participate in
>> >> >> >> > > >>> an experiment or hw offering would be landing on that slave set.
>> >> >> >> > > >>>
>> >> >> >> > > >>> Thanks,
>> >> >> >> > > >>> Maxim
>> >> >> >> > > >>>
>> >> >> >> > > >>> [1] https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
>> >> >> >> > > >>
>> >> >> >> > > >>
>> >> >> >> > > >>
>> >> >> >> > > >> --
>> >> >> >> > > >> John Sirois
>> >> >> >> > > >> 303-512-3301
>> >> >> >> > > >>
