aurora-dev mailing list archives

From Bill Farner <wfar...@apache.org>
Subject Re: Non-exclusive dedicated constraint
Date Wed, 09 Mar 2016 22:41:38 GMT
Ah, so it only practically makes sense when the dedicated attribute is
*/something, but * would not make much sense.  Seems reasonable to me.
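
[The matching semantics agreed on above can be sketched as plain string comparison. This is an illustrative sketch with invented names, not the actual Aurora scheduler code: a host attribute of "*/gpu" matches a task constraint of "*/gpu" regardless of the task's role, while a bare "*" on its own would pair with nothing useful.]

```java
// Illustrative sketch only (invented names, not the actual Aurora code):
// dedicated values are compared as plain strings, so "*/gpu" on a host
// matches a "*/gpu" task constraint from any role.
public final class WildcardDedicatedSketch {
  // True when the task's dedicated constraint equals the host's
  // dedicated attribute verbatim.
  public static boolean hostAccepts(String hostDedicated, String taskDedicated) {
    return hostDedicated.equals(taskDedicated);
  }

  public static void main(String[] args) {
    System.out.println(hostAccepts("*/gpu", "*/gpu"));     // true
    System.out.println(hostAccepts("*/gpu", "role1/gpu")); // false
  }
}
```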

On Wed, Mar 9, 2016 at 2:32 PM, Maxim Khutornenko <maxim@apache.org> wrote:

> It's an *easy* way to get a virtual cluster with specific
> requirements. One example: have a set of machines in a shared pool
> with a different OS. This would let any existing or new customers try
> their services for compliance. The alternative would be spinning off a
> completely new physical cluster, which is a huge overhead on both
> supply and demand sides.
>
> On Wed, Mar 9, 2016 at 2:26 PM, Bill Farner <wfarner@apache.org> wrote:
> > What does it mean to have a 'dedicated' host that's free-for-all like
> > that?
> >
> > On Wed, Mar 9, 2016 at 2:16 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> >
> >> Reactivating this thread. I like Bill's suggestion to have a
> >> scheduler-side dedicated constraint management system. It will,
> >> however, require a substantial effort to get done properly. Would
> >> anyone oppose adopting Steve's patch in the meantime? The ROI is so
> >> high it would be a crime NOT to take it :)
> >>
> >> On Wed, Jan 20, 2016 at 10:25 AM, Maxim Khutornenko <maxim@apache.org> wrote:
> >> > I should have looked closely, you are right! This indeed addresses
> >> > both cases: a job with a named dedicated role is still allowed to
> >> > get through if its role matches the constraint, and everything else
> >> > (non-exclusive dedicated pool) is addressed with "*".
> >> >
> >> > What it does not solve, though, is the variety of non-exclusive
> >> > dedicated pools (e.g. GPU, OS, high network bandwidth, etc.). For
> >> > that we would need something similar to what Bill suggested.
> >> >
> >> > On Wed, Jan 20, 2016 at 10:03 AM, Steve Niemitz <sniemitz@apache.org> wrote:
> >> >> An arbitrary job can't target a fully dedicated role with this
> >> >> patch, it will still get a "constraint not satisfied: dedicated"
> >> >> error.  The code in the scheduler that matches the constraints
> >> >> does a simple string match, so "*/test" will not match "role1/test"
> >> >> when trying to place the task, it will only match "*/test".
> >> >>
> >> >> On Wed, Jan 20, 2016 at 12:24 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> >> >>
> >> >>> Thanks for the info, Steve! Yes, it would accomplish the same
> >> >>> goal but at the price of removing the exclusive dedicated
> >> >>> constraint enforcement. With this patch any job could target a
> >> >>> fully dedicated exclusive pool, which may be undesirable for
> >> >>> dedicated pool owners.
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Wed, Jan 20, 2016 at 7:13 AM, Steve Niemitz <sniemitz@apache.org> wrote:
> >> >>> > We've been running a trivial patch [1] that does what I believe
> >> >>> > you're talking about for a while now.  It allows a * for the
> >> >>> > role name, basically allowing any role to match the constraint,
> >> >>> > so our constraints look like "*/secure".
> >> >>> >
> >> >>> > Our use case is we have a "secure" cluster of machines that is
> >> >>> > constrained on what can run on it (via an external audit
> >> >>> > process) that multiple roles run on.
> >> >>> >
> >> >>> > I believe I had talked to Bill about this a few months ago, but
> >> >>> > I don't remember where it ended up.
> >> >>> >
> >> >>> > [1]
> >> >>> > https://github.com/tellapart/aurora/commit/76f978c76cc1377e19e602f7e0d050f7ce353562
> >> >>> >
> >> >>> > On Tue, Jan 19, 2016 at 11:48 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> >> >>> >
> >> >>> >> Oh, I didn't mean the memory GC pressure in the pure sense,
> >> >>> >> rather a logical garbage of orphaned hosts that never leave
> >> >>> >> the scheduler. It's not something to be concerned about from
> >> >>> >> the performance standpoint. It's, however, something operators
> >> >>> >> need to be aware of when a host from a dedicated pool gets
> >> >>> >> dropped or replaced.
> >> >>> >>
> >> >>> >> On Tue, Jan 19, 2016 at 8:39 PM, Bill Farner <wfarner@apache.org> wrote:
> >> >>> >> > What do you mean by GC burden?  What i'm proposing is
> >> >>> >> > effectively Map<String, String>.  Even with an extremely
> >> >>> >> > forgetful operator (even more than Joe!), it would require a
> >> >>> >> > huge oversight to put a dent in heap usage.  I'm sure there
> >> >>> >> > are ways we could even expose a useful stat to flag such an
> >> >>> >> > oversight.
> >> >>> >> >
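
[The Map<String, String> store Bill describes could look roughly like the following. This is a sketch under invented names; the thread never specifies the out-of-band API, so everything here beyond "hostname -> dedicated value" is an assumption.]

```java
// Rough sketch (invented names, not actual Aurora code) of a
// scheduler-side reservation store: effectively a Map<String, String>
// from hostname to dedicated value, consulted as an extra data source
// during task->offer matching.
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public final class HostReservationSketch {
  private final Map<String, String> reservations = new ConcurrentHashMap<>();

  // Mark a host as reserved for the given dedicated value.
  public void reserve(String hostname, String dedicatedValue) {
    reservations.put(hostname, dedicatedValue);
  }

  // Forget a host, e.g. when it is decommissioned or replaced.
  public void release(String hostname) {
    reservations.remove(hostname);
  }

  // Looked up during task->offer matching.
  public Optional<String> dedicatedValueFor(String hostname) {
    return Optional.ofNullable(reservations.get(hostname));
  }

  // A stat like this could flag the "orphaned host" oversight raised in
  // the thread: entries whose hostnames no longer appear in offers.
  public int reservedHostCount() {
    return reservations.size();
  }
}
```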
> >> >>> >> > On Tue, Jan 19, 2016 at 8:31 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> >> >>> >> >
> >> >>> >> >> Right, that's what I thought. Yes, it sounds interesting.
> >> >>> >> >> My only concern is the GC burden of getting rid of
> >> >>> >> >> hostnames that are obsolete and no longer exist. Relying on
> >> >>> >> >> offers to update hostname 'relevance' may not work as
> >> >>> >> >> dedicated hosts may be fully packed and not release any
> >> >>> >> >> resources for a very long time. Let me explore this idea a
> >> >>> >> >> bit to see what it would take to implement.
> >> >>> >> >>
> >> >>> >> >> On Tue, Jan 19, 2016 at 8:22 PM, Bill Farner <wfarner@apache.org> wrote:
> >> >>> >> >> > Not a host->attribute mapping (attribute in the mesos
> >> >>> >> >> > sense, anyway).  Rather an out-of-band API for marking
> >> >>> >> >> > machines as reserved.  For task->offer mapping it's just
> >> >>> >> >> > a matter of another data source.  Does that make sense?
> >> >>> >> >> >
> >> >>> >> >> > On Tuesday, January 19, 2016, Maxim Khutornenko <maxim@apache.org> wrote:
> >> >>> >> >> >
> >> >>> >> >> >> >
> >> >>> >> >> >> > Can't this just be any old Constraint (not named
> >> >>> >> >> >> > "dedicated").  In other words, doesn't this code
> >> >>> >> >> >> > already deal with non-dedicated constraints?:
> >> >>> >> >> >> >
> >> >>> >> >> >> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
> >> >>> >> >> >>
> >> >>> >> >> >>
> >> >>> >> >> >> Not really. There is a subtle difference here. A regular
> >> >>> >> >> >> (non-dedicated) constraint does not prevent other tasks
> >> >>> >> >> >> from landing on a given machine set, whereas dedicated
> >> >>> >> >> >> keeps other tasks away by only allowing those matching
> >> >>> >> >> >> the dedicated attribute. What this proposal targets is
> >> >>> >> >> >> allowing an exclusive machine pool to match any job that
> >> >>> >> >> >> has this new constraint while keeping away all other
> >> >>> >> >> >> tasks that don't have that attribute.
> >> >>> >> >> >>
> >> >>> >> >> >> Following an example from my original post, imagine a
> >> >>> >> >> >> GPU machine pool. Any job (from any role) requiring GPU
> >> >>> >> >> >> resources would be allowed, while all other jobs that
> >> >>> >> >> >> don't have that constraint would be vetoed.
> >> >>> >> >> >>
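
[The non-exclusive pool semantics described in the GPU example might be sketched as follows. Hypothetical logic with invented names, not existing Aurora code: a host tagged with a free-form pool value vetoes any task lacking that value, but accepts matching tasks from any role; untagged hosts behave as before.]

```java
// Sketch of the proposed non-exclusive dedicated semantics (invented
// names): hosts in a tagged pool only take tasks that opted in to the
// same pool value; untagged hosts accept anything.
import java.util.Optional;

public final class NonExclusivePoolSketch {
  public static boolean accepts(Optional<String> hostPool, Optional<String> taskPool) {
    // Untagged hosts take any task; tagged hosts require the task to
    // carry the identical pool value (from any role).
    return !hostPool.isPresent() || hostPool.equals(taskPool);
  }

  public static void main(String[] args) {
    System.out.println(accepts(Optional.of("gpu"), Optional.of("gpu"))); // true
    System.out.println(accepts(Optional.of("gpu"), Optional.empty()));   // false
  }
}
```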
> >> >>> >> >> >> > Also, regarding dedicated constraints necessitating a
> >> >>> >> >> >> > slave restart - i've pondered moving dedicated machine
> >> >>> >> >> >> > management to the scheduler for similar purposes.
> >> >>> >> >> >> > There's not really much forcing that behavior to be
> >> >>> >> >> >> > managed with a slave attribute.
> >> >>> >> >> >>
> >> >>> >> >> >>
> >> >>> >> >> >> Would you mind giving a few more hints on the mechanics
> >> >>> >> >> >> behind this? How would the scheduler know about
> >> >>> >> >> >> dedicated hw without the slave attributes set? Are you
> >> >>> >> >> >> proposing storing a hostname->attribute mapping in the
> >> >>> >> >> >> scheduler store?
> >> >>> >> >> >>
> >> >>> >> >> >> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfarner@apache.org> wrote:
> >> >>> >> >> >>
> >> >>> >> >> >> > Joe - if you want to pursue this, I suggest you start
> >> >>> >> >> >> > another thread to keep this thread's discussion
> >> >>> >> >> >> > intact.  I will not be able to lead this change, but
> >> >>> >> >> >> > can certainly shepherd!
> >> >>> >> >> >> >
> >> >>> >> >> >> > On Tuesday, January 19, 2016, Joe Smith <yasumoto7@gmail.com> wrote:
> >> >>> >> >> >> >
> >> >>> >> >> >> > > As an operator, that'd be a relatively simple
> >> >>> >> >> >> > > change in tooling, and the benefits of not forcing a
> >> >>> >> >> >> > > slave restart would be _huge_.
> >> >>> >> >> >> > >
> >> >>> >> >> >> > > Keeping the dedicated semantics (but adding
> >> >>> >> >> >> > > non-exclusive) would be ideal if possible.
> >> >>> >> >> >> > >
> >> >>> >> >> >> > > > On Jan 19, 2016, at 19:09, Bill Farner <wfarner@apache.org> wrote:
> >> >>> >> >> >> > > >
> >> >>> >> >> >> > > > Also, regarding dedicated constraints
> >> >>> >> >> >> > > > necessitating a slave restart - i've pondered
> >> >>> >> >> >> > > > moving dedicated machine management to the
> >> >>> >> >> >> > > > scheduler for similar purposes.  There's not
> >> >>> >> >> >> > > > really much forcing that behavior to be managed
> >> >>> >> >> >> > > > with a slave attribute.
> >> >>> >> >> >> > > >
> >> >>> >> >> >> > > > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <john@conductant.com> wrote:
> >> >>> >> >> >> > > >
> >> >>> >> >> >> > > >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> >> >>> >> >> >> > > >>
> >> >>> >> >> >> > > >>> Has anyone explored an idea of having a
> >> >>> >> >> >> > > >>> non-exclusive (wrt job role) dedicated
> >> >>> >> >> >> > > >>> constraint in Aurora before?
> >> >>> >> >> >> > > >>
> >> >>> >> >> >> > > >>
> >> >>> >> >> >> > > >>> We do have a dedicated constraint now but it
> >> >>> >> >> >> > > >>> assumes a 1:1 relationship between a job role
> >> >>> >> >> >> > > >>> and a slave attribute [1]. For example: a
> >> >>> >> >> >> > > >>> 'www-data/prod/hello' job with a dedicated
> >> >>> >> >> >> > > >>> constraint of 'dedicated': 'www-data/hello' may
> >> >>> >> >> >> > > >>> only be pinned to a particular set of slaves if
> >> >>> >> >> >> > > >>> all of them have the 'www-data/hello' attribute
> >> >>> >> >> >> > > >>> set. No other role's tasks will be able to land
> >> >>> >> >> >> > > >>> on those slaves unless their 'role/name' pair is
> >> >>> >> >> >> > > >>> added into the slave attribute set.
> >> >>> >> >> >> > > >>>
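
[The 1:1 role-to-attribute restriction described in the quoted example can be sketched as a prefix check. This is an illustrative sketch only, not the actual ConfigurationManager code linked at [1]: a job may only use a dedicated value whose role half matches its own role.]

```java
// Illustrative sketch (invented names) of the role-bound dedicated
// restriction: role 'www-data' may use 'www-data/hello' but a different
// role may not.
public final class DedicatedValidationSketch {
  public static boolean isAllowed(String jobRole, String dedicatedValue) {
    // The dedicated value must be prefixed by the job's own role.
    return dedicatedValue.startsWith(jobRole + "/");
  }

  public static void main(String[] args) {
    System.out.println(isAllowed("www-data", "www-data/hello")); // true
    System.out.println(isAllowed("other", "www-data/hello"));    // false
  }
}
```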
> >> >>> >> >> >> > > >>> The above is very limiting as it prevents
> >> >>> >> >> >> > > >>> carving out subsets of a shared pool cluster to
> >> >>> >> >> >> > > >>> be used by multiple roles at the same time.
> >> >>> >> >> >> > > >>> Would it make sense to have a free-form
> >> >>> >> >> >> > > >>> dedicated constraint not bound to a particular
> >> >>> >> >> >> > > >>> role? Multiple jobs could then use this type of
> >> >>> >> >> >> > > >>> constraint dynamically without modifying the
> >> >>> >> >> >> > > >>> slave command line (and requiring a slave
> >> >>> >> >> >> > > >>> restart).
> >> >>> >> >> >> > > >>
> >> >>> >> >> >> > > >> Can't this just be any old Constraint (not named
> >> >>> >> >> >> > > >> "dedicated").  In other words, doesn't this code
> >> >>> >> >> >> > > >> already deal with non-dedicated constraints?:
> >> >>> >> >> >> > > >>
> >> >>> >> >> >> > > >> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
> >> >>> >> >> >> > > >>
> >> >>> >> >> >> > > >>
> >> >>> >> >> >> > > >>> This could be quite useful for experimenting
> >> >>> >> >> >> > > >>> purposes (e.g. a different host OS) or to target
> >> >>> >> >> >> > > >>> a different hardware offering (e.g. GPUs). In
> >> >>> >> >> >> > > >>> other words, only those jobs that explicitly
> >> >>> >> >> >> > > >>> opt in to participate in an experiment or hw
> >> >>> >> >> >> > > >>> offering would be landing on that slave set.
> >> >>> >> >> >> > > >>>
> >> >>> >> >> >> > > >>> Thanks,
> >> >>> >> >> >> > > >>> Maxim
> >> >>> >> >> >> > > >>>
> >> >>> >> >> >> > > >>> [1]-
> >> >>> >> >> >> > > >>> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
> >> >>> >> >> >> > > >>
> >> >>> >> >> >> > > >>
> >> >>> >> >> >> > > >>
> >> >>> >> >> >> > > >> --
> >> >>> >> >> >> > > >> John Sirois
> >> >>> >> >> >> > > >> 303-512-3301
> >> >>> >> >> >> > > >>
