mesos-dev mailing list archives

From Vinod Kone <vi...@mesosphere.io>
Subject Re: Rate-limiting agent removal w/ PARTITION_AWARE
Date Mon, 01 Aug 2016 19:22:06 GMT
Rate limiting task kills (or, more specifically, framework shutdowns on
agents) for non-partition-aware frameworks sounds good to me. I would like
us to preserve as much backwards compatibility as possible here. We can
update the flags' help text to say that they don't apply to partition-aware
frameworks.
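
A rough sketch of that idea, for illustration only: when an agent becomes unreachable, frameworks without the PARTITION_AWARE capability pass through a rate limiter before they are shut down on that agent, while partition-aware frameworks skip the limiter entirely. This is not the Mesos implementation; `RateLimiter`, `plan_shutdowns`, and the token-bucket policy are all hypothetical names and choices made up for the example.

```python
import time


class RateLimiter:
    """Hypothetical token bucket: at most `permits` actions per `per_seconds`."""

    def __init__(self, permits, per_seconds, clock=time.monotonic):
        self.capacity = float(permits)
        self.rate = permits / per_seconds  # tokens refilled per second
        self.clock = clock
        self.tokens = float(permits)
        self.last = clock()

    def try_acquire(self):
        # Refill according to elapsed time, capped at the bucket capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def plan_shutdowns(frameworks, limiter):
    """Decide which frameworks on an unreachable agent may be shut down now.

    `frameworks` is a list of (name, capabilities) pairs (assumed shape).
    Returns (allowed, deferred) lists of framework names.
    """
    allowed, deferred = [], []
    for name, capabilities in frameworks:
        if "PARTITION_AWARE" in capabilities:
            # Partition-aware frameworks are not shut down here,
            # so the rate limit does not apply to them.
            continue
        if limiter.try_acquire():
            allowed.append(name)
        else:
            deferred.append(name)  # retry once the limiter refills
    return allowed, deferred
```

With a limit of one shutdown per minute, two legacy frameworks on the same agent would be handled one now and one later, and a partition-aware framework would appear in neither list.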

On Sat, Jul 30, 2016 at 1:58 AM, Neil Conway <neil.conway@gmail.com> wrote:

> Hi Ben,
>
> Thanks for the feedback! Seems like we're on the same page overall.
>
> On Thu, Jul 28, 2016 at 8:42 AM, Benjamin Mahler <bmahler@apache.org>
> wrote:
> > It seems to me that these particular flags are not applicable for
> > PARTITION_AWARE frameworks, since there is no removal occurring.
>
> FWIW, I've still been using the term "removal" in the PARTITION_AWARE
> branch to describe any situation in which a slave is removed from the
> set of registered agents in the registry: e.g., both when we mark a
> slave unreachable (move from "admitted" to "unreachable" list in the
> registry) and when a slave gracefully disconnects via
> UnregisterSlaveMessage (remove from "admitted" list).
>
> > If we want to support schedulers that react poorly, we can add
> > per-framework rate limits for unreachable notifications. Operators could
> > turn these on to deal with specific frameworks that react poorly.
>
> Sounds reasonable. I'm inclined to not implement this until we have
> some evidence that people actually need it.
>
> > In situations where the
> > agent is considered unreachable, we won't offer resources, correct?
>
> Correct.
>
> Thanks,
> Neil
>
