aurora-dev mailing list archives

From Brian Wickman <wick...@apache.org>
Subject Re: aurora watch_secs change
Date Tue, 02 Dec 2014 22:44:44 GMT
Refactoring the executor to allow (B) is probably also the easiest option,
but per Farner's caveat, it should only send framework messages on status
transitions.
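
A minimal sketch of that edge-triggered behavior, in Python. The names
HealthEdgeDetector and send are hypothetical, not existing Aurora code;
send stands in for whatever would actually deliver a framework message.
It only fires when the observed health flips, so a steady run of identical
results produces no traffic.

```python
from typing import Callable, Optional


class HealthEdgeDetector:
    """Forwards a health result only when it differs from the previous one.

    Hypothetical helper for illustration; `send` stands in for whatever
    would actually deliver a framework message to the scheduler.
    """

    def __init__(self, send: Callable[[bool], None]) -> None:
        self._send = send
        self._last: Optional[bool] = None  # no result observed yet

    def observe(self, healthy: bool) -> bool:
        """Record one health check result; return True if a message was sent."""
        if healthy == self._last:
            return False  # no transition, stay quiet
        self._last = healthy
        self._send(healthy)
        return True


sent = []
detector = HealthEdgeDetector(sent.append)
for result in [True, True, False, False, False, True]:
    detector.observe(result)
# `sent` now holds one entry per transition, not one per check.
```

Note that in this sketch the very first result counts as a transition, so
the scheduler learns the initial state as well.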

There is something called the StatusManager that takes input from any
status checker (such as the health checker, announcer, resource manager,
etc.).  Right now you just hand it a callback that it calls when it thinks
something has gone awry; currently that callback is a shutdown method.  This
could be changed to send a framework message instead and let the scheduler
deal with it.  (As such, it would also be trivial to toggle this behavior via
the command line.)
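
Roughly what I have in mind, as a Python sketch rather than the real
executor code. StatusManagerSketch, build_callback and the
--report-to-scheduler flag are all made-up names for illustration, and both
callbacks just append to a list where the real ones would shut down the task
or send a framework message.

```python
import argparse
from typing import Callable, List, Optional

# A checker returns None while things look fine, or a reason string otherwise.
Checker = Callable[[], Optional[str]]


class StatusManagerSketch:
    """Polls checkers and hands the first failure reason to a callback."""

    def __init__(self, checkers: List[Checker],
                 on_failure: Callable[[str], None]) -> None:
        self._checkers = checkers
        self._on_failure = on_failure

    def poll_once(self) -> bool:
        """Run every checker once; return True if the callback fired."""
        for checker in self._checkers:
            reason = checker()
            if reason is not None:
                self._on_failure(reason)
                return True
        return False


def build_callback(argv: List[str], log: List[str]) -> Callable[[str], None]:
    """Pick the failure behavior from a hypothetical command-line flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--report-to-scheduler", action="store_true")
    args = parser.parse_args(argv)
    if args.report_to_scheduler:
        # Placeholder for sending a framework message to the scheduler.
        return lambda reason: log.append("framework_message: " + reason)
    # Placeholder for the current behavior: shut the task down.
    return lambda reason: log.append("shutdown: " + reason)


log: List[str] = []
manager = StatusManagerSketch(
    checkers=[lambda: None, lambda: "health check failed"],
    on_failure=build_callback(["--report-to-scheduler"], log),
)
manager.poll_once()
```

The point is that the manager itself does not care what the callback does,
so switching between "shut down" and "tell the scheduler" is a one-line
choice at startup.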

On Tue, Dec 2, 2014 at 2:26 PM, Bill Farner <wfarner@apache.org> wrote:

> Also relevant to this is AURORA-279, which suggests that we may not want to
> special-case the startup phase.
>
> Additional context - we should lean towards using mesos' framework messages
> as the communication medium.  These messages are one-way, rather than
> request-response based.  This seems to rule out or at least complicate (D).
>
> (B) actually sounds interesting to me.  The executor could start notifying
> the scheduler of health check results, triggered by an edge (unhealthy ->
> healthy, vice versa).
>
> -=Bill
>
> On Tue, Dec 2, 2014 at 1:53 PM, Nakamura <nnythm@gmail.com> wrote:
>
> > Howdy,
> >
> > I'm interested in tackling AURORA-894, but I'm not terribly familiar with
> > aurora, so I'd like some feedback on my design before I go forth.
> >
> > Bill pointed out that the hard bit would be designing the algorithm so it
> > doesn't DDoS the scheduler, and I think I have an idea of the possible
> > design space.  I wanted to know what you thought.
> >
> > A.  sample the number of health checks, and send them back to the
> > scheduler.  this is pretty simple, but 99% of the time will be total
> > noise, since the data isn't generally useful.
> >
> > B.  the executor sends health checks until it receives an out of band
> > request from the scheduler not to.  this seems fragile (I'm imagining
> > mismatched executors/schedulers behaving poorly) but would also probably
> > be reasonably simple.
> >
> > C.  a slightly more sophisticated approach might be to tell the executor
> > how many health checks to look for, so that it could send a status update
> > back, since status updates have reliable delivery.
> >
> > D. when the scheduler has finished standing up the executor, it
> > long-polls, which also takes care of reliable delivery because it's
> > presumably over TCP and we have total control (not having to go through
> > mesos).
> >
> > I'm hesitant to do A, because it's so wasteful.  B sounds fragile, so I
> > don't want to do that one.  D requires long-polling, which your client
> > may or may not do well.  I'm leaning toward C.  Do you think that sounds
> > like a reasonable approach?
> >
> > Thanks,
> > Moses
> >
>
