aurora-dev mailing list archives

From Nakamura <>
Subject aurora watch_secs change
Date Tue, 02 Dec 2014 21:53:24 GMT

I'm interested in tackling AURORA-894, but I'm not very familiar with
Aurora, so I'd like some feedback on my design before I start.

Bill pointed out that the hard bit would be designing the algorithm so it
doesn't DDoS the scheduler, and I think I have an idea of the possible
design space.  I wanted to know what you thought.

A.  Sample the health check counts and send them back to the scheduler.
This is pretty simple, but 99% of the time it will be total noise, since
the data isn't generally useful.

B.  The executor sends health checks until it receives an out-of-band
request from the scheduler not to.  This seems fragile (I'm imagining
mismatched executors/schedulers behaving poorly) but would also probably be
reasonably simple.

C.  A slightly more sophisticated approach might be to tell the executor
how many health checks to look for, so that it could send a status update
back, since status updates have reliable delivery.

D.  When the scheduler has finished standing up the executor, it long-polls,
which also takes care of reliable delivery because it's presumably over TCP
and we have total control (not having to go through Mesos).

I'm hesitant to do A, because it's so wasteful.  B sounds fragile, so I
don't want to do that one.  D requires long-polling, which your client may
or may not do well.  I'm leaning toward C.  Do you think that sounds like a
reasonable approach?
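
To make C a bit more concrete, here is a rough sketch of what I have in mind
on the executor side.  Everything in it is hypothetical: HealthCheckWatcher,
required_successes and send_status_update are names made up for illustration,
not existing Aurora or Mesos APIs.  Counting consecutive successes (and
resetting on a failure) is also just an assumption about what "how many
health checks to look for" should mean.

```python
from typing import Callable


class HealthCheckWatcher:
    """Hypothetical executor-side helper for option C (not a real Aurora API)."""

    def __init__(self, required_successes: int,
                 send_status_update: Callable[[str], None]) -> None:
        if required_successes < 1:
            raise ValueError("required_successes must be at least 1")
        self._required = required_successes   # count supplied by the scheduler
        self._send = send_status_update       # stand-in for a reliable status update
        self._consecutive = 0
        self._reported = False

    def record(self, healthy: bool) -> None:
        """Record one health check result; report to the scheduler at most once."""
        if self._reported:
            return
        if not healthy:
            # Assumption: a failed check resets the streak.
            self._consecutive = 0
            return
        self._consecutive += 1
        if self._consecutive >= self._required:
            self._reported = True
            self._send("HEALTHY")


# Usage: the scheduler asked for 3 passing checks; only one update is sent.
updates = []
watcher = HealthCheckWatcher(3, updates.append)
for result in [True, False, True, True, True, True]:
    watcher.record(result)
print(updates)
```

The point is that the scheduler hears from the executor once per task rather
than once per health check, which is what keeps the load down.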

