mesos-user mailing list archives

From Roger Ignazio <>
Subject Re: Understanding Slave Recovery Timeouts
Date Fri, 19 Jun 2015 23:18:16 GMT
On Fri, Jun 19, 2015 at 3:46 PM, Vinod Kone <> wrote:

>> *If* the 75 seconds is exceeded but we're within the recovery_timeout,
>> the slave *should* register with a new slave ID. The slave daemon (with
>> the new slave ID) reconnects to the old executors and updates them to use
>> the new slave ID.
>
> This is not true. 'recovery_timeout' was added to make sure that if a
> slave is down for a long time (>10 mins), the executors commit suicide. It
> is better for the executor/task to die than keep running because the
> framework might have already launched another replica of that instance.
> This was not tied to the 75s timeout (hard coded) because it is possible
> for a slave to successfully re-register with a master after 75s (e.g., both
> master and slave are down for 5 min).
> Also, a slave cannot connect to old executors with a new slave id.

Perfect, thanks for the quick response Vinod!
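
For anyone finding this thread later, here is a minimal sketch of the behavior as described in Vinod's reply above. It is only an illustration, not Mesos code: the function and parameter names are hypothetical, and the 15-minute recovery_timeout used in the example calls is an assumed value (the reply only says ">10 mins").

```python
# Illustrative model only; none of these names are part of the Mesos API.

# Hard-coded master-side timeout mentioned in the thread. It is deliberately
# NOT used in the functions below, because recovery_timeout is not tied to it.
MASTER_SLAVE_TIMEOUT_SECS = 75


def executors_survive(slave_down_secs, recovery_timeout_secs):
    """Return True if the executors are still running when the slave returns.

    Per the reply, executors terminate themselves once the slave has been
    down for longer than recovery_timeout.
    """
    return slave_down_secs < recovery_timeout_secs


def can_reconnect_to_executors(slave_down_secs, recovery_timeout_secs,
                               kept_slave_id):
    """Return True if the recovered slave can reconnect to its old executors.

    Two conditions from the reply: the executors must still be alive, and
    the slave must come back with its old slave ID (a slave cannot connect
    to old executors with a new slave ID).
    """
    return kept_slave_id and executors_survive(slave_down_secs,
                                               recovery_timeout_secs)


if __name__ == "__main__":
    assumed_recovery_timeout = 15 * 60  # assumption, see note above

    # Down for 5 minutes (well past 75s), same slave ID: reconnect is possible.
    print(can_reconnect_to_executors(5 * 60, assumed_recovery_timeout, True))

    # Down for 5 minutes but registered with a new slave ID: no reconnect.
    print(can_reconnect_to_executors(5 * 60, assumed_recovery_timeout, False))

    # Down for 20 minutes: executors have already terminated themselves.
    print(can_reconnect_to_executors(20 * 60, assumed_recovery_timeout, True))
```

The point of keeping the 75s constant out of both functions is the one Vinod makes: whether the executors survive depends only on recovery_timeout, and whether the slave can reattach depends on it keeping its slave ID.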
