mesos-user mailing list archives

From Roger Ignazio <ro...@puppetlabs.com>
Subject Re: Understanding Slave Recovery Timeouts
Date Fri, 19 Jun 2015 23:18:16 GMT
On Fri, Jun 19, 2015 at 3:46 PM, Vinod Kone <vinodkone@gmail.com> wrote:

>
> *If* the 75 seconds is exceeded but we're within the recovery_timeout,
>> the slave *should* register with a new slave ID. The slave daemon (with
>> the new slave ID) reconnects to the old executors and updates them to use
>> the new slave ID.
>>
>
> This is not true. 'recovery_timeout' was added to make sure that if a
> slave is down for a long time (>10 mins), the executors commit suicide. It
> is better for the executor/task to die than keep running because the
> framework might have already launched another replica of that instance.
> This was not tied to the 75s timeout (hard-coded) because it is possible
> for a slave to successfully re-register with a master after 75s (e.g., both
> master and slave are down for 5 min).
>
> Also, a slave cannot connect to old executors with a new slave id.
>
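A minimal Python sketch of the executor-side rules Vinod describes. It rests on stated assumptions: the `Executor` class and its `should_exit` and `can_reconnect` methods are hypothetical illustrations, not Mesos APIs. The 10-minute value only mirrors the example in the message and is not a documented default. The hard-coded 75s master-side timeout is deliberately left out, because the message says recovery_timeout is independent of it.

```python
from dataclasses import dataclass


@dataclass
class Executor:
    """Hypothetical model of an executor whose slave has gone away."""
    slave_id: str             # ID of the slave that launched this executor
    recovery_timeout: float   # seconds to wait for that slave (assumed value)

    def should_exit(self, seconds_since_slave_lost: float) -> bool:
        # Past recovery_timeout the executor commits suicide, so its task does
        # not keep running alongside a replica the framework may have launched.
        return seconds_since_slave_lost > self.recovery_timeout

    def can_reconnect(self, restarted_slave_id: str) -> bool:
        # A restarted slave can only reconnect to executors it launched under
        # the same slave ID; a slave with a new ID cannot adopt them.
        return restarted_slave_id == self.slave_id


ex = Executor(slave_id="S1", recovery_timeout=10 * 60)
print(ex.should_exit(5 * 60))     # False: still within recovery_timeout
print(ex.should_exit(11 * 60))    # True: slave down longer than recovery_timeout
print(ex.can_reconnect("S1"))     # True: same slave ID
print(ex.can_reconnect("S2"))     # False: new slave ID
```

Keeping the two checks separate reflects the point of the reply. Whether the slave re-registers with the master after 75s says nothing about whether its executors survive. Survival depends only on recovery_timeout and on the slave coming back with the same ID.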

Perfect, thanks for the quick response Vinod!
