mesos-dev mailing list archives

From James Peach <jor...@gmail.com>
Subject Re: MESOS-6233 Allow agents to re-register post a host reboot
Date Tue, 29 Nov 2016 16:48:07 GMT

> On Nov 28, 2016, at 6:09 PM, Yan Xu <xujyan@apple.com> wrote:
> 
> So one thing that was brought up during offline conversations was that if the host reboot is associated with a hardware change (e.g., a new memory stick):
> 
> 	• Currently: the agent would skip the recovery (and the chance of running into incompatible agent info) and register as a new agent.
> 	• With the change: the agent could run into incompatible agent info due to the resource change and flap indefinitely until the operator intervenes.
> 
> To mitigate this and maintain the current behavior, we can have the agent run `rm -f <work_dir>/meta/slaves/latest` automatically upon recovery failure, but only after the host has rebooted. This way the agent can restart as a new agent without operator intervention.
> 
> Any thoughts?
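
For reference, the mitigation quoted above could be sketched roughly as follows. This is only an illustration in Python, not Mesos code: `recover_or_reset`, `RecoveryError`, the `recover` callback, and the `host_rebooted` flag are all hypothetical names, and how the agent detects a reboot is left out. The only detail taken from the proposal is the `<work_dir>/meta/slaves/latest` path and the "only after a reboot" condition.

```python
import os
from typing import Callable


class RecoveryError(Exception):
    """Hypothetical error standing in for a failed agent recovery."""


def recover_or_reset(work_dir: str, host_rebooted: bool,
                     recover: Callable[[], None]) -> str:
    """Try to recover; after a host reboot, fall back to a fresh agent.

    `recover` is a hypothetical callback that raises RecoveryError when
    recovery fails (e.g. on incompatible agent info).
    """
    latest = os.path.join(work_dir, "meta", "slaves", "latest")
    try:
        recover()
        return "recovered"
    except RecoveryError:
        if not host_rebooted:
            # No reboot: keep the failure visible to the operator.
            raise
        # Equivalent of `rm -f <work_dir>/meta/slaves/latest`:
        # remove the link and ignore the case where it is already gone.
        try:
            os.remove(latest)
        except FileNotFoundError:
            pass
        return "new_agent"
```

With this shape, a recovery failure without a reboot still surfaces as an error, so the fallback only applies in the case the proposal describes.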

I still think you need a mechanism for the master/agent to tell you whether it will honor
the restart policy. Without this, you have to lock the framework to a Mesos version.

An empty RestartPolicy is also problematic, since it precludes using RestartPolicy in pods.
If you later want to restart a task inside a pod but not across agent restarts, you would have
no way to express that.

J