mesos-issues mailing list archives

From "Vinod Kone (JIRA)" <>
Subject [jira] [Commented] (MESOS-4306) AGENT_DEAD Message
Date Thu, 07 Jan 2016 19:55:39 GMT


Vinod Kone commented on MESOS-4306:


> AGENT_DEAD Message
> ------------------
>                 Key: MESOS-4306
>                 URL:
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Gabriel Hartmann
> Frameworks currently receive SLAVE_LOST messages when an Agent fails or is behind a network
> partition for some period of time.  However, frameworks, and indeed Mesos itself, cannot
> differentiate between an Agent being temporarily lost and one being permanently lost.
> It would be good to have a message indicating that an Agent is lost and won't be returning.
> Making that determination would require human intervention, so an endpoint should be exposed
> to trigger the sending of this message.
> This is particularly helpful for frameworks which are waiting for the return of persistent
> volumes.  Where an Agent hosts a significant amount of data (multiple terabytes), the framework
> may be willing to wait a significant amount of time before repairing its replication factor
> (for example).  Explicit, human-provided information about the permanent state of Agents, and
> therefore of their resources, would allow these kinds of frameworks to accelerate their recovery
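
To make the request above concrete, here is a minimal sketch of how a framework might react differently to the two messages. It is illustrative only: AGENT_DEAD is the message proposed in this issue, not an existing Mesos message, and the RecoveryPolicy class, its grace_period_secs field, and the should_repair method are hypothetical names that do not come from Mesos or any framework.

```python
from dataclasses import dataclass
from enum import Enum


class AgentEvent(Enum):
    SLAVE_LOST = "SLAVE_LOST"  # existing message: the agent may still return
    AGENT_DEAD = "AGENT_DEAD"  # proposed message (assumption): the agent will not return


@dataclass
class RecoveryPolicy:
    # Hypothetical grace period a framework waits for persistent volumes to come back.
    grace_period_secs: float = 6 * 60 * 60

    def should_repair(self, event: AgentEvent, secs_since_lost: float) -> bool:
        """Return True when the framework should start restoring its replication factor."""
        if event is AgentEvent.AGENT_DEAD:
            # An operator has declared the agent permanently gone: repair immediately.
            return True
        # SLAVE_LOST alone is ambiguous, so wait out the grace period first.
        return secs_since_lost >= self.grace_period_secs
```

With only SLAVE_LOST available, the framework has to choose the grace period blindly. An operator-triggered AGENT_DEAD would let it skip the wait whenever a human already knows the agent is gone.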

This message was sent by Atlassian JIRA
