mesos-issues mailing list archives

From "Deshi Xiao (JIRA)" <>
Subject [jira] [Commented] (MESOS-3545) Investigate restoring tasks/executors after machine reboot.
Date Sat, 01 Apr 2017 07:29:41 GMT


Deshi Xiao commented on MESOS-3545:

any update? [~megha.sharma] [~xujyan]

> Investigate restoring tasks/executors after machine reboot.
> -----------------------------------------------------------
>                 Key: MESOS-3545
>                 URL:
>             Project: Mesos
>          Issue Type: Epic
>          Components: agent
>            Reporter: Benjamin Hindman
>            Assignee: Megha Sharma
> If a task/executor is restartable (see MESOS-3544) it might make sense to force an agent
to restart these tasks/executors after a machine reboot, _before_ re-registering, in the event that the machine
is network partitioned away from the master (or the master has failed) but we'd like to get
these services running again. Assuming the agent(s) running on the machine have not been disconnected
from the master for longer than the master's agent re-registration timeout, the agent should
be able to re-register (i.e., after a network partition is resolved) without a problem. However,
in the same way that a framework would be interested in knowing that its tasks/executors
were restarted, we'd want to send something like a TASK_RESTARTED status update.
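
A rough sketch of the flow described above may help make the proposal concrete. It is not Mesos code: the `Task` type, its `restartable` field (standing in for whatever MESOS-3544 defines), the `recover_after_reboot` function, and its parameters are all hypothetical, and TASK_RESTARTED is only the status name suggested in this issue, not an existing task state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    task_id: str
    restartable: bool  # hypothetical flag, standing in for MESOS-3544


def recover_after_reboot(
    tasks: List[Task],
    seconds_disconnected: float,
    reregister_timeout_seconds: float,
) -> Tuple[List[Tuple[str, str]], bool]:
    """Sketch of the proposed agent behaviour after a machine reboot.

    Returns the status updates the agent would queue for frameworks and
    whether re-registration with the master is still expected to succeed.
    """
    # Restart restartable tasks locally, without waiting for the master,
    # and record a (proposed, not yet existing) TASK_RESTARTED update.
    updates = [
        (task.task_id, "TASK_RESTARTED") for task in tasks if task.restartable
    ]

    # Re-registration should work as long as the agent has not been
    # disconnected for longer than the master's re-registration timeout.
    can_reregister = seconds_disconnected <= reregister_timeout_seconds
    return updates, can_reregister
```

For example, with one restartable and one non-restartable task and a disconnection shorter than the timeout, only the restartable task would produce an update, and the agent would still expect to re-register.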

This message was sent by Atlassian JIRA
