mesos-issues mailing list archives

From "Vinod Kone (Jira)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-4659) Avoid leaving orphan task after framework failure + master failover
Date Mon, 17 Feb 2020 20:02:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038595#comment-17038595 ]

Vinod Kone commented on MESOS-4659:
-----------------------------------

I don't have the bandwidth right now, but I'm happy to review the code if you work on a patch.
Please see instructions here: https://mesos.readthedocs.io/en/latest/submitting-a-patch/

> Avoid leaving orphan task after framework failure + master failover
> -------------------------------------------------------------------
>
>                 Key: MESOS-4659
>                 URL: https://issues.apache.org/jira/browse/MESOS-4659
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>            Reporter: Neil Conway
>            Priority: Major
>              Labels: failover, mesosphere
>
> If a framework becomes disconnected from the master, its tasks are killed after waiting for {{failover_timeout}}.
> However, if a master failover occurs but a framework never reconnects to the new master, we never kill any of the tasks associated with that framework. These tasks remain orphaned and presumably would need to be manually removed by the operator. Similarly, if a framework gets torn down or disconnects while it has running tasks on a partitioned agent, those tasks are not shut down when the agent reregisters.
> We should consider whether to kill such orphaned tasks automatically, likely after waiting for some (framework-configurable?) timeout.
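To make the proposed behavior concrete, here is a minimal, hypothetical Python sketch (illustration only, not Mesos code; the Mesos master is written in C++). It tracks running tasks whose framework has not reregistered with the current master and reports them for killing once a timeout expires. The class and method names (`OrphanTaskTracker`, `mark_orphaned`, `framework_reregistered`, `tasks_to_kill`), the single `timeout_secs` value, and starting the timer when the first orphan of a framework is observed are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class OrphanTaskTracker:
    """Hypothetical tracker for tasks whose framework has not reregistered
    with the current master (illustration only, not Mesos code)."""

    # Assumed orphan timeout in seconds; the issue asks whether this
    # should be framework-configurable.
    timeout_secs: float
    # framework id -> time its tasks were first observed as orphaned
    _orphaned_since: Dict[str, float] = field(default_factory=dict, init=False)
    # framework id -> ids of its orphaned tasks
    _tasks: Dict[str, Set[str]] = field(default_factory=dict, init=False)

    def mark_orphaned(self, framework_id: str, task_id: str, now: float) -> None:
        """Record a running task whose framework is unknown to this master.
        The timer starts at the first orphan seen for that framework."""
        self._tasks.setdefault(framework_id, set()).add(task_id)
        self._orphaned_since.setdefault(framework_id, now)

    def framework_reregistered(self, framework_id: str) -> None:
        """The framework reconnected, so its tasks are no longer orphans."""
        self._orphaned_since.pop(framework_id, None)
        self._tasks.pop(framework_id, None)

    def tasks_to_kill(self, now: float) -> List[Tuple[str, str]]:
        """Return (framework_id, task_id) pairs whose timeout has expired
        and stop tracking them."""
        expired = [fw for fw, since in self._orphaned_since.items()
                   if now - since >= self.timeout_secs]
        result: List[Tuple[str, str]] = []
        for fw in expired:
            del self._orphaned_since[fw]
            for task_id in sorted(self._tasks.pop(fw, set())):
                result.append((fw, task_id))
        return result


if __name__ == "__main__":
    tracker = OrphanTaskTracker(timeout_secs=60.0)
    tracker.mark_orphaned("fw-1", "task-a", now=0.0)
    tracker.mark_orphaned("fw-2", "task-b", now=10.0)
    tracker.framework_reregistered("fw-2")   # fw-2 came back in time
    print(tracker.tasks_to_kill(now=30.0))   # []
    print(tracker.tasks_to_kill(now=60.0))   # [('fw-1', 'task-a')]
```

The timer is kept per framework rather than per task, so a framework that never returns is cleaned up as a unit. Whether that timeout should simply reuse the framework's {{failover_timeout}} or be a separate setting is exactly the open question raised above.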



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
