mesos-issues mailing list archives

From "Yong Qiao Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-3324) Resource leak issue in Mesos
Date Mon, 31 Aug 2015 05:43:45 GMT

    [ https://issues.apache.org/jira/browse/MESOS-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722951#comment-14722951 ]

Yong Qiao Wang commented on MESOS-3324:
---------------------------------------

My proposal to address this resource leak:
1. Add a timeout for framework re-registration (for example, --framework_reregister_timeout).
2. Add a new libprocess object to manage orphaned tasks and executors. It will:
    - after a Mesos master restart, clean up the orphaned tasks and executors once
      --framework_reregister_timeout has expired;
    - while the Mesos master is running, periodically clean up tasks and executors that have
      remained orphaned for longer than --framework_reregister_timeout.
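A minimal sketch of the bookkeeping the proposal implies, written in Python for brevity rather than against libprocess. `OrphanReaper`, `observe`, and `expired` are hypothetical names, and `reregister_timeout` merely stands in for the proposed --framework_reregister_timeout value. The same two calls cover both cases above: call `observe` once right after master recovery, then call `observe` and `expired` periodically while the master runs. Actually killing the returned tasks or executors is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class OrphanReaper:
    """Hypothetical tracker for tasks whose framework has not re-registered."""

    # Stand-in for the proposed --framework_reregister_timeout, in seconds.
    reregister_timeout: float
    # task_id -> time at which the task was first seen without a registered framework.
    _orphaned_since: Dict[str, float] = field(default_factory=dict)

    def observe(self, now: float, tasks: Dict[str, str],
                registered_frameworks: Set[str]) -> None:
        """Update orphan state from the current tasks (task_id -> framework_id)."""
        # Forget tasks that no longer exist (e.g. they finished on their own).
        for task_id in list(self._orphaned_since):
            if task_id not in tasks:
                del self._orphaned_since[task_id]
        for task_id, framework_id in tasks.items():
            if framework_id in registered_frameworks:
                # The framework came back, so the task is no longer an orphan.
                self._orphaned_since.pop(task_id, None)
            else:
                # Keep the earliest time the task was seen orphaned.
                self._orphaned_since.setdefault(task_id, now)

    def expired(self, now: float) -> List[str]:
        """Return orphans older than the timeout and stop tracking them."""
        victims = sorted(
            task_id
            for task_id, since in self._orphaned_since.items()
            if now - since >= self.reregister_timeout
        )
        for task_id in victims:
            del self._orphaned_since[task_id]
        return victims


if __name__ == "__main__":
    reaper = OrphanReaper(reregister_timeout=600.0)
    tasks = {"t1": "fw-a", "t2": "fw-b"}
    # Master recovers at t=0; only fw-a re-registers, fw-b exited during the downtime.
    reaper.observe(0.0, tasks, registered_frameworks={"fw-a"})
    print(reaper.expired(300.0))  # []
    print(reaper.expired(600.0))  # ['t2']
```

Tracking the first time each task was seen orphaned, rather than a single global deadline, is what lets the periodic sweep handle frameworks that disappear while the master is already running.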

> Resource leak issue in Mesos
> ----------------------------
>
>                 Key: MESOS-3324
>                 URL: https://issues.apache.org/jira/browse/MESOS-3324
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yong Qiao Wang
>            Assignee: Yong Qiao Wang
>            Priority: Critical
>
> In the Mesos master recovery case, suppose a framework exits while the Mesos master is down,
> and this framework had already launched some long-running tasks before the master went down.
> After the Mesos master recovers, those long-running tasks keep running as orphaned tasks in
> the Mesos cluster, and no other component can kill them later. This is a resource leak in
> Mesos; I propose adding a timeout so that the Mesos master kills those orphaned tasks or
> executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
