mesos-issues mailing list archives

From "Vinod Kone (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-7215) Master sends ShutdownFrameworkMessage for all non-partition-aware frameworks
Date Mon, 06 Mar 2017 23:18:33 GMT

    [ https://issues.apache.org/jira/browse/MESOS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898358#comment-15898358 ]

Vinod Kone commented on MESOS-7215:
-----------------------------------

Interesting.

I guess we never explicitly called out that `ShutdownFrameworkMessage` should only be sent
when a framework is being torn down. But I'm surprised to hear that, as a consequence of the
recent changes, the task stays in STAGING forever. I'm assuming this is because the agent doesn't
send a TASK_DROPPED status update, since it thinks the framework is shutting down.

Sending a `KillTaskMessage` instead of `ShutdownFrameworkMessage` sounds good to me.
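
To make the difference concrete, here is a toy sketch of the two behaviours. It is not the Mesos implementation: `ToyAgent`, its methods, and `task_after_reregistration` are hypothetical names, and returning `None` stands in for "the agent ignores the task and sends no status update", which is why the master would keep the task in STAGING.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical, heavily simplified agent; only models the state relevant here."""
    shutting_down: set = field(default_factory=set)  # frameworks the agent believes are going away
    running: dict = field(default_factory=dict)      # task id -> framework id

    def shutdown_framework(self, framework_id):
        # Models ShutdownFrameworkMessage: drop the framework's tasks and
        # remember the framework as terminating.
        self.running = {t: f for t, f in self.running.items() if f != framework_id}
        self.shutting_down.add(framework_id)

    def kill_task(self, task_id):
        # Models KillTaskMessage: only the named task is affected.
        self.running.pop(task_id, None)

    def run_task(self, task_id, framework_id):
        # Returns the status the master would see, or None if the task is
        # silently ignored (assumption: no status update is sent in that case).
        if framework_id in self.shutting_down:
            return None
        self.running[task_id] = framework_id
        return "TASK_RUNNING"


def task_after_reregistration(use_kill_task):
    """Agent reregisters with task t1 of framework f1, then f1 launches t2 on it."""
    agent = ToyAgent(running={"t1": "f1"})
    if use_kill_task:
        agent.kill_task("t1")           # proposed behaviour
    else:
        agent.shutdown_framework("f1")  # behaviour described in this issue
    return agent.run_task("t2", "f1")
```

In this model, the shutdown path leaves the new task without any status update, while the kill path lets it run, because the agent no longer treats the still-registered framework as terminating.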

> Master sends ShutdownFrameworkMessage for all non-partition-aware frameworks
> ----------------------------------------------------------------------------
>
>                 Key: MESOS-7215
>                 URL: https://issues.apache.org/jira/browse/MESOS-7215
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yan Xu
>
> Prior to the partition-awareness work (MESOS-5344), when an agent reregistered after it
> had been removed, the master only sent `ShutdownFrameworkMessage`s to the agent for frameworks
> that it knew had been torn down.
> With the new logic in MESOS-5344, Mesos now sends `ShutdownFrameworkMessage`s
> to the agent for all non-partition-aware frameworks (including the ones that are still registered).
> This is problematic. Offers from this agent can still go to the same framework, which
> can then launch new tasks. The agent then receives tasks from that framework and ignores
> them because it thinks the framework is shutting down. The framework is not shutting down,
> of course, so from the master's and the scheduler's perspective the task stays in STAGING
> indefinitely, until the next agent reregistration, which could happen much later.
> This also makes the semantics of `ShutdownFrameworkMessage` ambiguous: the agent assumes
> the framework is going away (and acts accordingly) when it is not.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
