mesos-issues mailing list archives

From "Jie Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-8987) Master asks agent to shutdown upon auth errors
Date Tue, 12 Jun 2018 22:49:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510339#comment-16510339 ]

Jie Yu commented on MESOS-8987:
-------------------------------

Raising this to BLOCKER since it might kill all tasks in the cluster.

> Master asks agent to shutdown upon auth errors
> ----------------------------------------------
>
>                 Key: MESOS-8987
>                 URL: https://issues.apache.org/jira/browse/MESOS-8987
>             Project: Mesos
>          Issue Type: Bug
>          Components: master, security
>    Affects Versions: 1.4.1, 1.5.1, 1.6.0, 1.7.0
>            Reporter: Gastón Kleiman
>            Priority: Blocker
>              Labels: mesosphere
>
> The Mesos master sends a {{ShutdownMessage}} to an agent if there is an [authentication|https://github.com/apache/mesos/blob/d733b1031350e03bce443aa287044eb4eee1053a/src/master/master.cpp#L6532-L6543]
> or an [authorization|https://github.com/apache/mesos/blob/d733b1031350e03bce443aa287044eb4eee1053a/src/master/master.cpp#L6622-L6633]
> error during agent registration.
>  
> Upon receipt of this message, the agent kills all its tasks and commits suicide. This
> means that transient auth errors can lead to whole agents being killed along with their tasks.
> I think the master should stop sending {{ShutdownMessage}}s in these cases, or at
> least let the agent retry the registration a few times before asking it to shut down.
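
To make the second suggestion concrete, here is a minimal sketch of a "retry a few times before shutdown" policy. It is not taken from master.cpp: the names AuthFailureTracker, AuthFailureAction, onAuthFailure and onRegistered are hypothetical, and the sketch assumes the master can key failures by some agent identifier string and has a configurable limit on consecutive failures.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical names throughout; none of this is existing Mesos code.
enum class AuthFailureAction { AskRetry, AskShutdown };

// Counts consecutive auth failures per agent and only escalates to a
// shutdown request once an agent has failed `maxFailures` times in a row.
class AuthFailureTracker {
public:
  explicit AuthFailureTracker(int maxFailures) : maxFailures_(maxFailures) {
    assert(maxFailures > 0);
  }

  // Record one failed registration attempt for `agentId` and decide
  // what the master should tell the agent.
  AuthFailureAction onAuthFailure(const std::string& agentId) {
    const int failures = ++failures_[agentId];
    if (failures < maxFailures_) {
      return AuthFailureAction::AskRetry;
    }
    failures_.erase(agentId);  // Start from zero if the agent comes back.
    return AuthFailureAction::AskShutdown;
  }

  // A successful registration clears the agent's failure history, so
  // isolated transient errors never accumulate towards a shutdown.
  void onRegistered(const std::string& agentId) { failures_.erase(agentId); }

private:
  const int maxFailures_;
  std::unordered_map<std::string, int> failures_;
};
```

With a limit of 3, the first two consecutive auth errors for an agent would yield AskRetry and only the third would yield AskShutdown. A real fix would also have to decide how the agent backs off between attempts and whether authentication and authorization errors should share one counter; the sketch leaves both open.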



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
