mesos-issues mailing list archives

From "Xudong Ni (Jira)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-10032) Mesos agent should sever proactively master connection when failing to detect the leading master
Date Fri, 08 Nov 2019 21:18:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970584#comment-16970584
] 

Xudong Ni commented on MESOS-10032:
-----------------------------------

https://reviews.apache.org/r/71742/

> Mesos agent should sever proactively master connection when failing to detect the leading
master
> ------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-10032
>                 URL: https://issues.apache.org/jira/browse/MESOS-10032
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Xudong Ni
>            Assignee: Xudong Ni
>            Priority: Major
>
> We have observed that this often happens when an agent loses its ZK connection, resets
its master to None, and begins dropping messages from the master because it can't verify
that the messages are valid.
> This has caused Jarvis to be unable to kill tasks (and they aren't counted as unreachable
because the master can still reach the agent).
> A reasonable solution is for the agent to disconnect from the master upon resetting the
master it tracks since it's just going to drop control messages.
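
The behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Mesos code: the Agent class, the sever_on_lost_master flag, and the detected/receive/master_can_reach names are all hypothetical and stand in for the agent's detector callback and message handling. It compares an agent that keeps its connection open after losing the leading master with one that severs it.

```python
from typing import List, Optional, Tuple


class Agent:
    """Toy model of an agent. Hypothetical names; not the Mesos implementation."""

    def __init__(self, sever_on_lost_master: bool) -> None:
        self.sever_on_lost_master = sever_on_lost_master
        self.master: Optional[str] = None        # leading master the agent tracks
        self.connected_to: Optional[str] = None  # peer of the open connection
        self.dropped: List[str] = []
        self.handled: List[str] = []

    def detected(self, master: Optional[str]) -> None:
        """Detector callback. None means no leading master could be detected."""
        self.master = master
        if master is not None:
            self.connected_to = master
        elif self.sever_on_lost_master:
            # Proposed behaviour: drop the connection together with the master.
            self.connected_to = None

    def receive(self, sender: str, message: str) -> None:
        """Messages from anyone but the tracked master are dropped."""
        if self.master != sender:
            self.dropped.append(message)
        else:
            self.handled.append(message)


def master_can_reach(agent: Agent, master: str) -> bool:
    """From the master's side, an open connection looks like a healthy agent."""
    return agent.connected_to == master


def simulate(sever_on_lost_master: bool) -> Tuple[bool, List[str]]:
    agent = Agent(sever_on_lost_master)
    agent.detected("master-1")  # normal operation
    agent.detected(None)        # ZK connection lost, master reset to None
    reachable = master_can_reach(agent, "master-1")
    if reachable:
        agent.receive("master-1", "kill task-42")
    return reachable, agent.dropped


if __name__ == "__main__":
    print(simulate(False))  # agent still looks reachable, kill is dropped
    print(simulate(True))   # agent looks disconnected, nothing is dropped
```

In this model, the agent that keeps the connection open still looks reachable to the master while silently dropping the kill request. The agent that severs the connection looks disconnected instead, so the master is in a position to treat it as unreachable.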



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
