mesos-issues mailing list archives

From "Vinod Kone (JIRA)" <>
Subject [jira] [Commented] (MESOS-7681) Add safeguard for new agents with new features + old master
Date Tue, 05 Dec 2017 23:06:00 GMT


Vinod Kone commented on MESOS-7681:

FYI, master capabilities have landed. [~mcypark], will you be working on this?

> Add safeguard for new agents with new features + old master
> -----------------------------------------------------------
>                 Key: MESOS-7681
>                 URL:
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Neil Conway
>              Labels: mesosphere
> Consider this scenario:
> * Mesos cluster with 3 masters and 1 agent.
> * 2 of the masters (including the leader) are upgraded to Mesos 1.4; remaining master stays at Mesos 1.3 (e.g., due to operator error).
> * Agent is upgraded to Mesos 1.4
> * Framework creates a reservation refinement on the agent
> * Leading master fails; Mesos 1.3 master is elected as the new leader
> In this scenario, the agent will send resources to the master in the new (post-refinement)
> format, but the master will not understand those new fields. This results in an inconsistency
> between the agent's resources and the master's view of the agent's resources. This could lead
> to various problems -- in effect, the reservation the framework previously made has been "forgotten"
> during master failover. Similarly, if the agent attempts to unreserve the resources (using
> the master's version of the resource), that operation will be rejected by the agent.
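
The inconsistency described above can be illustrated with a toy model. This is a minimal sketch, not Mesos code: plain dicts stand in for resource messages, the field names (`name`, `value`, `role`, `reservations`) are simplified for illustration, and `old_master_parse` is a hypothetical helper. It assumes the old master silently drops any field it does not know.

```python
# Toy model only: dicts stand in for resource messages; all names are hypothetical.
OLD_MASTER_KNOWN_FIELDS = {"name", "value", "role"}


def old_master_parse(resource):
    """Keep only the fields an old master understands, dropping the rest."""
    return {k: v for k, v in resource.items() if k in OLD_MASTER_KNOWN_FIELDS}


# Resource as the upgraded agent sends it (post-refinement format, simplified).
agent_resource = {
    "name": "cpus",
    "value": 2,
    "reservations": [{"role": "eng"}, {"role": "eng/dev"}],
}

master_view = old_master_parse(agent_resource)

# The master's view has lost the reservation, so the two sides disagree.
views_match = master_view == agent_resource
```

Here `master_view` no longer contains the `reservations` entry, which is the "forgotten" reservation in miniature.
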
> To fix this, it seems we need an explicit negotiation between the agent and the master
> as part of registration/re-registration. The agent would examine its resources and say which
> capabilities it _requires_ of the master (not just the capabilities the agent _supports_);
> if the master does not support those capabilities, the agent cannot safely register.
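
The agent-side check proposed above might look roughly like the following. This is a sketch under stated assumptions, not the Mesos implementation: the resource layout, the rule that more than one stacked reservation implies a refinement, and both function names are hypothetical, and the capability name is used only as an illustrative label.

```python
# Hypothetical capability name and resource layout, for illustration only.
def required_master_capabilities(resources):
    """Derive the capabilities the agent requires from the resources it holds."""
    required = set()
    for resource in resources:
        # Assumption: more than one stacked reservation means a refinement.
        if len(resource.get("reservations", [])) > 1:
            required.add("RESERVATION_REFINEMENT")
    return required


def can_safely_register(resources, master_capabilities):
    """Return True only if the master supports everything the agent requires."""
    return required_master_capabilities(resources) <= set(master_capabilities)


refined = [{"name": "cpus", "reservations": [{"role": "eng"}, {"role": "eng/dev"}]}]
plain = [{"name": "cpus"}]

old_master_ok = can_safely_register(refined, [])
new_master_ok = can_safely_register(refined, ["RESERVATION_REFINEMENT"])
plain_agent_ok = can_safely_register(plain, [])
```

The point of computing what is _required_ rather than what is _supported_ shows in the last case: an agent with no refined reservations can still register with an old master.
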
> We could implement this either via master capabilities (agent computes the master capabilities
> it requires and declines to register if the master isn't new enough), or via agent capabilities
> (agent tells master the capabilities it is "actively using"; master refuses to allow any agent
> to register that is using a capability the master doesn't recognize/support). Probably the
> former is safer/cleaner.
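
For comparison, the second option (the master rejects agents that are actively using capabilities it does not recognize) could be sketched as below. Again this is only an illustration: `master_accepts` and the capability labels are hypothetical and do not describe an actual Mesos API.

```python
# Hypothetical master-side check; names are illustrative, not a Mesos API.
def master_accepts(supported, active_capabilities):
    """Return (accepted, unknown) for an agent's actively used capabilities."""
    unknown = sorted(set(active_capabilities) - set(supported))
    return (not unknown, unknown)


# A master that only recognizes one of the capabilities the agent is using.
accepted, unknown = master_accepts(
    supported={"MULTI_ROLE"},
    active_capabilities={"MULTI_ROLE", "RESERVATION_REFINEMENT"},
)
```

In this variant the check runs in the master, so it only takes effect on masters whose code already includes it.
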

