mesos-issues mailing list archives

From "Stefano (JIRA)" <>
Subject [jira] [Commented] (MESOS-3548) Investigate federations of Mesos masters
Date Wed, 13 Apr 2016 23:19:25 GMT


Stefano commented on MESOS-3548:


Can you please share the document?
I am a telecommunications student and I'm doing my master's thesis on Mesos.
I need to build Mesos clusters at different datacenter sites, and they should be able
to connect to each other and share resources.
Then I should be able to run a task on Marathon, and this task would be assigned to a local
agent or to a remote one.
As you described perfectly, this is a sort of federation of Mesos clusters.
I have already set up 2 Mesos clusters, but I have no idea how to let them communicate.
Is there a tutorial I could follow?


Best regards.


> Investigate federations of Mesos masters
> ----------------------------------------
>                 Key: MESOS-3548
>                 URL:
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Neil Conway
>              Labels: federation, mesosphere, multi-dc
> In a large Mesos installation, the operator might want to ensure that even if the Mesos
masters are inaccessible or failed, new tasks can still be scheduled (across multiple different
frameworks). HA masters are only a partial solution here: the masters might still be inaccessible
due to a correlated failure (e.g., ZooKeeper misconfiguration/human error).
> To support this, we could support the notion of "hierarchies" or "federations" of Mesos
masters. In a Mesos installation with 10k machines, the operator might configure 10 Mesos
masters (each of which might be HA) to manage 1k machines each. Then an additional "meta-Master"
would manage the allocation of cluster resources to the 10 masters. Hence, the failure of
any individual master would impact 1k machines at most. The meta-master might not have a lot
of work to do: e.g., it might be limited to occasionally reallocating cluster resources among
the 10 masters, or ensuring that newly added cluster resources are allocated among the masters
as appropriate. Hence, the failure of the meta-master would not prevent any of the individual
masters from scheduling new tasks. A single framework instance probably wouldn't be able to
use more resources than have been assigned to a single Master, but that seems like a reasonable
> This feature might also be a good fit for a multi-datacenter deployment of Mesos: each
Mesos master instance would manage a single DC. Naturally, reducing the traffic between frameworks
and the meta-master would be important for performance reasons in a configuration like this.
> Operationally, this might be simpler if Mesos processes were self-hosting ([MESOS-3547]).
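
The partitioning idea in the quoted description can be illustrated with a toy model. This is only a sketch under stated assumptions: `Master`, `MetaMaster`, `allocate`, and `impacted_machines` are hypothetical names invented for illustration, not part of any Mesos API, and the "fewest machines first" policy is just one simple way a meta-master might spread resources. It shows that with 10k machines split across 10 masters, losing one master affects about 1k machines, and that the remaining masters keep scheduling without consulting the meta-master.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical stand-in for one (possibly HA) Mesos master."""
    name: str
    machines: set = field(default_factory=set)
    alive: bool = True

    def can_schedule(self) -> bool:
        # A master schedules only from machines already assigned to it,
        # so it does not need the meta-master to be reachable.
        return self.alive and bool(self.machines)


class MetaMaster:
    """Hypothetical meta-master whose only job is dividing machines."""

    def __init__(self, masters):
        self.masters = list(masters)

    def allocate(self, machine_ids):
        # Assumed policy: give each machine to the master with the fewest.
        for machine_id in machine_ids:
            target = min(self.masters, key=lambda m: len(m.machines))
            target.machines.add(machine_id)


def impacted_machines(masters):
    """Count machines whose managing master is down."""
    return sum(len(m.machines) for m in masters if not m.alive)


# 10 masters, 10k machines, as in the example from the issue description.
masters = [Master(f"master-{i}") for i in range(10)]
meta = MetaMaster(masters)
meta.allocate(range(10_000))

masters[3].alive = False  # simulate the failure of a single master
print(impacted_machines(masters))              # 1000
print(sum(m.can_schedule() for m in masters))  # 9
```

In this model the meta-master is consulted only when machines are added or rebalanced, which mirrors the claim that its failure would not stop the individual masters from scheduling new tasks.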

This message was sent by Atlassian JIRA