mesos-issues mailing list archives

From "Tobias Weingartner (JIRA)" <>
Subject [jira] [Commented] (MESOS-890) Figure out a way to migrate a live Mesos cluster to a different ZooKeeper cluster
Date Fri, 18 Jul 2014 17:44:07 GMT


Tobias Weingartner commented on MESOS-890:

Option #3?

 * write non-ephemeral nodes into zk2 pointing to masters (possibly a script to keep them
 * re-configure slaves to use zk2
 * re-configure frameworks to use zk2
 * re-configure masters to use zk2 (not restarted yet)
 * nuke non-ephemeral nodes on zk2
 * restart masters
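
To make the ordering above concrete, here is a rough toy model. It is only a sketch: plain dicts stand in for the znodes of zk1 and zk2, every name and address is hypothetical, and it assumes a client simply picks the entry with the lowest znode name. The "re-configure masters (not restarted yet)" step has no visible effect in this model, so it appears only as a comment.

```python
# Toy model of the Option #3 ordering. Plain dicts stand in for the znodes
# of each ZooKeeper cluster; every name and address below is hypothetical.

def find_master(znodes):
    """Return the master a client would see, or None if no node exists.

    Assumption: the entry with the lowest znode name wins.
    """
    if not znodes:
        return None
    return znodes[min(znodes)]


def simulate_option3():
    zk1 = {"mesos_0000000001": "master-a:5050",
           "mesos_0000000002": "master-b:5050"}  # nodes of the live masters
    zk2 = {}                                     # new, empty cluster
    clients = {"slaves": zk1, "frameworks": zk1}
    log = []

    def record(step):
        # Snapshot which master each kind of client currently resolves.
        view = {name: find_master(zk) for name, zk in clients.items()}
        log.append((step, view))

    zk2.update(zk1)                # write non-ephemeral nodes into zk2
    record("seed zk2")
    clients["slaves"] = zk2        # re-configure slaves to use zk2
    record("slaves on zk2")
    clients["frameworks"] = zk2    # re-configure frameworks to use zk2
    record("frameworks on zk2")
    # (masters are re-configured here but not restarted: nothing changes yet)
    zk2.clear()                    # nuke non-ephemeral nodes on zk2
    record("placeholders removed")
    zk2.update(zk1)                # restart masters: they register in zk2 ...
    zk1.clear()                    # ... and their nodes disappear from zk1
    record("masters restarted")
    return log
```

In this model every client keeps resolving a master at each step except between removing the placeholder nodes and the masters coming back, so that window is the part that would need to be short.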

> Figure out a way to migrate a live Mesos cluster to a different ZooKeeper cluster
> ---------------------------------------------------------------------------------
>                 Key: MESOS-890
>                 URL:
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>            Reporter: Raul Gutierrez Segales
>            Assignee: Raul Gutierrez Segales
> I've been chatting with [~vinodkone] about approaching a live ZK cluster migration. Here are the options we came up with.
> For the descriptions we treat `zk1` as the current working cluster, `obs` as a bunch of ZooKeeper Observers [1] and `zk2` as the new cluster to which we need to migrate.
> Approach #1: Using Observers
> With this option we need to:
> * add obs to zk1
> * restart slaves to have them use obs to find their master
> * restart the framework having it use obs to find the mesos master
> * restart the mesos masters having them use obs to perform their election
> * we then stop all ZK obs and remove their data (since they will need to sync up with an entirely new cluster, we need to lose the old data)
> * we restart ZK obs having them be part of zk2
> * at this point the slaves, the framework and the masters can reach the ZK obs again and an election happens
> * optionally you can restart slaves, the framework and masters again using zk2 instead of the ZK obs if you wanted to decommission them.
> This assumes that we can do the last three steps in << 75 secs (75 secs being the slave health check timeout). This is a reasonable assumption if the data size in zk2 is small enough to ensure that the ZK obs can sync up quickly with zk2. If zk2 is a new cluster with no data then this should be very fast.
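
The timing assumption can be written down as a small budget check. This is a sketch only: the 75 seconds come from the description above, while the `margin` factor (how much smaller than the timeout counts as "<<") and the per-step durations are assumptions made up for illustration.

```python
SLAVE_HEALTH_CHECK_TIMEOUT_SECS = 75  # value quoted in the description


def fits_in_window(step_secs, timeout_secs=SLAVE_HEALTH_CHECK_TIMEOUT_SECS,
                   margin=0.5):
    """Return True if the steps together take at most margin * timeout_secs.

    `margin` is an assumed safety factor standing in for "<<".
    """
    return sum(step_secs) <= margin * timeout_secs


# Hypothetical durations for: stop obs and wipe data, restart obs, election.
print(fits_in_window([5, 10, 15]))   # 30s against a 37.5s budget
print(fits_in_window([20, 20, 20]))  # 60s against a 37.5s budget
```

Measuring the real step durations on a test cluster first would show whether the budget holds or whether the timeout needs raising for the migration.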
> The good things about this approach are:
> * no mesos code change
> * it is very easy to roll back halfway through, if need be
> The hard issues are:
> * Manipulating the ZK obs (i.e.: stopping, removing the data from zk1 and starting again) needs to be done with care. Messing up configs or not removing the data from zk1 on any of the ZK obs will cause problems
> * we need to restart all slaves to have them use the ZK obs instead of connecting to zk1 directly. But with slave recovery this isn't an issue, just an extra step.
> * same thing for the framework and the masters
> Approach #2: Dual publishing from mesos masters
> With this option we would augment the election handling code in mesos masters to have it deal with the notion of primary and secondary ZK clusters. Master registration and election would then work as follows:
> * create an ephemeral|sequential znode in zk1 (e.g. /path/to/znode/mesos_000023)
> * create an ephemeral, but not sequential, znode in zk2 with the exact same path as what was created in zk1 (e.g. /path/to/znode/mesos_000023)
> * make sure both sessions, in zk1 and zk2, are always in the same state (i.e. if one expires, the other one should be closed, etc.)
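
The registration scheme above can be sketched with an in-memory stand-in for the two clusters. Nothing here is Mesos or ZooKeeper client code: `FakeZk`, `register_master` and `expire_master` are hypothetical names, the addresses are made up, and the model assumes the master with the lowest sequence number wins the election. The 10-digit zero-padded suffix follows what ZooKeeper appends to sequential znodes.

```python
class FakeZk:
    """In-memory stand-in for one ZooKeeper cluster (hypothetical)."""

    def __init__(self):
        self.znodes = {}
        self._next_seq = 0

    def create_sequential(self, prefix, data):
        # The cluster appends a zero-padded, monotonically increasing suffix.
        path = "%s%010d" % (prefix, self._next_seq)
        self._next_seq += 1
        self.znodes[path] = data
        return path

    def create(self, path, data):
        # Plain (non-sequential) create: the caller supplies the full path.
        self.znodes[path] = data

    def delete(self, path):
        self.znodes.pop(path, None)

    def leader(self):
        # Assumption: the znode with the lowest path wins the election.
        return self.znodes[min(self.znodes)] if self.znodes else None


def register_master(zk1, zk2, address, prefix="/path/to/znode/mesos_"):
    """Sequential znode in zk1, then the exact same path mirrored into zk2."""
    path = zk1.create_sequential(prefix, address)
    zk2.create(path, address)
    return path


def expire_master(zk1, zk2, path):
    """Keep both sessions in the same state: losing one removes both znodes."""
    zk1.delete(path)
    zk2.delete(path)


zk1, zk2 = FakeZk(), FakeZk()
a = register_master(zk1, zk2, "master-a:5050")
b = register_master(zk1, zk2, "master-b:5050")
print(zk1.leader(), zk2.leader())  # both clusters name the same master
expire_master(zk1, zk2, a)
print(zk1.leader(), zk2.leader())  # and they fail over together
```

Because zk2 only ever receives paths chosen by zk1, the ordering of the znodes is identical in both clusters as long as every master publishes through the same routine.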
> For now, let's omit a few implementation details which might need extra care and assume we can make this work consistently in such a way that zk2 accurately reflects elections that happen in zk1. This means that regardless of being connected to zk1 or zk2, you always get the same master. Once we have this the migration steps would be:
> * restart slaves to have them use zk2 where masters can be found by virtue of what we implemented above
> * restart the framework so that it finds the mesos master in zk2
> * stop all mesos masters (they all need to be stopped before moving to the next step)
> * start all mesos masters using zk2 as their primary and only cluster
> Again, this assumes we can do the last two steps in << 75 secs (or, if we needed to, we could bump the slave health check timeout), which again sounds achievable given that masters have no state and their start-up time is very short.
> The good things about this approach are:
> - no tinkering with extra ZK servers or with ZK configs
> The hard issues are:
> - extra code needs to be added to the election handling bits of mesos master to address a very rare, but plausible, use-case of cluster migration. It might take a bit of time to get that code right.
> - it's easier to end up with a bad state if any of the mesos masters ends up with a bad config or is restarted earlier and ends up publishing differently than the other masters. This could lead to elections with differing results.
> Thoughts?
> [1]
