zookeeper-user mailing list archives

From Eric Young <younge+zookee...@gmail.com>
Subject Split a large ZooKeeper cluster into multiple separate clusters
Date Wed, 07 Sep 2016 21:19:19 GMT
I have a very large ZooKeeper cluster which manages config and replication
for multiple SolrCloud clusters.  I want to split the monolithic ZooKeeper
cluster into smaller, more manageable clusters in a live migration (i.e.
minimal or no downtime).

I have collections that can be updated dynamically which are already
separated logically in different SolrCloud clusters.  I also have some
static collections (never updated) that have replicas across all the
SolrCloud clusters though.  All my collections only have a single shard.

ZooKeeper version: 3.4.6
Solr version: 4.8.1


Example current setup (minimal):
ZK cluster servers:  z1-1, z1-2, z1-3, z2-1, z2-2, z2-3
Solr cluster 1 servers: s1-1, s1-2
Solr cluster 2 servers: s2-1, s2-2

Example collections:
Dynamic collection 1: c1 (replicas on s1-1, s1-2)
Dynamic collection 2: c2 (replicas on s2-1, s2-2)
Static collection 1: c3 (replicas on all 4 Solr servers s1-1, s1-2, s2-1,
s2-2)


End goal of example setup after ZooKeeper split:
ZK cluster 1 servers: z1-1, z1-2, z1-3 (only knows about collections on
s1-1, s1-2)
ZK cluster 2 servers: z2-1, z2-2, z2-3 (only knows about collections on
s2-1, s2-2)
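
As a rough illustration of this target layout, the per-cluster settings
could be generated along these lines. This is only a sketch: the server ids
(1-6) and the ports 2181/2888/3888 (ZooKeeper's usual defaults) are
assumptions, not values from my real setup, and the helper names are made
up for the example.

```python
# Hypothetical split plan. Host names come from the example above; the
# server ids are ASSUMED and must match each server's existing myid file.
SPLIT = {
    "ZK cluster 1": {1: "z1-1", 2: "z1-2", 3: "z1-3"},
    "ZK cluster 2": {4: "z2-1", 5: "z2-2", 6: "z2-3"},
}


def zk_host_string(hosts, client_port=2181):
    """Value for Solr's -DzkHost option: comma-separated host:port pairs."""
    return ",".join(f"{host}:{client_port}" for host in hosts)


def zoo_cfg_server_lines(servers, peer_port=2888, election_port=3888):
    """server.N lines for zoo.cfg, keeping each server's existing id."""
    return [
        f"server.{server_id}={host}:{peer_port}:{election_port}"
        for server_id, host in sorted(servers.items())
    ]


for cluster_name, servers in SPLIT.items():
    print(cluster_name)
    print("  -DzkHost=" + zk_host_string(servers.values()))
    for line in zoo_cfg_server_lines(servers):
        print("  " + line)
```

Keeping the existing ids rather than renumbering from 1 avoids having to
touch the myid files on the servers that end up in the second cluster.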


I think this can be accomplished with the following steps (with some
caveats):

1. Reconfigure each Solr instance with the subset of ZooKeeper servers it
should use in the end goal (e.g. -DzkHost=host1:port1,host2:port2,...),
then do a rolling restart of the Solr instances

2. Reconfigure the ZooKeeper server list in each server's configuration
(e.g. server.1=host1:port1a:port1b) and restart all ZooKeeper instances

3. From each ZooKeeper cluster's clusterstate, manually remove the
references to every dynamic collection that the "other" cluster now solely
manages
e.g. In ZK cluster 1, remove dynamic collection 2
Using the delete collection API: /admin/collections?action=DELETE

4. From each ZooKeeper cluster's clusterstate, manually remove the
references to every static replica that the cluster should no longer know
about
e.g. In ZK cluster 1, static collection 1, remove the replicas on servers
s2-1, s2-2
Using the delete replica API: /admin/collections?action=DELETEREPLICA
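
For concreteness, the requests in steps 3 and 4 might look like the URLs
built below. This is a minimal sketch that only constructs the URLs and
sends nothing. The Solr base URL (port 8983, /solr context), the shard name
"shard1" and the replica name "core_node3" are assumptions for
illustration; the real names would have to be read from clusterstate.json.

```python
from urllib.parse import urlencode


def collections_api_url(solr_base, action, **params):
    """Build a Collections API URL; parameters keep the order given."""
    query = urlencode({"action": action, **params})
    return f"{solr_base}/admin/collections?{query}"


def delete_collection_url(solr_base, name):
    """Step 3: DELETE takes the collection name in the 'name' parameter."""
    return collections_api_url(solr_base, "DELETE", name=name)


def delete_replica_url(solr_base, collection, shard, replica):
    """Step 4: DELETEREPLICA identifies a replica by collection, shard, replica."""
    return collections_api_url(
        solr_base, "DELETEREPLICA",
        collection=collection, shard=shard, replica=replica,
    )


# Hypothetical base URL of a Solr node that talks to ZK cluster 1.
SOLR_1 = "http://s1-1:8983/solr"

print(delete_collection_url(SOLR_1, "c2"))
print(delete_replica_url(SOLR_1, "c3", "shard1", "core_node3"))
```

Whether it is safe to actually send these requests while the "other"
cluster's cores are still running is exactly what the rest of this message
is asking about.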

My theory is this:

Step 1 should logically divide the Solr instances so they won't attempt to
connect to the "other" ZooKeeper cluster

Step 2 should logically divide the ZooKeeper servers into 2 clusters but
with duplicate knowledge of collections that need to be cleaned up

Steps 3 and 4 clean up the ZooKeeper cluster


Steps 3 and 4 concern me a little.  The documentation for DELETEREPLICA
says this:
"If the corresponding core is up and running the core is unloaded, the
entry is removed from the clusterstate, and (by default) delete the
instanceDir and dataDir.  If the node/core is down, the entry is taken off
the clusterstate and if the core comes up later it is automatically
unregistered."

To me, this indicates that ZooKeeper will still maintain a reference to a
deleted replica.

The bottom line is that I don't want ZooKeeper cluster 1 to be able to
force a removal of collections that are managed by ZooKeeper cluster 2 (and
vice versa), and it should hold no references to the other cluster's
replicas.


Do these steps seem reasonable?  Is there a better way?  I would like to
avoid manually modifying the clusterstate.json, but I'm guessing that it
might be the only way.
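
If manual editing does turn out to be necessary, the pruning itself could
be sketched roughly as below. This assumes the Solr 4.x clusterstate.json
layout (collection -> "shards" -> shard -> "replicas" -> replica with a
"node_name" such as "s1-1:8983_solr"); the sample data and the function
name are made up for illustration. It works on an in-memory dict only, does
not talk to ZooKeeper, and ignores the other znodes Solr keeps (leader
election, live_nodes, and so on).

```python
import copy


def prune_clusterstate(state, keep_hosts):
    """Return a copy of a clusterstate dict keeping only replicas hosted on
    keep_hosts; collections left with no replicas are dropped entirely."""
    pruned = {}
    for coll_name, coll in state.items():
        coll = copy.deepcopy(coll)  # never mutate the caller's data
        remaining = 0
        for shard in coll.get("shards", {}).values():
            shard["replicas"] = {
                replica_name: replica
                for replica_name, replica in shard.get("replicas", {}).items()
                # node_name looks like "host:port_context"; compare the host part
                if replica.get("node_name", "").split(":")[0] in keep_hosts
            }
            remaining += len(shard["replicas"])
        if remaining:
            pruned[coll_name] = coll
    return pruned


def _replica(host):
    """Tiny made-up replica entry for the example below."""
    return {"state": "active", "node_name": f"{host}:8983_solr"}


EXAMPLE_STATE = {
    "c1": {"shards": {"shard1": {"replicas": {
        "core_node1": _replica("s1-1"), "core_node2": _replica("s1-2")}}}},
    "c2": {"shards": {"shard1": {"replicas": {
        "core_node1": _replica("s2-1"), "core_node2": _replica("s2-2")}}}},
    "c3": {"shards": {"shard1": {"replicas": {
        "core_node1": _replica("s1-1"), "core_node2": _replica("s1-2"),
        "core_node3": _replica("s2-1"), "core_node4": _replica("s2-2")}}}},
}

view_for_zk1 = prune_clusterstate(EXAMPLE_STATE, {"s1-1", "s1-2"})
print(sorted(view_for_zk1))
print(sorted(view_for_zk1["c3"]["shards"]["shard1"]["replicas"]))
```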
