From Benjamin Anderson...@banjiewen.net>
Subject Incrementally bootstrapping a 3.5.0-alpha cluster?
Date Sat, 20 Jun 2015 01:27:11 GMT
Hi there - I'm working on automating the bootstrapping of a 3-node ZK
3.5.0-alpha ensemble and I'm running into some problems getting
the nodes to join up. The dynamic configuration page[1] suggests that,

"...it is possible to start a ZooKeeper ensemble containing a single
participant and to dynamically grow it by adding more servers"

which is what I'm attempting to do. I've found, however, that this can
be rather problematic. What is the "correct" procedure for dynamically
growing an ensemble from a single participant?

I've tried two approaches:

Approach A:

1. Start two nodes, one with myid=1 and one with myid=2. Each node's
dynamicConfigFile contains a single line referring to itself, i.e.,
neither node is aware of the other.

2. Open a zkCli to either of the two nodes and issue a `reconfig`
command to add the other, unknown node.

This method fails with "KeeperErrorCode = NewConfigNoQuorum for".
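
To make Approach A concrete, here is a minimal sketch of the two
dynamicConfigFile bodies and the reconfig command, rendered by a small
script. The hostnames (zk1.example.com, zk2.example.com) and the helper
name server_line are hypothetical, and the ports 2888/3888/2181 are
just the conventional ZooKeeper defaults, assumed here rather than
taken from the real environment. The server.N line format follows the
dynamic configuration page[1].

```python
# Hypothetical hosts; 2888/3888/2181 are conventional ZooKeeper ports (assumed).
HOSTS = {1: "zk1.example.com", 2: "zk2.example.com"}


def server_line(myid, host, quorum_port=2888, election_port=3888, client_port=2181):
    """Render one dynamic-config entry in the 3.5 server.N=... format."""
    return f"server.{myid}={host}:{quorum_port}:{election_port}:participant;{client_port}"


# Approach A, step 1: each node's dynamicConfigFile lists only itself.
dynamic_config = {myid: server_line(myid, host) for myid, host in HOSTS.items()}

# Approach A, step 2: typed into a zkCli session connected to node 1.
reconfig_cmd = "reconfig -add " + server_line(2, HOSTS[2])

for myid, body in dynamic_config.items():
    print(f"# dynamicConfigFile on myid={myid}\n{body}")
print(reconfig_cmd)
```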

Approach B:

1. Start one node with myid=1 and a dynamicConfigFile that only refers
to itself, then start a second node with myid=2 and a
dynamicConfigFile that refers to itself *and* the node with myid=1.

2. Open a zkCli to the node with myid=1 and issue a reconfig command
to add the node with myid=2.

This approach works! However, if the ordering is reversed (i.e., the
myid=2 node boots first and refers only to itself, and the myid=1 node
refers to both itself and the myid=2 node), then the myid=1 node will
*never* come up cleanly - it hangs forever logging messages such as
the one in this gist[2]. In my environment the boot ordering is not
guaranteed, so this is rather challenging for me.
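
The two boot orderings in Approach B can be sketched side by side as
follows. This is only an illustration: the hostnames, the default
ports, and the helper names (server_line, configs_for_boot_order) are
hypothetical and not taken from the real setup.

```python
# Hypothetical hostnames; ports are the conventional ZooKeeper defaults (assumed).
HOSTS = {1: "zk1.example.com", 2: "zk2.example.com"}


def server_line(myid):
    """Render one server.N entry for a dynamicConfigFile."""
    return f"server.{myid}={HOSTS[myid]}:2888:3888:participant;2181"


def configs_for_boot_order(first, second):
    """The first node to boot lists only itself; the second lists both nodes."""
    return {
        first: [server_line(first)],
        second: [server_line(m) for m in sorted((first, second))],
    }


working = configs_for_boot_order(first=1, second=2)  # ordering where reconfig works
hanging = configs_for_boot_order(first=2, second=1)  # ordering where myid=1 hangs

for label, configs in (("working", working), ("hanging", hanging)):
    for myid in sorted(configs):
        print(f"[{label}] myid={myid}: " + ", ".join(configs[myid]))
```

The file contents are symmetric between the two cases; the only
difference is which myid boots alone and which one lists both nodes.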

My baseline config is roughly this[3].

Is there a well-known and reliable way to incrementally join nodes to
a ZK ensemble in 3.5.0-alpha? Do I need to be using a newer version
than the release cut back in August 2014?


[1]: http://zookeeper.apache.org/doc/trunk/zookeeperReconfig.html
[2]: https://gist.github.com/banjiewen/936f5620d33a8eb0ddf4
[3]: https://gist.github.com/banjiewen/c7f11c749933ac1bab72

