zookeeper-user mailing list archives

From Benjamin Anderson <...@banjiewen.net>
Subject Re: Incrementally bootstrapping a 3.5.0-alpha cluster?
Date Thu, 25 Jun 2015 16:43:41 GMT
Hi Alexander, I've had much better luck with the codebase @91ecdac,
but I've still observed the "Have smaller server identifier" type
failure at least once. It's reliable enough for me to work around the
remaining failures, at least.

Thanks!
--
b

On Wed, Jun 24, 2015 at 8:20 AM, Alexander Shraer <shralex@gmail.com> wrote:
> Hi Benjamin, I'm curious whether this worked.
>
> thanks,
> Alex
>
> On Sat, Jun 20, 2015 at 7:40 PM, Alexander Shraer <shralex@gmail.com> wrote:
>
>> There have been bug fixes since the 2014 release, so if it doesn't work,
>> perhaps you could try with trunk:
>>
>> svn checkout http://svn.apache.org/repos/asf/zookeeper/trunk <local dir>
>>
>> On Sat, Jun 20, 2015 at 7:35 PM, Alexander Shraer <shralex@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Approach 1 (your approach A) isn't supposed to work, since each server
>>> forms its own ensemble. Each server is the leader of its own ensemble,
>>> so when you try to reconfigure, it expects the other server to connect
>>> as a follower, but that doesn't happen. The error just means that you
>>> can't reconfigure because you would lose quorum (in an ensemble of 2
>>> servers, both must ack every request, and here they can't since they
>>> are not talking to each other).
>>>
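The quorum arithmetic behind that error can be sketched as follows. This is only an illustration, assuming ZooKeeper's default majority quorums; `quorum_size` and `can_commit` are hypothetical helper names, not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with this many voting members."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one voting member")
    return ensemble_size // 2 + 1


def can_commit(ensemble_size: int, reachable_voters: int) -> bool:
    """True if enough voters are reachable to acknowledge a request."""
    return reachable_voters >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # Approach A: the proposed new config has 2 voters, but only the
    # server receiving the reconfig can ack, since the two are not talking.
    print(can_commit(ensemble_size=2, reachable_voters=1))
    # Once the second server has connected as a follower, both can ack.
    print(can_commit(ensemble_size=2, reachable_voters=2))
```

With two voters the majority is two, so a single isolated server can never satisfy the new configuration on its own.
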
>>> Approach 2 (your approach B) is supposed to work, regardless of whether
>>> the first server is 2 or 1. There may be a bug, of course, but I just
>>> tried the scenario that fails for you (as I understood it) locally and
>>> it worked. Here is my setup; perhaps you can send me yours if it still
>>> doesn't work.
>>>
>>> server 1:
>>> dataDir=/home/shralex/zk-sat/zookeeper1
>>> standaloneEnabled=false
>>> syncLimit=2
>>> initLimit=5
>>> tickTime=2000
>>> server.1=localhost:2721:2731:participant;localhost:2791
>>> server.2=localhost:2722:2732:participant;localhost:2792
>>>
>>> server 2:
>>> dataDir=/home/shralex/zk-sat/zookeeper2
>>> standaloneEnabled=false
>>> syncLimit=2
>>> initLimit=5
>>> tickTime=2000
>>> server.2=localhost:2722:2732:participant;localhost:2792
>>>
>>> I start server 2 first; it says it's the leader. Then I start server 1,
>>> connect to server 2 with a client, and issue a reconfig adding server 1.
>>>
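For automation, the server lines in the setup above and the matching reconfig command can be generated rather than typed by hand. The sketch below is a hedged illustration: `server_line` and `reconfig_add_command` are hypothetical helpers, and the `reconfig -add` form is assumed from the dynamic reconfiguration documentation rather than confirmed against this particular build.

```python
def server_line(sid: int, host: str, quorum_port: int, election_port: int,
                client_port: int, role: str = "participant") -> str:
    """Format one dynamic-config entry in the style used in the setup above."""
    return (f"server.{sid}={host}:{quorum_port}:{election_port}:{role};"
            f"{host}:{client_port}")


def reconfig_add_command(line: str) -> str:
    """Build the zkCli command that adds one server (assumed syntax)."""
    return f"reconfig -add {line}"


if __name__ == "__main__":
    server1 = server_line(1, "localhost", 2721, 2731, 2791)
    server2 = server_line(2, "localhost", 2722, 2732, 2792)
    # Server 2 boots first and lists only itself; server 1 lists both.
    print("server 2 config:", [server2])
    print("server 1 config:", [server1, server2])
    # Issued from a client connected to server 2:
    print(reconfig_add_command(server1))
```
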
>>> Alex
>>>
>>>
>>>
>>> On Fri, Jun 19, 2015 at 6:27 PM, Benjamin Anderson <b@banjiewen.net>
>>> wrote:
>>>
>>>> Hi there - I'm working on automating bootstrapping of a 3-node ZK
>>>> 3.5.0-alpha ensemble and I'm running into some problems with getting
>>>> the nodes to join up. The dynamic configuration page[1] suggests that,
>>>>
>>>> "...it is possible to start a ZooKeeper ensemble containing a single
>>>> participant and to dynamically grow it by adding more servers"
>>>>
>>>> which is what I'm attempting to do. I've found, however, that this can
>>>> be rather problematic. What is the "correct" procedure for dynamically
>>>> growing an ensemble from a single participant?
>>>>
>>>> I've tried two approaches:
>>>>
>>>> Approach A:
>>>>
>>>> 1. Start two nodes, one with myid=1 and one with myid=2. Each node's
>>>> dynamicConfigFile contains a single line referring to itself, i.e.,
>>>> neither node is aware of the other.
>>>>
>>>> 2. Open a zkCli to either of the two nodes and issue a `reconfig`
>>>> command to add the other, unknown node.
>>>>
>>>> This method fails with "KeeperErrorCode = NewConfigNoQuorum for".
>>>>
>>>> Approach B:
>>>>
>>>> 1. Start one node with myid=1 and a dynamicConfigFile that only refers
>>>> to itself, then start a second node with myid=2 and a
>>>> dynamicConfigFile that refers to itself *and* the node with myid=1.
>>>>
>>>> 2. Open a zkCli to the node with myid=1 and issue a reconfig command
>>>> to add the node with myid=2.
>>>>
>>>> This approach works! However, if the ordering is reversed (i.e., the
>>>> myid=2 node boots first and refers only to itself, and the myid=1 node
>>>> refers to both itself and the myid=2 node), then the myid=1 node will
>>>> *never* come up cleanly - it hangs forever logging messages such as
>>>> the one in this gist[2]. In my environment the boot ordering is not
>>>> guaranteed, so this is rather challenging for me.
>>>>
>>>> My baseline config is roughly this[3].
>>>>
>>>> Is there a well-known and reliable way to incrementally join nodes to
>>>> a ZK ensemble in 3.5.0-alpha? Do I need to be using a newer version
>>>> than the release cut back in August 2014?
>>>>
>>>> Thanks!
>>>> --
>>>> b
>>>>
>>>> [1]: http://zookeeper.apache.org/doc/trunk/zookeeperReconfig.html
>>>> [2]: https://gist.github.com/banjiewen/936f5620d33a8eb0ddf4
>>>> [3]: https://gist.github.com/banjiewen/c7f11c749933ac1bab72
>>>>
>>>
>>>
>>
