zookeeper-user mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: Zookeeper on short lived VMs and ZOOKEEPER-107
Date Thu, 15 Mar 2012 15:33:09 GMT
yes, by replacing at most x servers at a time out of 2x+1 you keep quorum intersection.

i have one more question: zookeeper itself doesn't assume perfect
failure detection, which your scheme requires. what if the VM didn't
actually fail but was just slow, and then tries to reconnect?
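For what it's worth, the replacement arithmetic can be checked by brute force. The following is a minimal sketch of my own (not ZooKeeper code): with n = 2x + 1 servers, every write is acknowledged by a quorum of x + 1 servers, so as long as at most x servers are replaced at a time, at least one acknowledging server survives any replacement.

```java
// Brute-force check of the quorum-survival claim: for n = 2x + 1,
// every (x+1)-sized quorum keeps at least one member after removing
// any set of at most x replaced servers. Subsets are bitmasks over n.
public class QuorumIntersectionCheck {
    public static void main(String[] args) {
        for (int x = 1; x <= 3; x++) {
            int n = 2 * x + 1;
            for (int quorum = 0; quorum < (1 << n); quorum++) {
                if (Integer.bitCount(quorum) != x + 1) continue;      // quorums only
                for (int replaced = 0; replaced < (1 << n); replaced++) {
                    if (Integer.bitCount(replaced) > x) continue;     // replace <= x
                    // Survivors of the quorum = quorum minus replaced set.
                    if ((quorum & ~replaced) == 0) {
                        throw new AssertionError("quorum lost entirely");
                    }
                }
            }
            System.out.println("n=" + n + ": every quorum survives replacing <= " + x);
        }
    }
}
```

Note this only shows that some acknowledging server survives; as discussed below, it does not by itself rule out a slow old VM reconnecting, which is a failure-detection question rather than a counting one.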

On Thu, Mar 15, 2012 at 2:50 AM, Christian Ziech
<christian.ziech@nokia.com> wrote:
> I don't think that we could be running into a split brain problem in our use
> case.
> Let me try to describe the scenario we are worried about (assuming an
> ensemble of 5 nodes A,B,C,D,E):
> - The ensemble is up and running and in sync
> - Node A with the host name "zookeeperA.whatever-domain.priv" goes down
> because the VM has gone away
> - That removal of the VM is detected and a new VM is spawned with the same
> host name "zookeeperA.whatever-domain.priv" - let's call that node A'
> - The ZooKeeper server on node A' wants to join the ensemble - right now this
> gets rejected by the others since A' has a different IP address than A (and
> the old one is "cached" in the InetSocketAddress of the QuorumPeer instance)
> We could ensure that at any given time there is at most one node with the
> host name "zookeeperA.whatever-domain.priv" known by the ensemble and that
> once a node is replaced, it would not come back. Also we could make sure
> that our ensemble is big enough to compensate for the replacement of at most
> x nodes at a time (setting it to 2x + 1 nodes).
> So if I did not misjudge our problem, it should be (due to these
> restrictions) simpler than the problem to be solved by ZOOKEEPER-107. My
> intention in solving this smaller, more constrained problem is basically to
> avoid waiting for ZOOKEEPER-107 to make it into a release (the assumption
> being that a smaller fix might even have a chance to make it into the 3.4.x
> branch).
> Am 15.03.2012 07:46, schrieb ext Alexander Shraer:
>> Hi Christian,
>> ZK-107 would indeed allow you to add/remove servers and change their
>> addresses.
>> > We could ensure that we always have a more or less fixed quorum of
>> > zookeeper servers with a fixed set of host names.
>> You should probably also ensure that a majority of the old ensemble
>> intersects with a majority of the new one.
>> Otherwise you have to run a reconfiguration protocol similar to ZK-107.
>> For example, if you have 3 servers A, B and C, and you now add D and E
>> to replace B and C, how would this work? It is probable that D and E
>> don't have the latest state (as you mention), and A may be down or not have
>> the latest state either (a minority might not have the latest state). Also, how
>> do you prevent split brain in this case, i.e. B and C thinking that they
>> are still operational? Perhaps I'm missing something, but I suspect that the
>> change you propose won't be enough...
>> Best Regards,
>> Alex
>> On Wed, Mar 14, 2012 at 10:01 AM, Christian Ziech
>> <christian.ziech@nokia.com> wrote:
>>    Just a small addition: In my opinion the patch could really boil
>>    down to adding a
>>      quorumServer.electionAddr = new InetSocketAddress(
>>          electionAddr.getHostName(), electionAddr.getPort());
>>    in the catch (IOException e) clause of the connectOne() method of
>>    the QuorumCnxManager. In addition, one should perhaps make the
>>    electionAddr field in the QuorumPeer.QuorumServer class volatile
>>    to prevent races.
>>    I haven't checked this change yet fully for implications but doing
>>    a quick test on some machines at least showed it would solve our
>>    use case. What do the more expert users / maintainers think - is
>>    it even worthwhile to go that route?
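The idea above can be sketched in isolation as follows. This is a hedged illustration with simplified names, not the actual QuorumCnxManager or QuorumPeer.QuorumServer code: the point is that constructing a fresh InetSocketAddress from the stored host name forces a new DNS lookup, so a replacement VM behind the same name becomes reachable on the next connection attempt.

```java
import java.net.InetSocketAddress;

// Illustrative stand-in for QuorumPeer.QuorumServer: re-resolve the
// election address after a failed connection attempt.
public class ReResolveSketch {
    // volatile, as suggested in the mail, so concurrent readers see the swap
    static volatile InetSocketAddress electionAddr =
            new InetSocketAddress("zookeeperA.whatever-domain.priv", 3888);

    // What connectOne()'s catch (IOException e) clause would do: the
    // InetSocketAddress(String, int) constructor resolves the host name
    // at construction time, so this swap discards the cached IP.
    static void reResolve() {
        electionAddr = new InetSocketAddress(
                electionAddr.getHostName(), electionAddr.getPort());
    }

    public static void main(String[] args) {
        reResolve();
        // getHostName() returns the name the address was constructed with.
        System.out.println(electionAddr.getHostName() + ":" + electionAddr.getPort());
    }
}
```

Note that if the name does not resolve (as with this placeholder domain), the constructor does not throw; it yields an unresolved address, which a real connection attempt would then report as a failure.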
>>    Am 14.03.2012 17:04, schrieb ext Christian Ziech:
>>        Let me describe our upcoming use case in a few words: We are
>>        planning to use ZooKeeper in a cloud where nodes typically
>>        come and go unpredictably. We could ensure that we always
>>        have a more or less fixed quorum of zookeeper servers with a
>>        fixed set of host names. However the IPs associated with the
>>        host names would change every time a new server comes up. I
>>        browsed the code a little and it seems that right now the
>>        only problem is that the zookeeper server remembers the
>>        resolved InetSocketAddress in its QuorumPeer hash map.
>>        I saw that ZOOKEEPER-107 would also solve that problem, but
>>        possibly in a more generic way than actually needed (perhaps
>>        I underestimate here the impact of joining as a server with
>>        an empty data directory to replace a server that previously
>>        had one).
>>        Given that - from looking at ZOOKEEPER-107 - it seems it
>>        will still take some time for the proposed fix to make it
>>        into a release, would it make sense to invest time into a
>>        smaller fix just for this "replacing a dropped server
>>        without rolling restarts" use case? Would there be a chance
>>        that a fix for this makes it into the 3.4.x branch?
>>        Are there perhaps other ways to get this use case supported
>>        without the need for doing rolling restarts whenever we need
>>        to replace one of the zookeeper servers?
