hadoop-zookeeper-dev mailing list archives

From "Flavio Junqueira (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
Date Wed, 03 Nov 2010 18:44:28 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927949#action_12927949 ]

Flavio Junqueira commented on ZOOKEEPER-917:
--------------------------------------------

Even though the logs do not make a lot of sense to me at this point, I was thinking that
your scenario is not supposed to work given our guarantees. Let's look at an example.

Suppose we have 3 servers: A, B, and C. Suppose that C is initially the leader and proposes
operations that B is able to ack but A is not. Now suppose that I replace C with
a fresh server, same id but empty state, and I do it before A and B are able to elect a new
leader and recover. In this case, A and the new C may form a quorum, and the state of the
ZooKeeper ensemble would be empty. Replacing server C with a fresh server violates our assumptions.
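
To make the example concrete, here is a minimal sketch of it in Python. It is not the
ZooKeeper election code: the vote ordering (highest zxid wins, ties broken by the higher
server id) is a simplification, and the numeric ids and zxid values are made up for
illustration. It assumes every server started from empty state, so A, which acked nothing,
is still at zxid 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    server: str  # label from the example above (A, B, C)
    sid: int     # hypothetical numeric server id
    zxid: int    # hypothetical last zxid the server has logged


def has_quorum(voters, ensemble_size):
    """A quorum is a strict majority of the ensemble."""
    return len(voters) > ensemble_size // 2


def elect(voters):
    """Simplified ordering: highest zxid wins, ties go to the higher id."""
    return max(voters, key=lambda v: (v.zxid, v.sid))


a = Vote("A", sid=1, zxid=0)        # A never acked C's proposals
b = Vote("B", sid=2, zxid=5)        # B acked them (5 is an arbitrary value)
c_fresh = Vote("C", sid=3, zxid=0)  # replacement: same id, empty state

bad = [a, c_fresh]  # A and the fresh C form a quorum without B
good = [a, b]       # A and B recover before the new C is started

for voters in (bad, good):
    assert has_quorum(voters, ensemble_size=3)
    leader = elect(voters)
    print([v.server for v in voters], "->", leader.server, "zxid", leader.zxid)
```

In the first quorum nobody holds the operations that B acked, so the elected leader starts
from empty state. In the second, B wins and its history is preserved before C joins.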


It should work, though, if you add a fresh server to a working ensemble. That is, you let
A and B elect a new leader, and then you start the new C server. In your case, I'm still not
sure why it happens, because the initial zxid of node 1 is 4294967742 according to your excerpt.
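
For reference, a zxid is a 64-bit number whose high 32 bits are the epoch and whose low
32 bits are a counter within that epoch. A small sketch for decoding the value quoted
above (the helper name split_zxid is made up for illustration):

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter)."""
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter


epoch, counter = split_zxid(4294967742)
print(epoch, counter)  # 1 446
```

So the zxid in the excerpt is well past the initial state: epoch 1, transaction 446.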


> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup).
> The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11
> was permanently removed from service and could not contribute to the quorum any further (powered
> off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find
> the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had
> not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no
> attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

