hadoop-zookeeper-dev mailing list archives

From "Flavio Junqueira (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
Date Sun, 07 Nov 2010 15:41:10 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929354#action_12929354 ]

Flavio Junqueira commented on ZOOKEEPER-917:
--------------------------------------------

Hi Vishal, I certainly understand that not having dedicated development time is an issue.
I actually didn't know you were interested in cluster membership. I'm glad to hear it, though.

On your questions:
# Suppose we have an ensemble comprising 3 servers: A, B, and C. Now suppose that C is the
leader, and both A and B follow C. If A disconnects from C for whatever reason (e.g., network
partition) and it tries to elect a leader, it won't find any other process in the LOOKING state.
It will actually receive a notification from C saying that it is leading and one from B saying
that it is following C, both with an earlier leader election epoch. To avoid having A locked
out (not able to elect C as leader), we implemented this exception: a process accepts going
back to an earlier leader election only if it receives a notification from the leader saying
that it is leading and from a quorum saying that it is following;
# I'm not sure if you are referring to the specific problem of this jira or if you are asking about
my hypothetical example. Assuming it is the former, the follower (Follower:followLeader())
checks if the leader is proposing an earlier epoch, and if not, it accepts the leader snapshot.
Because the epoch is the same, all followers will accept the leader snapshot and follow it.
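
The two checks described in the answers above can be sketched roughly as follows. This is only an illustration and not the actual ZooKeeper code: the class ElectionSketch, the Notification type, and the methods canRejoin and followerAccepts are hypothetical names made up for this example. It also assumes that the leader's own notification counts toward the quorum, which matches the three-server example where A hears only from B and C.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and structure are assumptions, not ZooKeeper's API.
public class ElectionSketch {
    enum State { LOOKING, FOLLOWING, LEADING }

    // Simplified notification: who sent it, the sender's state, and the leader it supports.
    static final class Notification {
        final long sender;
        final State state;
        final long leader;

        Notification(long sender, State state, long leader) {
            this.sender = sender;
            this.state = state;
            this.leader = leader;
        }
    }

    // A LOOKING server goes back to an earlier election only if the candidate
    // itself says it is LEADING and a quorum of the ensemble supports it.
    // Assumption: the leader's own notification counts toward the quorum.
    static boolean canRejoin(long candidate, List<Notification> received, int ensembleSize) {
        boolean leaderConfirmed = false;
        Set<Long> supporters = new HashSet<>();
        for (Notification n : received) {
            if (n.leader != candidate) {
                continue; // notification refers to a different leader
            }
            if (n.state == State.LEADING && n.sender == candidate) {
                leaderConfirmed = true;
                supporters.add(n.sender);
            } else if (n.state == State.FOLLOWING) {
                supporters.add(n.sender);
            }
        }
        return leaderConfirmed && supporters.size() > ensembleSize / 2;
    }

    // The follower rejects a leader that proposes an earlier epoch and
    // accepts the leader snapshot otherwise (including an equal epoch).
    static boolean followerAccepts(long leaderEpoch, long followerEpoch) {
        return leaderEpoch >= followerEpoch;
    }

    public static void main(String[] args) {
        long b = 2, c = 3; // server A is the one running the check
        List<Notification> seenByA = Arrays.asList(
                new Notification(c, State.LEADING, c),
                new Notification(b, State.FOLLOWING, c));
        System.out.println("A may rejoin C: " + canRejoin(c, seenByA, 3));
        System.out.println("Follower accepts same epoch: " + followerAccepts(5, 5));
    }
}
```

With only B's FOLLOWING notification and nothing from C, canRejoin returns false in this sketch, because the leader itself has not confirmed that it is leading.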

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup).
The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11
was permanently removed from service and could not contribute to the quorum any further (powered
off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find
the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had
not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no
attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

