hadoop-zookeeper-dev mailing list archives

From "Vishal K (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
Date Fri, 05 Nov 2010 14:32:42 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928605#action_12928605 ]

Vishal K commented on ZOOKEEPER-917:
------------------------------------

Hi Flavio,

Sorry for not making much progress on http://wiki.apache.org/hadoop/ZooKeeper/ClusterMembership.
I have spent some time understanding the code, but it is difficult to focus on development
without dedicated development time. I am pushing to get dedicated development time at work
for this so that I don't have to rely on my spare time.

A few questions related to your comments:
1. Can you please elaborate on: "At the same time, a server A decides to follow another server
B if it receives a message from B saying that B is leading and from a quorum saying that they
are following, even if A is in a later election epoch. This mechanism is there to avoid A
being locked out of the ensemble in the case it partitions away and comes back later."

2. Why is it not OK for B to give up leadership when it sees that its <epoch,zxid> is
lower than others?
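
To make the two questions concrete, here is a minimal sketch of the rules as I read them. It is
not the actual FastLeaderElection code: the names Vote, is_better and should_follow are
hypothetical, and I am assuming that votes are ordered by <epoch,zxid> with the server id as a
tie-breaker and that B's own "leading" message counts toward the quorum.

```python
from typing import Dict, NamedTuple, Set


class Vote(NamedTuple):
    """Hypothetical vote record; field names are assumptions, not ZooKeeper's."""
    epoch: int  # epoch of the last transaction the server has seen
    zxid: int   # last transaction id the server has seen
    sid: int    # server id, used only to break ties


def is_better(a: Vote, b: Vote) -> bool:
    """Return True if vote a should win over vote b under the assumed ordering."""
    return (a.epoch, a.zxid, a.sid) > (b.epoch, b.zxid, b.sid)


def should_follow(leader: int, leading: Set[int],
                  following: Dict[int, int], ensemble_size: int) -> bool:
    """Question 1: A follows `leader` if it says it is leading and a quorum agrees.

    leading   -- ids of servers that told A they are leading
    following -- maps a server id to the leader it told A it is following
    """
    if leader not in leading:
        return False
    quorum = ensemble_size // 2 + 1
    # Assumption: the leader's own message counts as one supporter.
    supporters = {sid for sid, target in following.items() if target == leader}
    supporters.add(leader)
    return len(supporters) >= quorum
```

With these definitions, a server that has seen transactions, e.g. Vote(epoch=1, zxid=100, sid=1),
beats a fresh server such as Vote(epoch=1, zxid=0, sid=3) under is_better. Note that should_follow
never looks at <epoch,zxid>, which is what question 2 is asking about.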

Thanks.


> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup).
> The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11
> was permanently removed from service and could not contribute to the quorum any further (powered
> off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find
> the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had
> not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no
> attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

