hadoop-zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vishal K (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
Date Tue, 09 Nov 2010 15:35:13 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930169#action_12930169
] 

Vishal K commented on ZOOKEEPER-917:
------------------------------------

Hi Flavio,

Lets see if I understand this right. Server 2 was replaced and became the leader. Server 2
receives old notifications from others and accepts leadership even if its <epoch, zxid>
prior to receiving any notifications was <1,0>.

Server 2 accepts leadership because 0, 1 vote for 2 and we allow 2 to become leader based
on point 1. in your comment on 07/Nov/10.

My question with regards to point 1.:
- In your example, it is OK to allow A to join the cluster and become a follower (so that
A does not remain locked out). But is it OK for A to accept leadership even if it has not
seen the zxid reported by others (regardless of the votes)? Shouldn't it reject leadership?

Am I still misunderstanding the problem?


> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup).
The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11
was permanently removed from service and could not contribute to the quorum any further (powered
off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find
the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had
not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no
attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message