hadoop-zookeeper-dev mailing list archives

From "Alexandre Hardy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
Date Wed, 03 Nov 2010 15:59:25 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927888#action_12927888 ]

Alexandre Hardy commented on ZOOKEEPER-917:
-------------------------------------------

Hi Flavio,

The three zookeeper servers are zookeeper1, zookeeper2 and zookeeper3.
Initially the servers were:
    * 192.168.130.10: zookeeper1
    * 192.168.130.11: zookeeper3
    * 192.168.130.14: zookeeper2

After .11 was removed the servers were:
    * 192.168.130.10: zookeeper1
    * 192.168.130.13: zookeeper3
    * 192.168.130.14: zookeeper2

All other settings were set by HBase:
    * tickTime=2000
    * initLimit=10
    * syncLimit=5  
    * peerport=2888
    * leaderport=3888

zookeeper1 would have node id 0
zookeeper2 would have node id 1
zookeeper3 would have node id 2

I'm not sure what else I can give you concerning the configuration.

I note that in the log on 192.168.130.14 (node id 1) we have:
{noformat}
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election: 4294967742
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 1, 4294967742, 2, 1, LOOKING, LOOKING, 1
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (2, 1)
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote
2010-11-02 09:36:27,989 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 1, LOOKING, FOLLOWING, 0
{noformat}
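
For reference, the value 4294967742 in the "New election" line can be split into its two halves. This is a minimal sketch, assuming the usual ZooKeeper zxid layout (epoch in the high 32 bits, per-epoch counter in the low 32 bits); the helper name split_zxid is made up for illustration and is not part of ZooKeeper.

```python
# Assumption: a zxid is a 64-bit value whose high 32 bits hold the epoch
# and whose low 32 bits hold a counter within that epoch.
# split_zxid is a hypothetical helper, not a ZooKeeper API.
def split_zxid(zxid):
    epoch = (zxid >> 32) & 0xFFFFFFFF
    counter = zxid & 0xFFFFFFFF
    return epoch, counter


# zxid taken from the "New election" log line above
epoch, counter = split_zxid(4294967742)
print(epoch, counter)  # 1 446
```

Under that assumption, node id 1 had seen transaction 446 of epoch 1 when it started the election.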
 
I don't think a networking misconfiguration is very likely, but could that
explain what we are seeing?
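
To make the concern concrete, here is a simplified sketch of the ordering that fast leader election is generally understood to apply when comparing two votes: the higher zxid wins, and the higher server id breaks ties. It is not copied from the 3.2.2 source. The function name supersedes is hypothetical, and reading the first two fields of each "Notification" line as (proposed leader, zxid) is an assumption.

```python
# Simplified sketch, not ZooKeeper source code.
# Assumption: votes are ordered by zxid first, then by server id.
def supersedes(new_zxid, new_id, cur_zxid, cur_id):
    """Return True if the new vote should replace the current vote."""
    return (new_zxid, new_id) > (cur_zxid, cur_id)


# Assumed reading of the two Notification lines in the log above:
own_vote = (4294967742, 1)   # (zxid, proposed leader) from node id 1 itself
other_vote = (-1, 2)         # (zxid, proposed leader) received from another node

print(supersedes(other_vote[0], other_vote[1], own_vote[0], own_vote[1]))  # False
```

Under these assumptions the vote for node id 2 with zxid -1 would not outrank node id 1's own vote, which is why the election result looks wrong.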



> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

