hadoop-zookeeper-dev mailing list archives

From "Alexandre Hardy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
Date Wed, 03 Nov 2010 12:43:24 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927833#action_12927833 ]

Alexandre Hardy commented on ZOOKEEPER-917:
-------------------------------------------

Excerpt from logs on 192.168.130.10:
{noformat}
2010-11-02 09:36:28,060 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election: 4294967742
2010-11-02 09:36:28,061 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2010-11-02 09:36:28,061 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.10:37781]
2010-11-02 09:36:28,061 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 0, 4294967742, 2, 0, LOOKING, LOOKING, 0
2010-11-02 09:36:28,063 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote
2010-11-02 09:36:28,064 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2010-11-02 09:36:28,064 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.14:50222]
2010-11-02 09:36:28,064 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 0, LOOKING, FOLLOWING, 1
2010-11-02 09:36:28,065 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2010-11-02 09:36:28,065 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.14:50223]
2010-11-02 09:36:28,068 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2010-11-02 09:36:28,068 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.12:59044]
2010-11-02 09:36:28,073 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 0, LOOKING, LEADING, 2
2010-11-02 09:36:28,073 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2010-11-02 09:36:28,073 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.10:37786]
2010-11-02 09:36:28,073 INFO org.apache.zookeeper.server.quorum.QuorumPeer: FOLLOWING
2010-11-02 09:36:28,073 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server 
2010-11-02 09:36:28,074 INFO org.apache.zookeeper.server.quorum.Follower: Following zookeeper3/192.168.130.13:2888
{noformat}
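
For reading the zxid values in these excerpts: a zxid is a 64-bit number whose high 32 bits hold the epoch and whose low 32 bits hold a counter. The following is a minimal sketch for decoding the value in the "New election" line above. The helper name `split_zxid` is made up for illustration, and treating -1 as "no transactions seen" is an assumption about how these logs use it.

```python
def split_zxid(zxid):
    """Split a zxid into (epoch, counter), or return None for a negative value."""
    if zxid < 0:
        # Assumption: -1 in these logs marks a peer with no transaction history.
        return None
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter


print(split_zxid(4294967742))  # (1, 446)
print(split_zxid(-1))          # None
```

Read this way, 192.168.130.10 had seen transaction 446 of epoch 1 when it started the election.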

Excerpt from logs on 192.168.130.11:
{noformat}
2010-11-02 09:36:14,065 INFO org.apache.zookeeper.server.quorum.QuorumPeerConfig: Defaulting to majority quorums
2010-11-02 09:36:14,120 INFO org.apache.zookeeper.server.quorum.QuorumPeerMain: Starting quorum peer
2010-11-02 09:36:14,172 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: My election bind port: 3888
2010-11-02 09:36:14,182 INFO org.apache.zookeeper.server.quorum.QuorumPeer: LOOKING
2010-11-02 09:36:14,183 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election: -1
2010-11-02 09:36:14,191 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 2, LOOKING, LOOKING, 2
2010-11-02 09:36:14,191 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote
2010-11-02 09:36:14,193 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1952)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:345)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:532)
2010-11-02 09:36:14,194 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Send worker leaving thread
2010-11-02 09:36:14,194 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 2, LOOKING, LOOKING, 1
2010-11-02 09:36:14,194 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote
2010-11-02 09:36:14,195 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1952)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:345)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:532)
2010-11-02 09:36:14,195 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Send worker leaving thread
2010-11-02 09:36:14,202 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken:
java.nio.channels.AsynchronousCloseException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:281)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:593)
2010-11-02 09:36:14,401 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: About to leave instance:2, -1, 2, LEADING
2010-11-02 09:36:14,402 INFO org.apache.zookeeper.server.quorum.QuorumPeer: LEADING
{noformat}

I'm not sure why 192.168.130.13 ended up as the leader when it did not have the most
up-to-date transaction ID (zxid). Also, I don't see the notification messages from the
other nodes in the logs of 192.168.130.13.

Is there any reason why other nodes would accept 192.168.130.13 as the leader?
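
To make the question concrete: leader election is generally described as preferring the vote with the higher zxid and breaking ties on the higher server id. The sketch below models only that ordering. It is not the ZooKeeper source, and the names `Vote` and `supersedes` are hypothetical. The values are the (leader, zxid) pairs from the notifications logged on 192.168.130.10.

```python
from typing import NamedTuple


class Vote(NamedTuple):
    leader: int  # proposed leader's server id
    zxid: int    # last zxid seen by the proposed leader


def supersedes(new, cur):
    """True if `new` should replace `cur`: higher zxid wins, then higher id."""
    return new.zxid > cur.zxid or (new.zxid == cur.zxid and new.leader > cur.leader)


own_vote = Vote(leader=0, zxid=4294967742)   # "Notification: 0, 4294967742, ..."
new_node_vote = Vote(leader=2, zxid=-1)      # "Notification: 2, -1, ..."

print(supersedes(own_vote, new_node_vote))   # True
print(supersedes(new_node_vote, own_vote))   # False
```

Under this ordering alone, the vote for server 2 with zxid -1 would not displace the vote for server 0, which is why the outcome looks wrong.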

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

