zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeremy Stribling <st...@nicira.com>
Subject new leader accepting create requests too early?
Date Tue, 22 Mar 2011 02:15:51 GMT
Hi all,

I ran into a weird case where Zookeeper seems to have elected a new 
leader (a node that just restarted and missed several operations), and a 
client is able to connect to it and create a new sequential node that 
has a number earlier than the last node it created.  I don't have full 
logs, or a live system in this state, or any data directories, just some 
partial server logs and the evidence as seen by the client.  Haven't 
tried reproducing it yet, just wanted to see if anyone here had any 
ideas.  Here's the scenario (probably more info than necessary, but 
trying to be complete)

Version: Zookeeper 3.3.3

1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called 
nodes #1, #2, and #3 below):
2) Nodes periodically (and throughout this whole timeline) create 
sequential, non-ephemeral nodes under the /zkrsm parent node.
3) 5:46:57: Node #1 gets notified of 
/zkrsm/0000000000000000_record0000002116
4) 5:47:06: Node #1 restarts and rejoins
5) 5:49:26: Node #2 gets notified of 
/zkrsm/0000000000000000_record0000002708
6) 5:49:29: Node #2 restarts and rejoins
7) 5:52:01: Node #3 gets notified of 
/zkrsm/0000000000000000_record0000003291
8) 5:52:02: Node #3 restarts and begins the rejoining process
9) 5:52:08: Node #1 successfully creates 
/zkrsm/0000000000000000_record0000003348
10) 5:52:08: Node #2 dies after getting notified of 
/zkrsm/0000000000000000_record0000003348
11) 5:52:10ish: Node #3 is elected leader (the ZK server log doesn't 
have wallclock timestamps, so not exactly sure on the ordering of this step)
12) 5:52:15: Node #1 successfully creates 
/zkrsm/0000000000000000_record0000003292

Note that the node created in step #12 is lower than the one created in 
step #9, and is exactly one greater than the last node seen by node #3 
before it restarted.

The leader election bit from node #3's log after restarting might be of 
interest:

>
> 2644 [QuorumPeer:/0.0.0.0:2888] INFO 
> org.apache.zookeeper.server.quorum.QuorumPeer  - LOOKING
> 2644 [QuorumPeer:/0.0.0.0:2888] DEBUG 
> org.apache.zookeeper.server.quorum.QuorumPeer  - Initializing leader 
> election protocol...
> 2647 [QuorumPeer:/0.0.0.0:2888] INFO 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - New election. 
> My id =  37, Proposed zxid = 17179869831
> 2650 [WorkerSender Thread] DEBUG 
> org.apache.zookeeper.server.quorum.QuorumCnxManager  - Opening channel 
> to server 126
> 2653 [WorkerReceiver Thread] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Receive new 
> notification message. My id = 37
> 2653 [WorkerReceiver Thread] INFO 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Notification: 
> 37 (n.leader), 17179869831 (n.zxid), 1 (n.round), LOOKING (n.state), 
> 37 (n.sid), LOOKING (my state)
> 2654 [QuorumPeer:/0.0.0.0:2888] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - id: 37, 
> proposed id: 37, zxid: 17179869831, proposed zxid: 17179869831
> 2655 [QuorumPeer:/0.0.0.0:2888] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Adding vote: 
> From = 37, Proposed leader = 37, Porposed zxid = 17179869831, Proposed 
> epoch = 1
> 2657 [WorkerSender Thread] INFO 
> org.apache.zookeeper.server.quorum.QuorumCnxManager  - Have smaller 
> server identifier, so dropping the connection: (126, 37)
> 2657 [WorkerSender Thread] DEBUG 
> org.apache.zookeeper.server.quorum.QuorumCnxManager  - Opening channel 
> to server 215
> 2661 [Thread-3] DEBUG 
> org.apache.zookeeper.server.quorum.QuorumCnxManager  - Connection 
> request /13.0.0.12:60623
> 2661 [Thread-3] DEBUG 
> org.apache.zookeeper.server.quorum.QuorumCnxManager  - Connection 
> request: 37
> 2661 [WorkerSender Thread] INFO 
> org.apache.zookeeper.server.quorum.QuorumCnxManager  - Have smaller 
> server identifier, so dropping the connection: (215, 37)
> 2662 [Thread-3] DEBUG 
> org.apache.zookeeper.server.quorum.QuorumCnxManager  - Address of 
> remote peer: 126
> 2668 [Thread-3] DEBUG 
> org.apache.zookeeper.server.quorum.QuorumCnxManager  - Connection 
> request /13.0.0.11:60190
> 2668 [Thread-3] DEBUG 
> org.apache.zookeeper.server.quorum.QuorumCnxManager  - Connection 
> request: 37
> 2668 [Thread-3] DEBUG 
> org.apache.zookeeper.server.quorum.QuorumCnxManager  - Address of 
> remote peer: 215
> 2669 [WorkerReceiver Thread] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Receive new 
> notification message. My id = 37
> 2670 [WorkerReceiver Thread] INFO 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Notification: 
> 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), LOOKING (n.state), 
> 126 (n.sid), LOOKING (my state)
> 2670 [QuorumPeer:/0.0.0.0:2888] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - id: 215, 
> proposed id: 37, zxid: 12884902548, proposed zxid: 17179869831
> 2670 [WorkerReceiver Thread] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Receive new 
> notification message. My id = 37
> 2670 [QuorumPeer:/0.0.0.0:2888] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Adding vote: 
> From = 126, Proposed leader = 215, Porposed zxid = 12884902548, 
> Proposed epoch = 3
> 2670 [WorkerReceiver Thread] INFO 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Notification: 
> 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), FOLLOWING 
> (n.state), 126 (n.sid), LOOKING (my state)
> 2671 [WorkerSender Thread] DEBUG 
> org.apache.zookeeper.server.quorum.QuorumCnxManager  - There is a 
> connection already for server 126
> 2671 [WorkerSender Thread] DEBUG 
> org.apache.zookeeper.server.quorum.QuorumCnxManager  - There is a 
> connection already for server 215
> 2671 [WorkerReceiver Thread] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Receive new 
> notification message. My id = 37
> 2671 [WorkerReceiver Thread] INFO 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Notification: 
> 37 (n.leader), 17179869831 (n.zxid), 3 (n.round), LOOKING (n.state), 
> 37 (n.sid), LOOKING (my state)
> 2671 [WorkerReceiver Thread] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Receive new 
> notification message. My id = 37
> 2671 [WorkerReceiver Thread] INFO 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Notification: 
> 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), LOOKING (n.state), 
> 215 (n.sid), LOOKING (my state)
> 2671 [QuorumPeer:/0.0.0.0:2888] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - id: 37, 
> proposed id: 37, zxid: 17179869831, proposed zxid: 17179869831
> 2672 [WorkerReceiver Thread] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Receive new 
> notification message. My id = 37
> 2672 [QuorumPeer:/0.0.0.0:2888] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Adding vote: 
> From = 37, Proposed leader = 37, Porposed zxid = 17179869831, Proposed 
> epoch = 3
> 2672 [WorkerReceiver Thread] INFO 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Notification: 
> 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), LEADING (n.state), 
> 215 (n.sid), LOOKING (my state)
> 2672 [QuorumPeer:/0.0.0.0:2888] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - id: 215, 
> proposed id: 37, zxid: 12884902548, proposed zxid: 17179869831
> 2672 [QuorumPeer:/0.0.0.0:2888] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Adding vote: 
> From = 215, Proposed leader = 215, Porposed zxid = 12884902548, 
> Proposed epoch = 3
> 2672 [WorkerReceiver Thread] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Receive new 
> notification message. My id = 37
> 2672 [WorkerReceiver Thread] INFO 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Notification: 
> 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), FOLLOWING 
> (n.state), 126 (n.sid), LEADING (my state)
> 2673 [QuorumPeer:/0.0.0.0:2888] INFO 
> org.apache.zookeeper.server.quorum.QuorumPeer  - LEADING
> 2675 [QuorumPeer:/0.0.0.0:2888] INFO 
> org.apache.zookeeper.server.quorum.Leader  - TCP NoDelay set to: true
> 2693 [WorkerReceiver Thread] DEBUG 
> org.apache.zookeeper.server.quorum.FastLeaderElection  - Receive new 
> notification message. My id = 37

I don't know much about the internal workings of Zookeeper, but what I 
find weird here is that node #3 proposes zxid 17179869831 (0x400000287), 
which seems to indicate that it wants to start well into epoch 4 before 
it even opens channels to other nodes, finding out that their zxids are 
only 12884902548 (0x300000294).  Is it possible that node #3 thought its 
zxid was not less than the other nodes' zxids, and so did not bother 
transferring the operations it missed?

Any other ideas on what could have gone wrong?  I dont see any other 
obvious exceptions in node #3's log, except:

> 2487 [pool-1-thread-2] DEBUG 
> org.apache.zookeeper.server.persistence.FileTxnLog  - EOF excepton 
> java.io.EOFException: Failed to read

but that seems to happen all the time, and it's at DEBUG level, so I'm 
guessing it's expected.

Thanks,

Jeremy



Mime
View raw message