Return-Path: Delivered-To: apmail-zookeeper-user-archive@www.apache.org Received: (qmail 40847 invoked from network); 22 Mar 2011 02:16:25 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 22 Mar 2011 02:16:25 -0000 Received: (qmail 92390 invoked by uid 500); 22 Mar 2011 02:16:25 -0000 Delivered-To: apmail-zookeeper-user-archive@zookeeper.apache.org Received: (qmail 92336 invoked by uid 500); 22 Mar 2011 02:16:24 -0000 Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@zookeeper.apache.org Delivered-To: mailing list user@zookeeper.apache.org Received: (qmail 92328 invoked by uid 500); 22 Mar 2011 02:16:24 -0000 Delivered-To: apmail-hadoop-zookeeper-user@hadoop.apache.org Received: (qmail 92325 invoked by uid 99); 22 Mar 2011 02:16:24 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 22 Mar 2011 02:16:24 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW X-Spam-Check-By: apache.org Received-SPF: unknown (nike.apache.org: error in processing during lookup of strib@nicira.com) Received: from [209.85.161.179] (HELO mail-gx0-f179.google.com) (209.85.161.179) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 22 Mar 2011 02:16:15 +0000 Received: by gxk10 with SMTP id 10so3049726gxk.38 for ; Mon, 21 Mar 2011 19:15:54 -0700 (PDT) Received: by 10.100.193.12 with SMTP id q12mr3606183anf.7.1300760154009; Mon, 21 Mar 2011 19:15:54 -0700 (PDT) Received: from [172.16.0.50] ([66.201.54.10]) by mx.google.com with ESMTPS id b28sm3603186anb.48.2011.03.21.19.15.52 (version=SSLv3 cipher=OTHER); Mon, 21 Mar 2011 19:15:52 -0700 (PDT) Message-ID: <4D880657.2040005@nicira.com> Date: Mon, 21 Mar 2011 19:15:51 -0700 From: Jeremy Stribling User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.10) Gecko/20100619 Icedove/3.0.5 MIME-Version: 1.0 To: zookeeper-user Subject: new leader accepting create requests too early? Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org Hi all, I ran into a weird case where Zookeeper seems to have elected a new leader (a node that just restarted and missed several operations), and a client is able to connect to it and create a new sequential node that has a number earlier than the last node it created. I don't have full logs, or a live system in this state, or any data directories, just some partial server logs and the evidence as seen by the client. Haven't tried reproducing it yet, just wanted to see if anyone here had any ideas. Here's the scenario (probably more info than necessary, but trying to be complete) Version: Zookeeper 3.3.3 1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called nodes #1, #2, and #3 below): 2) Nodes periodically (and throughout this whole timeline) create sequential, non-ephemeral nodes under the /zkrsm parent node. 3) 5:46:57: Node #1 gets notified of /zkrsm/0000000000000000_record0000002116 4) 5:47:06: Node #1 restarts and rejoins 5) 5:49:26: Node #2 gets notified of /zkrsm/0000000000000000_record0000002708 6) 5:49:29: Node #2 restarts and rejoins 7) 5:52:01: Node #3 gets notified of /zkrsm/0000000000000000_record0000003291 8) 5:52:02: Node #3 restarts and begins the rejoining process 9) 5:52:08: Node #1 successfully creates /zkrsm/0000000000000000_record0000003348 10) 5:52:08: Node #2 dies after getting notified of /zkrsm/0000000000000000_record0000003348 11) 5:52:10ish: Node #3 is elected leader (the ZK server log doesn't have wallclock timestamps, so not exactly sure on the ordering of this step) 12) 5:52:15: Node #1 successfully creates /zkrsm/0000000000000000_record0000003292 Note that the node created in step #12 is lower than the one created in step #9, and is exactly one greater than the last node seen by node #3 before it restarted. The leader election bit from node #3's log after restarting might be of interest: > > 2644 [QuorumPeer:/0.0.0.0:2888] INFO > org.apache.zookeeper.server.quorum.QuorumPeer - LOOKING > 2644 [QuorumPeer:/0.0.0.0:2888] DEBUG > org.apache.zookeeper.server.quorum.QuorumPeer - Initializing leader > election protocol... > 2647 [QuorumPeer:/0.0.0.0:2888] INFO > org.apache.zookeeper.server.quorum.FastLeaderElection - New election. > My id = 37, Proposed zxid = 17179869831 > 2650 [WorkerSender Thread] DEBUG > org.apache.zookeeper.server.quorum.QuorumCnxManager - Opening channel > to server 126 > 2653 [WorkerReceiver Thread] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new > notification message. My id = 37 > 2653 [WorkerReceiver Thread] INFO > org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: > 37 (n.leader), 17179869831 (n.zxid), 1 (n.round), LOOKING (n.state), > 37 (n.sid), LOOKING (my state) > 2654 [QuorumPeer:/0.0.0.0:2888] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - id: 37, > proposed id: 37, zxid: 17179869831, proposed zxid: 17179869831 > 2655 [QuorumPeer:/0.0.0.0:2888] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - Adding vote: > From = 37, Proposed leader = 37, Porposed zxid = 17179869831, Proposed > epoch = 1 > 2657 [WorkerSender Thread] INFO > org.apache.zookeeper.server.quorum.QuorumCnxManager - Have smaller > server identifier, so dropping the connection: (126, 37) > 2657 [WorkerSender Thread] DEBUG > org.apache.zookeeper.server.quorum.QuorumCnxManager - Opening channel > to server 215 > 2661 [Thread-3] DEBUG > org.apache.zookeeper.server.quorum.QuorumCnxManager - Connection > request /13.0.0.12:60623 > 2661 [Thread-3] DEBUG > org.apache.zookeeper.server.quorum.QuorumCnxManager - Connection > request: 37 > 2661 [WorkerSender Thread] INFO > org.apache.zookeeper.server.quorum.QuorumCnxManager - Have smaller > server identifier, so dropping the connection: (215, 37) > 2662 [Thread-3] DEBUG > org.apache.zookeeper.server.quorum.QuorumCnxManager - Address of > remote peer: 126 > 2668 [Thread-3] DEBUG > org.apache.zookeeper.server.quorum.QuorumCnxManager - Connection > request /13.0.0.11:60190 > 2668 [Thread-3] DEBUG > org.apache.zookeeper.server.quorum.QuorumCnxManager - Connection > request: 37 > 2668 [Thread-3] DEBUG > org.apache.zookeeper.server.quorum.QuorumCnxManager - Address of > remote peer: 215 > 2669 [WorkerReceiver Thread] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new > notification message. My id = 37 > 2670 [WorkerReceiver Thread] INFO > org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: > 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), LOOKING (n.state), > 126 (n.sid), LOOKING (my state) > 2670 [QuorumPeer:/0.0.0.0:2888] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - id: 215, > proposed id: 37, zxid: 12884902548, proposed zxid: 17179869831 > 2670 [WorkerReceiver Thread] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new > notification message. My id = 37 > 2670 [QuorumPeer:/0.0.0.0:2888] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - Adding vote: > From = 126, Proposed leader = 215, Porposed zxid = 12884902548, > Proposed epoch = 3 > 2670 [WorkerReceiver Thread] INFO > org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: > 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), FOLLOWING > (n.state), 126 (n.sid), LOOKING (my state) > 2671 [WorkerSender Thread] DEBUG > org.apache.zookeeper.server.quorum.QuorumCnxManager - There is a > connection already for server 126 > 2671 [WorkerSender Thread] DEBUG > org.apache.zookeeper.server.quorum.QuorumCnxManager - There is a > connection already for server 215 > 2671 [WorkerReceiver Thread] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new > notification message. My id = 37 > 2671 [WorkerReceiver Thread] INFO > org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: > 37 (n.leader), 17179869831 (n.zxid), 3 (n.round), LOOKING (n.state), > 37 (n.sid), LOOKING (my state) > 2671 [WorkerReceiver Thread] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new > notification message. My id = 37 > 2671 [WorkerReceiver Thread] INFO > org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: > 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), LOOKING (n.state), > 215 (n.sid), LOOKING (my state) > 2671 [QuorumPeer:/0.0.0.0:2888] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - id: 37, > proposed id: 37, zxid: 17179869831, proposed zxid: 17179869831 > 2672 [WorkerReceiver Thread] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new > notification message. My id = 37 > 2672 [QuorumPeer:/0.0.0.0:2888] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - Adding vote: > From = 37, Proposed leader = 37, Porposed zxid = 17179869831, Proposed > epoch = 3 > 2672 [WorkerReceiver Thread] INFO > org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: > 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), LEADING (n.state), > 215 (n.sid), LOOKING (my state) > 2672 [QuorumPeer:/0.0.0.0:2888] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - id: 215, > proposed id: 37, zxid: 12884902548, proposed zxid: 17179869831 > 2672 [QuorumPeer:/0.0.0.0:2888] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - Adding vote: > From = 215, Proposed leader = 215, Porposed zxid = 12884902548, > Proposed epoch = 3 > 2672 [WorkerReceiver Thread] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new > notification message. My id = 37 > 2672 [WorkerReceiver Thread] INFO > org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: > 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), FOLLOWING > (n.state), 126 (n.sid), LEADING (my state) > 2673 [QuorumPeer:/0.0.0.0:2888] INFO > org.apache.zookeeper.server.quorum.QuorumPeer - LEADING > 2675 [QuorumPeer:/0.0.0.0:2888] INFO > org.apache.zookeeper.server.quorum.Leader - TCP NoDelay set to: true > 2693 [WorkerReceiver Thread] DEBUG > org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new > notification message. My id = 37 I don't know much about the internal workings of Zookeeper, but what I find weird here is that node #3 proposes zxid 17179869831 (0x400000287), which seems to indicate that it wants to start well into epoch 4 before it even opens channels to other nodes, finding out that their zxids are only 12884902548 (0x300000294). Is it possible that node #3 thought its zxid was not less than the other nodes' zxids, and so did not bother transferring the operations it missed? Any other ideas on what could have gone wrong? I dont see any other obvious exceptions in node #3's log, except: > 2487 [pool-1-thread-2] DEBUG > org.apache.zookeeper.server.persistence.FileTxnLog - EOF excepton > java.io.EOFException: Failed to read but that seems to happen all the time, and it's at DEBUG level, so I'm guessing it's expected. Thanks, Jeremy