zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dudzinski, Karol" <Karol.Dudzin...@gs.com>
Subject RE: Leader election duration
Date Fri, 08 May 2015 16:06:56 GMT
I've dug out the logs from 4 and 5 as requested.  I'm not entirely sure which of them are necessarily for the same election as the timestamps don't line up completely so I've given you the leader election before and after also.

One thing that strikes me while looking through these is that it appears the different servers think they've completed election at completely somewhat different times (number of seconds out of sync).  That could be perfectly acceptable of course but I'm not sure.

As before I've excluded lots of logs about clients connecting and not establishing sessions because of "ZooKeeperServer not running" and also logs about not being able to open channel to the server that's currently rebooting.  I've also trimmed some of the stack traces to make this a bit more readable.

There are lots of log lines below, would it be more beneficial/efficient if I zipped up the logs (somewhat sanitized to remove irrelevant things such as client connections and prioriatary logs) and put them on a JIRA perhaps?  They would cover the 40 or so elections that happened back to back.

Thanks,
Karol

Here are the logs from 4:
2015-04-25 12:01:36,521 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:QuorumPeer@670] - LOOKING
2015-04-25 12:01:36,523 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FileSnap@83] - Reading snapshot <snapshot>
2015-04-25 12:01:41,142 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FastLeaderElection@740] - New election. My id =  4, proposed zxid=0x41d000c7d27
2015-04-25 12:01:41,143 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:41,359 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 400
2015-04-25 12:01:41,360 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:41,769 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 800
2015-04-25 12:01:41,770 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:42,023 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:42,025 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:42,248 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:42,659 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:43,469 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 1600
2015-04-25 12:01:43,469 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:43,469 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:45,078 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 3200
2015-04-25 12:01:48,288 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 6400
2015-04-25 12:01:48,711 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:50,079 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:51,695 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:53,710 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:53,747 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:55,079 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:56,694 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:58,710 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:01:58,744 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:00,077 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:00,287 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:QuorumPeer@738] - FOLLOWING
2015-04-25 12:02:00,287 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:ZooKeeperServer@162] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 100000 datadir <datadir> snapdir <snapdir>
2015-04-25 12:02:00,287 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 23766
2015-04-25 12:02:00,375 - WARN  [QuorumPeer[myid=4]/0.0.0.0:2181:Learner@232] - Unexpected exception, tries=0, connecting to <server.id=5>/<ip>:2182
java.net.ConnectException: Connection refused
...
2015-04-25 12:02:03,713 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), FOLLOWING (my state)
2015-04-25 12:02:03,752 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), FOLLOWING (my state)
2015-04-25 12:02:08,709 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), FOLLOWING (my state)
2015-04-25 12:02:08,760 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), FOLLOWING (my state)
2015-04-25 12:02:10,322 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), FOLLOWING (my state)
2015-04-25 12:02:13,708 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), FOLLOWING (my state)
2015-04-25 12:02:13,758 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), FOLLOWING (my state)
2015-04-25 12:02:15,331 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), FOLLOWING (my state)
2015-04-25 12:02:18,709 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), FOLLOWING (my state)
2015-04-25 12:02:19,180 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FileTxnSnapLog@240] - Snapshotting: 0x41d000c7d27 to <snapshot>
2015-04-25 12:02:20,346 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), FOLLOWING (my state)
2015-04-25 12:02:23,210 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FileTxnSnapLog@240] - Snapshotting: 0x41d000c7d27 to <snapshot>
2015-04-25 12:02:23,719 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), FOLLOWING (my state)
2015-04-25 12:02:27,289 - WARN  [QuorumPeer[myid=4]/0.0.0.0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
...
2015-04-25 12:02:27,290 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
2015-04-25 12:02:27,290 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FollowerZooKeeperServer@139] - Shutting down
2015-04-25 12:02:27,290 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:ZooKeeperServer@419] - shutting down
2015-04-25 12:02:27,291 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FollowerRequestProcessor@105] - Shutting down
2015-04-25 12:02:27,291 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:CommitProcessor@181] - Shutting down
2015-04-25 12:02:27,291 - INFO  [FollowerRequestProcessor:4:FollowerRequestProcessor@95] - FollowerRequestProcessor exited loop!
2015-04-25 12:02:27,291 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FinalRequestProcessor@415] - shutdown of request processor complete
2015-04-25 12:02:27,291 - INFO  [CommitProcessor:4:CommitProcessor@150] - CommitProcessor exited loop!
2015-04-25 12:02:27,291 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:SyncRequestProcessor@175] - Shutting down
2015-04-25 12:02:27,292 - INFO  [SyncThread:4:SyncRequestProcessor@155] - SyncRequestProcessor exited!
2015-04-25 12:02:27,292 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:QuorumPeer@670] - LOOKING
2015-04-25 12:02:27,296 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FileSnap@83] - Reading snapshot <snapshot>
2015-04-25 12:02:29,337 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:30,361 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:31,497 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FastLeaderElection@740] - New election. My id =  4, proposed zxid=0x41d000c7d27
2015-04-25 12:02:31,713 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 400
2015-04-25 12:02:32,123 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 800
2015-04-25 12:02:32,933 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 1600
2015-04-25 12:02:33,728 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:35,343 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 3200
2015-04-25 12:02:35,359 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:36,504 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:38,729 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:39,477 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:40,358 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:41,504 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:43,727 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:44,475 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:45,358 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:46,502 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:48,726 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:49,484 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:51,502 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:53,725 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:54,492 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:56,501 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:58,725 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:02:59,489 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:03:01,501 - INFO  [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 12:03:01,712 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:QuorumPeer@738] - FOLLOWING
2015-04-25 12:03:01,712 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:ZooKeeperServer@162] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 100000 datadir <datadir> snapdir <snapdir>
2015-04-25 12:03:01,712 - INFO  [QuorumPeer[myid=4]/0.0.0.0:2181:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 34420

Here are the logs from 5 (which is in a different timezone, 5hrs out, I suspect that's not the issue as it has been like this for 5 years or so and this only started happening 3 or 4 weeks ago):
2015-04-25 17:01:35,994 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:QuorumPeer@670] - LOOKING
2015-04-25 17:01:35,998 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FileSnap@83] - Reading snapshot <snapshot>
2015-04-25 17:01:42,021 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:42,021 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:42,021 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:42,142 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:42,142 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:42,642 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:42,776 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:45,270 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:45,270 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:48,620 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FastLeaderElection@740] - New election. My id =  5, proposed zxid=0x41d000c7d27
2015-04-25 17:01:48,831 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 400
2015-04-25 17:01:48,832 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:49,247 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 800
2015-04-25 17:01:50,057 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 1600
2015-04-25 17:01:50,209 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:51,813 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:53,426 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 3200
2015-04-25 17:01:53,617 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:53,828 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:55,209 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:56,812 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:58,615 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:58,828 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:01:58,862 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:00,206 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:00,423 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:QuorumPeer@750] - LEADING
2015-04-25 17:02:00,427 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:Leader@58] - TCP NoDelay set to: true
2015-04-25 17:02:00,428 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:ZooKeeperServer@162] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 100000 datadir <datadir> snapdir <snapdir>
2015-04-25 17:02:00,429 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:Leader@345] - LEADING - LEADER ELECTION TOOK - 24435
2015-04-25 17:02:00,433 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FileSnap@83] - Reading snapshot <snapshot>
2015-04-25 17:02:03,623 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:03,829 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:03,883 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:08,630 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:09,009 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:09,020 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:09,548 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FileTxnSnapLog@240] - Snapshotting: 0x41d000c7d27 to <snapshot>
2015-04-25 17:02:10,441 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:13,628 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:13,629 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:13,629 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:13,629 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:13,827 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:13,888 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:14,158 - INFO  [LearnerHandler-/<ip of server 4>:46108:LearnerHandler@263] - Follower sid: 4 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@5c4b8582
2015-04-25 17:02:15,457 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:18,826 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:18,828 - INFO  [LearnerHandler-/<ip of server 1>:51827:LearnerHandler@263] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@4b13c1cf
2015-04-25 17:02:19,134 - INFO  [LearnerHandler-/<ip of server 4>:46108:LearnerHandler@318] - Synchronizing with Follower sid: 4 maxCommittedLog=0x41d000c7d27 minCommittedLog=0x41d000c7b33 peerLastZxid=0x41d000c7d27
2015-04-25 17:02:19,134 - INFO  [LearnerHandler-/<ip of server 4>:46108:LearnerHandler@395] - Sending DIFF
2015-04-25 17:02:19,135 - INFO  [LearnerHandler-/<ip of server 1>:51827:LearnerHandler@318] - Synchronizing with Follower sid: 1 maxCommittedLog=0x41d000c7d27 minCommittedLog=0x41d000c7b33 peerLastZxid=0x41d000c7d27
2015-04-25 17:02:19,136 - INFO  [LearnerHandler-/<ip of server 1>:51827:LearnerHandler@395] - Sending DIFF
2015-04-25 17:02:20,460 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:23,571 - WARN  [LearnerHandler-/<ip of server 1>:51827:Leader@574] - Commiting zxid 0x41e00000000 from /<ip of server 5>:2182 not first!
2015-04-25 17:02:23,571 - WARN  [LearnerHandler-/<ip of server 1>:51827:Leader@576] - First is 0x0
2015-04-25 17:02:23,571 - INFO  [LearnerHandler-/<ip of server 1>:51827:Leader@598] - Have quorum of supporters; starting up and setting last processed zxid: 0x41e00000000
2015-04-25 17:02:23,838 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:25,473 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LEADING (my state)
2015-04-25 17:02:27,242 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:Leader@490] - Shutting down
2015-04-25 17:02:27,243 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:Leader@496] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
2015-04-25 17:02:27,243 - INFO  [Thread-1203:Leader$LearnerCnxAcceptor@309] - exception while shutting down acceptor: java.net.SocketException: Socket closed
2015-04-25 17:02:27,244 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:ZooKeeperServer@419] - shutting down
2015-04-25 17:02:27,244 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:SessionTrackerImpl@225] - Shutting down
2015-04-25 17:02:27,244 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:PrepRequestProcessor@743] - Shutting down
2015-04-25 17:02:27,244 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:ProposalRequestProcessor@88] - Shutting down
2015-04-25 17:02:27,245 - INFO  [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
2015-04-25 17:02:27,245 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:CommitProcessor@181] - Shutting down
2015-04-25 17:02:27,245 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:Leader$ToBeAppliedRequestProcessor@655] - Shutting down
2015-04-25 17:02:27,245 - INFO  [CommitProcessor:5:CommitProcessor@150] - CommitProcessor exited loop!
2015-04-25 17:02:27,245 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FinalRequestProcessor@415] - shutdown of request processor complete
2015-04-25 17:02:27,245 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:SyncRequestProcessor@175] - Shutting down
2015-04-25 17:02:27,246 - INFO  [SyncThread:5:SyncRequestProcessor@155] - SyncRequestProcessor exited!
2015-04-25 17:02:27,247 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:QuorumPeer@670] - LOOKING
2015-04-25 17:02:27,247 - WARN  [LearnerHandler-/<ip of server 1>:51827:LearnerHandler@575] - ******* GOODBYE /<ip of server 1>:51827 ********
2015-04-25 17:02:27,247 - WARN  [LearnerHandler-/<ip of server 4>:46108:LearnerHandler@575] - ******* GOODBYE /<ip of server 4>:46108 ********
2015-04-25 17:02:27,247 - WARN  [LearnerHandler-/<ip of server 1>:51827:LearnerHandler@587] - Ignoring unexpected exception
java.lang.InterruptedException
...
2015-04-25 17:02:27,247 - WARN  [LearnerHandler-/<ip of server 4>:46108:LearnerHandler@587] - Ignoring unexpected exception
java.lang.InterruptedException
...
2015-04-25 17:02:27,250 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FileSnap@83] - Reading snapshot <snapshot>
2015-04-25 17:02:28,012 - INFO  [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop!
2015-04-25 17:02:29,222 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:32,603 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:32,603 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:33,846 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:34,350 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FastLeaderElection@740] - New election. My id =  5, proposed zxid=0x41d000c7d27
2015-04-25 17:02:34,559 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 400
2015-04-25 17:02:34,969 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 800
2015-04-25 17:02:35,478 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:36,288 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 1600
2015-04-25 17:02:36,634 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:38,248 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 3200
2015-04-25 17:02:38,845 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:39,348 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:40,478 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:41,643 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:43,845 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:44,345 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:45,475 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 2 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:46,632 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:48,844 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:49,355 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:51,632 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:53,844 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:54,362 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:56,630 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:58,843 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:02:59,359 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:03:01,630 - INFO  [WorkerReceiver[myid=5]:FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
2015-04-25 17:03:01,838 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:QuorumPeer@750] - LEADING
2015-04-25 17:03:01,838 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:ZooKeeperServer@162] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 100000 datadir <datadir> snapdir <snapdir>
2015-04-25 17:03:01,838 - INFO  [QuorumPeer[myid=5]/0.0.0.0:2181:Leader@345] - LEADING - LEADER ELECTION TOOK - 34591

The Goldman Sachs Group, Inc. All rights reserved.
See http://www.gs.com/disclaimer/global_email for important risk disclosures, conflicts of interest and other terms and conditions relating to this e-mail and your reliance on information contained in it.  This message may contain confidential or privileged information.  If you are not the intended recipient, please advise us immediately and delete this message.  See http://www.gs.com/disclaimer/email for further information on confidentiality and the risks of non-secure electronic communication.  If you cannot access these links, please notify us by reply message and we will send the contents to you.


-----Original Message-----
From: Camille Fournier [mailto:camille@apache.org] 
Sent: 01 May 2015 22:10
To: user@zookeeper.apache.org
Subject: Re: Leader election duration

OK so it looks to me like what happens is this:
Server 5 proposes being leader in round 6. Servers 4 and 1 vote to follow,
5 believes it should be come leader, and goes into LEADER state. Server 4 goes into FOLLOWER state. But, in between this, server 2 comes by, and says that the vote should be in round 7. Server 1 sees this, throws away its vote, updates itself to round 7. So servers 4 and 5 think that 5 is the leader, servers 1 and 2 are looking for a leader. Finally, 20s later, the leader, 5 realizes that it cannot actually lead, and goes back to looking, at which point we finally start to coalesce.

Server 5 thinks it is leading here:
2015-04-25 12:02:25,598 - INFO  [WorkerReceiver[myid=2]:
FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING (n.state), 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)

And then finally just looking here:
2015-04-25 12:02:34,476 - INFO  [WorkerReceiver[myid=2]:
FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)

This seems like an annoying race condition but I'm not 100% sure what to do about it or whether it is an actual bug or not... if anyone with FLE experience more than just squinting at the code has an idea, I'm all ears.

C


On Fri, May 1, 2015 at 12:30 PM, Camille Fournier <camille@apache.org>
wrote:

> The other thing I notice is that we're first off in the n.round, and 
> then we're off in the peer epoch, the nodes that thought the round was 
> 6 (4 and 5), upon increasing the round to 7 increase their peer epoch 
> from 41d to 41e. Only when all the nodes finally say both 7 and 41e do 
> you see election claim to finish.
>
> C
>
> On Fri, May 1, 2015 at 12:22 PM, Camille Fournier <camille@apache.org>
> wrote:
>
> > Unfortunately that looks like just zab. It would be super awesome if 
> > you could send me one more log snippet of one of the other machines 
> > in this
> bad
> > scenario. I think that will help me figure out what's happening. 
> > Perhaps machine 4 or 5?
> >
> > Thanks,
> > C
> >
> > On Fri, May 1, 2015 at 12:16 PM, Dudzinski, Karol <
> Karol.Dudzinski@gs.com>
> > wrote:
> >
> >> That is awesome.  I was looking for a page like this that describes 
> >> the election process.  I'll have a read through it.
> >>
> >> The Goldman Sachs Group, Inc. All rights reserved.
> >> See http://www.gs.com/disclaimer/global_email for important risk 
> >> disclosures, conflicts of interest and other terms and conditions
> relating
> >> to this e-mail and your reliance on information contained in it.  
> >> This message may contain confidential or privileged information.  
> >> If you are
> not
> >> the intended recipient, please advise us immediately and delete 
> >> this message.  See http://www.gs.com/disclaimer/email for further
> information
> >> on confidentiality and the risks of non-secure electronic communication.
> >> If you cannot access these links, please notify us by reply message 
> >> and
> we
> >> will send the contents to you.
> >>
> >>
> >> -----Original Message-----
> >> From: Camille Fournier [mailto:camille@apache.org]
> >> Sent: 01 May 2015 17:08
> >> To: bookkeeper-user@zookeeper.apache.org
> >> Subject: Re: Leader election duration
> >>
> >> Election epoch.... something in here I hope?
> >> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
> >>
> >> On Fri, May 1, 2015 at 12:01 PM, Dudzinski, Karol <
> Karol.Dudzinski@gs.com
> >> >
> >> wrote:
> >>
> >> > Yes, that is odd.  What does the round mean?
> >> >
> >> > The Goldman Sachs Group, Inc. All rights reserved.
> >> > See http://www.gs.com/disclaimer/global_email for important risk 
> >> > disclosures, conflicts of interest and other terms and conditions 
> >> > relating to this e-mail and your reliance on information 
> >> > contained in it.  This message may contain confidential or privileged information.
> >> > If you are not the intended recipient, please advise us 
> >> > immediately and delete this message.  See 
> >> > http://www.gs.com/disclaimer/email for further information on 
> >> > confidentiality and the risks of non-secure
> >> electronic communication.
> >> > If you cannot access these links, please notify us by reply 
> >> > message and we will send the contents to you.
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: Camille Fournier [mailto:camille@apache.org]
> >> > Sent: 01 May 2015 16:53
> >> > To: bookkeeper-user@zookeeper.apache.org
> >> > Subject: Re: Leader election duration
> >> >
> >> > One thing that jumps out at me here is that a lot of these 
> >> > messages are from different rounds. Some say they are in round 6, 
> >> > and some say they are in round 7.
> >> >
> >> > 2015-04-25 12:02:20,585 - INFO  [WorkerReceiver[myid=2]:
> >> > FastLeaderElection@542] - Notification: 5 (n.leader), 
> >> > 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING (n.state), 5 
> >> > (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > 2015-04-25 12:02:23,723 - INFO  [WorkerReceiver[myid=2]:
> >> > FastLeaderElection@542] - Notification: 1 (n.leader), 
> >> > 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 
> >> > (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> >
> >> > So that's a bit odd.
> >> >
> >> > On Fri, May 1, 2015 at 10:26 AM, Dudzinski, Karol 
> >> > <Karol.Dudzinski@gs.com>
> >> > wrote:
> >> >
> >> > > Unfortunately, I can't reproduce this other than in one 
> >> > > production cluster so testing a patch is difficult.
> >> > >
> >> > > Below are the relevant logs that Camille requested.
> >> > >
> >> > > For the first issue, the leader selection taking a long time, 
> >> > > this is a typical example:
> >> > > 2015-04-25 12:01:59,734 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :QuorumPeer@670] - LOOKING
> >> > > 2015-04-25 12:01:59,737 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :FileSnap@83] - Reading 
> >> > > snapshot <snapshot>
> >> > > 2015-04-25 12:02:05,203 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:05,327 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :FastLeaderElection@740] - New 
> >> > > election. My id =  2, proposed
> >> > > zxid=0x41d000c7d27
> >> > > 2015-04-25 12:02:05,327 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:05,328 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:05,530 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :FastLeaderElection@774] - 
> >> > > Notification time out: 400
> >> > > 2015-04-25 12:02:05,940 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :FastLeaderElection@774] - 
> >> > > Notification time out: 800
> >> > > 2015-04-25 12:02:06,750 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :FastLeaderElection@774] - 
> >> > > Notification time out: 1600
> >> > > 2015-04-25 12:02:08,359 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :FastLeaderElection@774] - 
> >> > > Notification time out: 3200
> >> > > 2015-04-25 12:02:08,709 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:08,756 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:10,320 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:10,325 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING 
> >> > > (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:13,539 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :FastLeaderElection@774] - 
> >> > > Notification time out: 6400
> >> > > 2015-04-25 12:02:13,707 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:13,755 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING 
> >> > > (n.state),
> >> > > 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:15,329 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:15,334 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING 
> >> > > (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:15,582 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING 
> >> > > (n.state),
> >> > > 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:18,709 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 1 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:20,339 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:20,350 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING 
> >> > > (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:20,585 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING 
> >> > > (n.state),
> >> > > 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:23,723 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 1 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:25,349 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:25,357 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING 
> >> > > (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:25,598 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING 
> >> > > (n.state),
> >> > > 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:28,729 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:30,358 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:31,500 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:33,726 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:34,476 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:35,357 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 2 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:36,506 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:38,726 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:39,473 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:40,357 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 2 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:41,507 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:43,726 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:44,472 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:46,504 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:48,725 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 2 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:49,480 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:51,504 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:53,724 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 1 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 1 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:54,488 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:56,514 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 5 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state)
> >> > > 2015-04-25 12:02:56,734 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :QuorumPeer@738] - FOLLOWING
> >> > > 2015-04-25 12:02:56,734 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :ZooKeeperServer@162] - 
> >> > > Created server with tickTime 2000 minSessionTimeout 4000 
> >> > > maxSessionTimeout 100000 datadir <data dir> snapdir <snap dir>
> >> > > 2015-04-25 12:02:56,734 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :Follower@63] - FOLLOWING - 
> >> > > LEADER ELECTION TOOK - 56999
> >> > >
> >> > > These are logs from server with id 2, they are following the 
> >> > > reboot of the leader which was sid 3.
> >> > >
> >> > >
> >> > > For the second problem, where the sync'ing doesn't happen, here 
> >> > > are the follower logs (immediately after the election shown above):
> >> > > 2015-04-25 12:02:56,818 - WARN  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :Learner@232] - Unexpected 
> >> > > exception, tries=0, connecting to <host with id 5>/<ip>:2182
> >> > > java.net.ConnectException: Connection refused
> >> > >         at java.net.PlainSocketImpl.socketConnect(Native Method)
> >> > >         at
> >> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
> >> > >         at
> >> > > java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
> >> > >         at
> java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
> >> > >         at
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> >> > >         at java.net.Socket.connect(Socket.java:529)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learne
> >> > r.jav
> >> > a:224)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower
> >> > .java
> >> > :71)
> >> > >         at
> >> > >
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > > 2015-04-25 12:02:57,906 - WARN  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :Learner@232] - Unexpected 
> >> > > exception, tries=1, connecting to <host with id 5>/<ip>:2182
> >> > > java.net.ConnectException: Connection refused
> >> > >         at java.net.PlainSocketImpl.socketConnect(Native Method)
> >> > >         at
> >> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
> >> > >         at
> >> > > java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
> >> > >         at
> java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
> >> > >         at
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> >> > >         at java.net.Socket.connect(Socket.java:529)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learne
> >> > r.jav
> >> > a:224)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower
> >> > .java
> >> > :71)
> >> > >         at
> >> > >
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > > 2015-04-25 12:02:58,725 - INFO
> >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 
> >> > > 4 (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING 
> >> > > (n.state),
> >> > > 1 (n.sid), 0x41e (n.peerEPoch), FOLLOWING (my state)
> >> > > 2015-04-25 12:02:58,996 - WARN  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :Learner@232] - Unexpected 
> >> > > exception, tries=2, connecting to <host with id 5>/<ip>:2182
> >> > > java.net.ConnectException: Connection refused
> >> > >         at java.net.PlainSocketImpl.socketConnect(Native Method)
> >> > >         at
> >> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
> >> > >         at
> >> > > java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
> >> > >         at
> java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
> >> > >         at
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> >> > >         at java.net.Socket.connect(Socket.java:529)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learne
> >> > r.jav
> >> > a:224)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower
> >> > .java
> >> > :71)
> >> > >         at
> >> > >
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > > 2015-04-25 12:03:00,093 - WARN  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :Learner@232] - Unexpected 
> >> > > exception, tries=3, connecting to <host with id 5>/<ip>:2182
> >> > > java.net.ConnectException: Connection refused
> >> > >         at java.net.PlainSocketImpl.socketConnect(Native Method)
> >> > >         at
> >> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
> >> > >         at
> >> > > java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
> >> > >         at
> java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
> >> > >         at
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> >> > >         at java.net.Socket.connect(Socket.java:529)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learne
> >> > r.jav
> >> > a:224)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower
> >> > .java
> >> > :71)
> >> > >         at
> >> > >
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > >         at java.lang.Thread.run(Thread.java:662)
> >> > > 2015-04-25 12:03:01,185 - ERROR 
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :Learner@229] - Unexpected 
> >> > > exception
> >> > > java.net.ConnectException: Connection refused
> >> > >         at java.net.PlainSocketImpl.socketConnect(Native Method)
> >> > >         at
> >> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
> >> > >         at
> >> > > java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
> >> > >         at
> java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
> >> > >         at
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> >> > >         at java.net.Socket.connect(Socket.java:529)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learne
> >> > r.jav
> >> > a:224)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower
> >> > .java
> >> > :71)
> >> > >         at
> >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.ja
> >> > > va:74
> >> > > 0)
> >> > > 2015-04-25 12:03:01,186 - WARN  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :Follower@89] - Exception when 
> >> > > following the leader
> >> > > java.net.ConnectException: Connection refused
> >> > >         at java.net.PlainSocketImpl.socketConnect(Native Method)
> >> > >         at
> >> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
> >> > >         at
> >> > > java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
> >> > >         at
> java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
> >> > >         at
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> >> > >         at java.net.Socket.connect(Socket.java:529)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learne
> >> > r.jav
> >> > a:224)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower
> >> > .java
> >> > :71)
> >> > >         at
> >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.ja
> >> > > va:74
> >> > > 0)
> >> > > 2015-04-25 12:03:01,186 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :Follower@166] - shutdown 
> >> > > called
> >> > > java.lang.Exception: shutdown Follower
> >> > >         at
> >> > >
> >> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:
> >> 166)
> >> > >         at
> >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.ja
> >> > > va:74
> >> > > 4)
> >> > > 2015-04-25 12:03:01,187 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :FollowerZooKeeperServer@139] 
> >> > > - Shutting down
> >> > > 2015-04-25 12:03:01,187 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :ZooKeeperServer@419] - 
> >> > > shutting down
> >> > > 2015-04-25 12:03:01,187 - INFO  
> >> > > [QuorumPeer[myid=2]/0.0.0.0:2181 :QuorumPeer@670] - LOOKING
> >> > >
> >> > > I think they get rejected because as far as sid 5 is concerned, 
> >> > > it's not the leader yet as shown by these two lines:
> >> > > 2015-04-25 12:02:27,247 - INFO  
> >> > > [QuorumPeer[myid=5]/0.0.0.0:2181 :QuorumPeer@670] - LOOKING
> >> > > 2015-04-25 12:03:01,838 - INFO  
> >> > > [QuorumPeer[myid=5]/0.0.0.0:2181 :Leader@345] - LEADING - 
> >> > > LEADER ELECTION TOOK - 34591
> >> > >
> >> > > A short while later the leader times out:
> >> > > 2015-04-25 12:03:21,850 - WARN  
> >> > > [QuorumPeer[myid=5]/0.0.0.0:2181 :QuorumPeer@756] - Unexpected 
> >> > > exception
> >> > > java.lang.InterruptedException: Timeout while waiting for epoch 
> >> > > from
> >> > quorum
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leade
> >> > r.jav
> >> > a:872)
> >> > >         at
> >> > org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:364)
> >> > >         at
> >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.ja
> >> > > va:75
> >> > > 3)
> >> > >
> >> > > During another election, the leader sees this:
> >> > > 2015-04-25 17:03:55,967 - INFO  
> >> > > [QuorumPeer[myid=5]/0.0.0.0:2181 :Leader@345] - LEADING - 
> >> > > LEADER ELECTION TOOK - 34114
> >> > > 2015-04-25 17:04:06,034 - INFO
> >> > > [LearnerHandler-/<ip>:34474:LearnerHandler@263] - Follower sid: 4 :
> >> > > info
> >> > > :
> >> > > org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@5c4b
> >> > > 8582
> >> > >
> >> > > Sometimes this includes all the expected followers, sometimes 
> >> > > just one or two.  The problem doesn't end there, even for the 
> >> > > follower for which it prints the line above, it often never 
> >> > > gets to the line below which I see in successful elections 
> >> > > (this is an example from a clean cluster I just spun up for testing):
> >> > > 2015-04-29 11:05:55,941 - INFO
> >> > > [LearnerHandler-/<ip>:46737:LearnerHandler@385] - Synchronizing
> with
> >> > > Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0
> >> > > peerLastZxid=0x0
> >> > >
> >> > > Karol
> >> > >
> >> > > The Goldman Sachs Group, Inc. All rights reserved.
> >> > > See http://www.gs.com/disclaimer/global_email for important 
> >> > > risk disclosures, conflicts of interest and other terms and 
> >> > > conditions relating to this e-mail and your reliance on 
> >> > > information contained in it.  This message may contain 
> >> > > confidential or privileged
> >> information.
> >> > > If you are not the intended recipient, please advise us 
> >> > > immediately and delete this message.  See 
> >> > > http://www.gs.com/disclaimer/email
> for
> >> > > further information on confidentiality and the risks of 
> >> > > non-secure
> >> > electronic communication.
> >> > > If you cannot access these links, please notify us by reply 
> >> > > message and we will send the contents to you.
> >> > >
> >> > >
> >> > > -----Original Message-----
> >> > > From: Flavio Junqueira [mailto:fpjunqueira@yahoo.com.INVALID]
> >> > > Sent: 29 April 2015 22:54
> >> > > To: user@zookeeper.apache.org
> >> > > Subject: Re: Leader election duration
> >> > >
> >> > > 170MB isn't that large. I'm thinking that the election might be 
> >> > > taking some time because of the issue described in ZK-2164. In 
> >> > > that jira, the election was failing, but I can see the blocking 
> >> > > call causing the election to take longer as well. You may want 
> >> > > to test that fix and see if it makes any difference to you.
> >> > >
> >> > > -Flavio
> >> > >
> >> > > > On 29 Apr 2015, at 19:05, Karol Dudzinski 
> >> > > > <karoldudzinski@gmail.com>
> >> > > wrote:
> >> > > >
> >> > > > The snapshots are a bit larger than I'd like (~170MB) but the 
> >> > > > bit that
> >> > > takes time is actually before that.
> >> > > >
> >> > > > On a successful election, on the new leader I see the following:
> >> > > > - LEADER ELECTION TOOK: 12345
> >> > > > - Synchronizing with Follower...
> >> > > > - Sending DIFF or SNAP
> >> > > >
> >> > > > That last bit will take a long time if the snapshot is large,
> >> agreed.
> >> > > However, for me it's arriving at the LEADER ELECTION TOOK bit 
> >> > > that takes a long time.  And the bit between that and the 
> >> > > Synchronizing often times out with the messages I described in my previous email.
> >> > > >
> >> > > > Karol
> >> > > >
> >> > > >> On 29 Apr 2015, at 18:46, Flavio Junqueira
> >> > > <fpjunqueira@yahoo.com.INVALID> wrote:
> >> > > >>
> >> > > >> Check the size of your snapshot files. If during election 
> >> > > >> you need to
> >> > > transfer snapshots and they are large, then recovery is going 
> >> > > to take longer. We try to send a diff of the txns when 
> >> > > possible, and sending a diff is much faster compared to send a full snapshot.
> >> > > Sending a diff isn't always possible, depending on how far 
> >> > > behind followers compared to the leader, so the difference 
> >> > > might be due to
> >> snapshots and diffs.
> >> > > >> -Flavio
> >> > > >>
> >> > > >>
> >> > > >>    On Wednesday, April 29, 2015 6:32 PM, Camille Fournier <
> >> > > camille@apache.org> wrote:
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >> Don't suppose you could share some snippets of the log 
> >> > > >> messages you're seeing a lot?
> >> > > >>
> >> > > >> On Wed, Apr 29, 2015 at 1:29 PM, Karol Dudzinski 
> >> > > >> <karoldudzinski@gmail.com>
> >> > > >> wrote:
> >> > > >>
> >> > > >>> Did a bit more investigation today though I didn't try the 
> >> > > >>> prod snapshot in a test cluster.  I did however find a 
> >> > > >>> couple of other strange things which I'd initially missed.
> >> > > >>>
> >> > > >>> When the leader of this particular cluster goes for a 
> >> > > >>> reboot, I see many notification printouts from 
> >> > > >>> FastLeaderElection.  On some occasions I've seen nearly 50.  
> >> > > >>> In other clusters I usually see 5 or thereabouts.  The 
> >> > > >>> number of these notifications seems to correlate with how 
> >> > > >>> long election took.  What could cause it to require
> >> > > exchanging so many messages?
> >> > > >>>
> >> > > >>> This cluster also seems to often timeout while syncing 
> >> > > >>> after selecting the leader.  On the new leader I see the 
> >> > > >>> expected number of
> >> > > "Follower sid: ..."
> >> > > >>> messages but it doesn't always get to the "Synchronizing 
> >> > > >>> with
> >> > Follower"
> >> > > >>> part.  On the leader I then get a "Unexpected exception 
> >> > > >>> causing shutdown while sock still open" message followed by 
> >> > > >>> a GOODBYE for one of the followers.  At the same time on 
> >> > > >>> that follower I see a "Exception when following the leader" 
> >> > > >>> caused by a read timeout in Learner.registerWithLeader.  
> >> > > >>> Does anything happen prior to syncing that could take a long time?
> >> > > >>>
> >> > > >>> Thanks,
> >> > > >>> Karol
> >> > > >>>
> >> > > >>>>> On 28 Apr 2015, at 20:22, Karol Dudzinski 
> >> > > >>>>> <karoldudzinski@gmail.com>
> >> > > >>>> wrote:
> >> > > >>>>
> >> > > >>>> Well these are prod clusters so my ability to experiment 
> >> > > >>>> is rather
> >> > > >>> limited.  I can take a copy of the snapshot and try both 3 
> >> > > >>> node and
> >> > > >>> 5 in a test cluster.
> >> > > >>>>
> >> > > >>>> One thing I forgot to mention is that in most clusters the 
> >> > > >>>> number of
> >> > > >>> election notification log lines I see is typically, give or 
> >> > > >>> take, the same as the number of participants.  In this 
> >> > > >>> cluster however, it's typically 2 or 3 times as many 
> >> > > >>> notifications as the number of
> >> > > participants.
> >> > > >>>>
> >> > > >>>> My gut feeling is it's more likely to be due to load as 
> >> > > >>>> the 5 node
> >> > > >>> cluster is much busier and the election time has been 
> >> > > >>> increasing over time (as has load).  I have no idea exactly 
> >> > > >>> what load though, whether it's number of clients, frequency 
> >> > > >>> of transactions, total data size, etc.  I don't understand 
> >> > > >>> why though but that may just be my limited knowledge of the 
> >> > > >>> election
> >> protocol.
> >> > > >>>>
> >> > > >>>> Karol
> >> > > >>>>
> >> > > >>>>> On 28 Apr 2015, at 19:54, Camille Fournier 
> >> > > >>>>> <camille@apache.org>
> >> > > wrote:
> >> > > >>>>>
> >> > > >>>>> Just out of curiosity, if you start the 5 node cluster up 
> >> > > >>>>> with only 3 of the nodes to begin with (like, config 5, 
> >> > > >>>>> but only bring up 3 processes), does it speed up the 
> >> > > >>>>> leader election or is it still
> >> > > slow?
> >> > > >>>>>
> >> > > >>>>> C
> >> > > >>>>>
> >> > > >>>>> On Tue, Apr 28, 2015 at 1:41 PM, Karol Dudzinski <
> >> > > >>> karoldudzinski@gmail.com>
> >> > > >>>>> wrote:
> >> > > >>>>>
> >> > > >>>>>> Hi,
> >> > > >>>>>>
> >> > > >>>>>> We're seeing some rather strange leader election in one 
> >> > > >>>>>> of our
> >> > > >>> clusters.
> >> > > >>>>>> The duration reported by the "FOLLOWING - LEADER 
> >> > > >>>>>> ELECTION
> TOOK"
> >> > > >>>>>> log
> >> > > >>> line
> >> > > >>>>>> (and equivalent for the leader) seems to vary hugely.  
> >> > > >>>>>> During one
> >> > > >>> rolling
> >> > > >>>>>> reboot, I saw the number reported as small as 39ms and 
> >> > > >>>>>> as large as 57 seconds (difference in units is not a 
> >> > > >>>>>> typo).  The average is just
> >> > > >>> about 10
> >> > > >>>>>> seconds and std dev also about 10 seconds.  So the time 
> >> > > >>>>>> taken is not
> >> > > >>> only
> >> > > >>>>>> quite large, it's also very variable.
> >> > > >>>>>>
> >> > > >>>>>> We have other clusters but the average election time in 
> >> > > >>>>>> those is in the hundreds of millis with std dev in a 
> >> > > >>>>>> similar ballpark.  I guess one difference is the "slow" 
> >> > > >>>>>> cluster is 5 participants while the others
> >> > > >>> are 3,
> >> > > >>>>>> which may be a factor but I wouldn't expect it to make 
> >> > > >>>>>> two orders of magnitude difference!
> >> > > >>>>>>
> >> > > >>>>>> So my question is, what factors contribute to the 
> >> > > >>>>>> election time
> >> > > >>> reported
> >> > > >>>>>> by these log lines? And what can we do to speed this up?
> >> > > >>>>>>
> >> > > >>>>>> As far as I understand from logs and a quick browse 
> >> > > >>>>>> through the code
> >> > > >>> that
> >> > > >>>>>> time is the time to select a leader.  Syncing up to the 
> >> > > >>>>>> leader happens after that.  The syncing part I can 
> >> > > >>>>>> understand will vary depending on
> >> > > >>> load
> >> > > >>>>>> but I don't see why selecting the leader would.
> >> > > >>>>>>
> >> > > >>>>>> Thanks,
> >> > > >>>>>> Karol
> >> > > >>
> >> > > >>
> >> > > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>
Mime
View raw message