zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew January <ajanu...@evertz.com>
Subject Issue electing leader, suspect delay in processing notifications
Date Fri, 14 Sep 2018 15:57:35 GMT
I have a 3 node cluster on 3.4.12. One of the nodes (node 2) was taken down for maintenance.
The remaining two nodes seem to have issues gaining quorum.

I think the issues is that node 1 is taking 5 seconds to process notifications.

>From the logs, node 1 and node 3 both agree that node 3 should be leader.
Node 1 sends out it's notification that it's following node 3, then immediately tries to connect
to it to follow.
Because node 3 takes 5 seconds to pick up the notification that node 1 accepts it as leader,
it isn't accepting connections to be followed.
This means node 1 times out trying to connect to node 3 before node 3 becomes leader.
Node 1 starts a new election. Node 3 times out waiting for acknowledgement that it's the leader.
The process repeats.

My evidence that node 1 is taking 5 seconds to process notifications is:
1) Node 3 starts a new election, which broadcasts a notification with a new round to itself
and node 1
2) Node 1 gets the message almost immediately
3) Node 3 doesn't get its own message until 5 seconds later


node3: 2018-09-14 09:28:50,534 - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813]
- New election. My id =  3, proposed zxid=0x600001f52
node1: 2018-09-14 09:28:50,535 - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@595] - Notification:
1 (message format version), 3 (n.leader), 0x600001f52 (n.zxid), 0x4e0 (n.round), LOOKING (n.state),
3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)
node3: 2018-09-14 09:28:55,539 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification:
1 (message format version), 3 (n.leader), 0x600001f52 (n.zxid), 0x4e0 (n.round), LOOKING (n.state),
3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)

node3: 2018-09-14 09:29:25,544 - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813]
- New election. My id =  3, proposed zxid=0x600001f52
node1: 2018-09-14 09:29:25,546 - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@595] - Notification:
1 (message format version), 3 (n.leader), 0x600001f52 (n.zxid), 0x4e1 (n.round), LOOKING (n.state),
3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)
node3: 2018-09-14 09:29:30,547 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification:
1 (message format version), 3 (n.leader), 0x600001f52 (n.zxid), 0x4e1 (n.round), LOOKING (n.state),
3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)

node3: 2018-09-14 09:30:00,772 - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813]
- New election. My id =  3, proposed zxid=0x600001f52
node1: 2018-09-14 09:30:00,774 - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@595] - Notification:
1 (message format version), 3 (n.leader), 0x600001f52 (n.zxid), 0x4e2 (n.round), LOOKING (n.state),
3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)
node3: 2018-09-14 09:30:05,777 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification:
1 (message format version), 3 (n.leader), 0x600001f52 (n.zxid), 0x4e2 (n.round), LOOKING (n.state),
3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)

Similarly, when node 1 tries to start an election:
1) Node 1 starts a new election, which has a new round
2) Node 1 gets the message almost immediately
3) Node 3 doesn't get the message until 5 seconds later

node1: 2018-09-14 09:31:24,853 - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813]
- New election. My id =  1, proposed zxid=0x600001ee4
node1: 2018-09-14 09:31:25,653 - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@595] - Notification:
1 (message format version), 1 (n.leader), 0x600001ee4 (n.zxid), 0x4e5 (n.round), LOOKING (n.state),
1 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)
node3: 2018-09-14 09:31:30,654 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification:
1 (message format version), 1 (n.leader), 0x600001ee4 (n.zxid), 0x4e5 (n.round), LOOKING (n.state),
1 (n.sid), 0x6 (n.peerEpoch) LEADING (my state)

node1: 2018-09-14 09:32:00,072 - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813]
- New election. My id =  1, proposed zxid=0x600001ee4
node1: 2018-09-14 09:32:00,680 - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@595] - Notification:
1 (message format version), 1 (n.leader), 0x600001ee4 (n.zxid), 0x4e6 (n.round), LOOKING (n.state),
1 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)
node3: 2018-09-14 09:32:05,685 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification:
1 (message format version), 1 (n.leader), 0x600001ee4 (n.zxid), 0x4e6 (n.round), LOOKING (n.state),
1 (n.sid), 0x6 (n.peerEpoch) LEADING (my state)

I've looked through the code but I can't see any reason it would have a consistent 5 second
delay when processing notifications. Any ideas?
This e-mail and any files transmitted with it are confidential and intended solely for the
use of the individual or entity to whom they are addressed. If you have received this e-mail
in error please notify the sender (as shown above). Kindly do not reproduce, print or forward
any material received in error, please delete it immediately. Evertz UK Limited (Company No.
3458137) is incorporated in England and Wales and has its registered office at 100 Berkshire
Place, Wharfedale Road, Winnersh, Wokingham, Berkshire, United Kingdom, RG41 5RD. Evertz Singapore
Pte Limited (Company No. 200817005N) is incorporated in Singapore and has its registered office
at One Marina Boulevard, #28-00. Singapore 018989.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message