zookeeper-issues mailing list archives

From "Fangmin Lv (Jira)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3534) Non-stop communication between participants and observers.
Date Wed, 04 Sep 2019 20:08:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922824#comment-16922824 ]

Fangmin Lv commented on ZOOKEEPER-3534:
---------------------------------------

[~karolos] thanks for reporting the issue. If there are lots of observers, the leader election
notifications could overwhelm the participants and affect the quorum leader election.

There is a setting called zookeeper.fastleader.minNotificationInterval, which sets the minimum
time a server waits before sending another notification. Can you check whether raising this
minimum value helps?
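
A minimal sketch of how such a minimum interval could shape the notification schedule, assuming the wait starts at the minimum and doubles up to a cap whenever no notification arrives. Only the property name comes from the comment above; the class and method names, the 200 ms fallback and the 60 s cap are illustrative assumptions, not taken from the ZooKeeper source.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of a notification timeout that doubles up to a cap. Not ZooKeeper code. */
class NotificationBackoffSketch {

    /** Reads the system property named above; the 200 ms fallback is an assumption. */
    static int minIntervalMs() {
        return Integer.getInteger("zookeeper.fastleader.minNotificationInterval", 200);
    }

    /** Doubles the current timeout without ever exceeding maxMs. */
    static int nextTimeout(int currentMs, int maxMs) {
        long doubled = 2L * currentMs; // long arithmetic avoids int overflow
        return (int) Math.min(doubled, maxMs);
    }

    /** Returns the first `steps` timeouts, starting from minMs. */
    static List<Integer> schedule(int minMs, int maxMs, int steps) {
        List<Integer> out = new ArrayList<>();
        int timeout = minMs;
        for (int i = 0; i < steps; i++) {
            out.add(timeout);
            timeout = nextTimeout(timeout, maxMs);
        }
        return out;
    }

    public static void main(String[] args) {
        // The 60 s cap is a hypothetical value chosen for illustration.
        System.out.println(schedule(minIntervalMs(), 60_000, 10));
    }
}
```

Running the sketch with -Dzookeeper.fastleader.minNotificationInterval=1000 makes the printed schedule start at 1000 ms instead of 200 ms, so fewer notifications are sent in the same time span.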

> Non-stop communication between participants and observers.
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-3534
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3534
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Karolos Antoniadis
>            Priority: Minor
>         Attachments: create_np_case_3.sh
>
>
> Hello ZooKeeper developers,
> there are cases during *leader election*, where there is non-stop communication between
observers and participants. 
> This communication occurs as follows: 
> - an observer sends a notification to a participant
> - the participant responds
> - an observer sends another notification and so on and so forth ...
> It is possible for an observer-participant pair to exchange hundreds of notification messages
in the span of one second. As a consequence, the system is burdened with unnecessary load,
and the logs are filled with useless information, as can be seen below:
>  
> {noformat}
> 2019-09-03 16:37:22,630 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection@692]
- Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x1,
n.zxid:0x0, message format version:0x2, n.config version:0x100000000
> 2019-09-03 16:37:22,632 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection@692]
- Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x1,
n.zxid:0x0, message format version:0x2, n.config version:0x100000000
> 2019-09-03 16:37:22,633 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection@692]
- Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x1,
n.zxid:0x0, message format version:0x2, n.config version:0x100000000
> 2019-09-03 16:37:22,635 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection@692]
- Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x1,
n.zxid:0x0, message format version:0x2, n.config version:0x100000000
> 2019-09-03 16:37:22,635 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection@692]
- Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x1,
n.zxid:0x0, message format version:0x2, n.config version:0x100000000{noformat}
>  
>  
> h4. Why does the non-stop communication bug occur?
> This bug stems from the fact that when a participant receives a notification from an
observer, the participant responds right away, as can be seen [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L325] -
it is even written in the comments. Now, when the observer receives back the message from
the participant there are 2 cases that could lead to non-stop communication:
> 1) The observer has a greater {{logicalclock}} than the participant and both the observer
and the participant are in a {{LOOKING}} state. In such a case, the observer responds right
away to the participant as can be seen [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L392].

> 2) The observer is {{OBSERVING}} while the participant is still {{LOOKING}}. In this case, the
non-stop communication ensues due to the code [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L413].
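
The two cases above can be sketched as a toy model. This is not ZooKeeper code: the class, enum and method names are hypothetical, and the model assumes that neither side changes its state or its clock as a result of the exchange, which is what lets the loop continue.

```java
/** Toy model of the reply rules described in the report; not ZooKeeper code. */
class PingPongSketch {
    enum State { LOOKING, OBSERVING }

    /** Cases 1 and 2 from the report: does the observer answer a participant's reply? */
    static boolean observerReplies(State observer, long observerClock,
                                   State participant, long participantClock) {
        boolean case1 = observer == State.LOOKING
                && participant == State.LOOKING
                && observerClock > participantClock;
        boolean case2 = observer == State.OBSERVING && participant == State.LOOKING;
        return case1 || case2;
    }

    /** Counts messages exchanged after one observer notification, stopping at cap. */
    static int exchange(State observer, long observerClock,
                        State participant, long participantClock, int cap) {
        int messages = 1; // the observer's initial notification
        while (messages < cap) {
            messages++; // the participant always answers an observer right away
            if (messages >= cap
                    || !observerReplies(observer, observerClock, participant, participantClock)) {
                break;
            }
            messages++; // the observer answers back, which triggers another reply
        }
        return messages;
    }

    public static void main(String[] args) {
        // Equal clocks: one notification and one reply, then silence.
        System.out.println(exchange(State.LOOKING, 2, State.LOOKING, 2, 1000));
        // Observer ahead (case 1): the exchange only stops at the artificial cap.
        System.out.println(exchange(State.LOOKING, 3, State.LOOKING, 2, 1000));
    }
}
```

In this model the exchange ends after two messages when the clocks are equal, and runs until the cap in either of the two cases, because nothing in the exchange alters the condition that triggers the next reply.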
 
> h4. How can we reproduce this non-stop communication bug?
> It is not trivial to reproduce this bug, although we saw it occurring in the wild. To
reproduce this bug, we provide a script that utilizes docker and that can be used to easily
debug ZK code. The script starts a ZK cluster with 3 participants (P1, P2, P3) and 2 observers
(O1, O2). The script together with instructions on how to use it can be found [here|https://github.com/insumity/zookeeper_debug_tool].
>  
> Using the script, there are at least 2 ways to reproduce the bug:
> 1) We can artificially delay the leader election by introducing the following code in
{{FastLeaderElection}} (in [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L1006]).
>  
> {code:java}
> // Verify if there is any change in the proposed leader
> int time = finalizeWait;
> if (self.getId() >= 1 && self.getId() <= 3) {
>     time = 2000;
> }{code}
>  
> and changing the immediately following line:
> {code:java}
> while ((n = recvqueue.poll(finalizeWait, TimeUnit.MILLISECONDS)) != null) {
> {code}
> to 
>  
> {code:java}
> while ((n = recvqueue.poll(time, TimeUnit.MILLISECONDS)) != null) { 
> {code}
> Now, if we run a ZK cluster and force a leader election by killing the leader, we see
the non-stop communication occurring. The reason is that, as a result of this delay, the observer
restarts (incrementing its {{logicalclock}}), tries to connect to the previous leader, and fails
because the previous leader has crashed. The observer then restarts again, incrementing
{{logicalclock}} once more, which starts the non-stop communication.
> 2) Another way to reproduce the bug is by creating a network partition that separates
P1 from P2, P3, and O2 but still keeps participant P1 connected to observer O1. In such a
case, the non-stop communication ensues since O1 is {{OBSERVING}} while P1 remains in a {{LOOKING}}
state. To reproduce this with the above script:
>  *  wait until the ZK cluster starts running
>  *  on your local machine, run ./create_np_case_3.sh (attached to this issue)
>  *  force a leader election by restarting the leader (most likely the leader is server
3)
> It is true that scenario 2 is slightly unrealistic. However, the first scenario where
leader election takes too much time to complete is pretty realistic.  Whenever we saw this
non-stop communication bug, it was because leader election took too long to complete. For
instance, it could occur if there is some type of split-vote during LE and the elected leader
times out while
> {noformat}
> waiting for epoch from quorum {noformat}
> [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L1350].
>  
> h4. How can we fix this issue?
> One idea would be that before an observer starts observing a leader, it verifies that
the leader is up and running using a check similar to {{checkLeader}} as is done [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L1037].
> This would prevent non-stop communication between observers and participants
during long leader elections, since observers would not try to connect to an already failed leader
and hence would not increase their {{logicalclock}}. However, this fix on its own does
not address the second way of reproducing the bug described above.
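
The proposed check could look roughly like the following sketch. It is a hypothetical simplification, not the actual {{checkLeader}} implementation: it assumes the observer keeps a map of the last state each server reported, and it only starts observing once the proposed leader itself has reported that it is leading. All names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical guard an observer could apply before following a leader; not ZooKeeper code. */
class ObserverLeaderCheckSketch {
    enum State { LOOKING, FOLLOWING, LEADING, OBSERVING }

    /** True only if the proposed leader itself has reported LEADING to this observer. */
    static boolean leaderLooksAlive(Map<Long, State> lastReportedState, long leaderId) {
        return lastReportedState.get(leaderId) == State.LEADING;
    }

    /** Stay LOOKING (without bumping any clock) until the leader is confirmed. */
    static State nextObserverState(Map<Long, State> lastReportedState, long leaderId) {
        return leaderLooksAlive(lastReportedState, leaderId) ? State.OBSERVING : State.LOOKING;
    }

    public static void main(String[] args) {
        Map<Long, State> reported = new HashMap<>();
        reported.put(1L, State.FOLLOWING); // a follower still names server 3 as leader
        System.out.println(nextObserverState(reported, 3L)); // leader 3 never reported in
        reported.put(3L, State.LEADING);
        System.out.println(nextObserverState(reported, 3L)); // leader 3 confirmed
    }
}
```

As the report notes, a guard like this only covers the long-election scenario; it does nothing for a partitioned participant that stays in LOOKING.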
> Best Regards,
> Karolos
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
