zookeeper-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [zookeeper] insumity opened a new pull request #1031: ZOOKEEPER-3479: Logging false leader election times
Date Wed, 31 Jul 2019 01:48:16 GMT
insumity opened a new pull request #1031: ZOOKEEPER-3479: Logging false leader election times
URL: https://github.com/apache/zookeeper/pull/1031
 
 
   See https://issues.apache.org/jira/browse/ZOOKEEPER-3479 here for the issue.
   
   The problem is that the logged leader election times are much smaller than in reality.
For example, we have log files that contain the following lines:
   ```
   2019-07-31 00:57:31,852 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):QuorumPeer@1318]
- PeerState set to LOOKING
   2019-07-31 00:57:31,853 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):QuorumPeer@1193]
- LOOKING
   2019-07-31 00:57:31,853 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):FastLeaderElection@885]
- New election. My id = 1, proposed zxid=0x100000001
   [...]
   2019-07-31 00:57:32,272 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):Follower@69]
- FOLLOWING - LEADER ELECTION TOOK - 1 MS
   ```
   Leader election took more than 400ms, not 1ms!
   
   
   The reason this logging bug exists has to do with this [line](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1402)
in `QuorumPeer`. The computation of the leader election time starts inside [`FastLeaderElection.lookForLeader`
](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L915)
where `start_fle` is set. After a server has found a leader, it repeats the `QuorumPeer` [loop](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1288)
and resets the `start_fle` value. Hence, when, for example, a follower [logs](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Follower.java#L75)
how much time it took for the leader election, it logs a much smaller time than it actually
took for the leader election to complete. To be precise, it just logs the time it took to
loop inside [`QuorumPeer`](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1288).
   
   Furthermore, if we remove the problematic [line](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1402),
then there is no reason to check whether `start_fle == 0` in [`FastLeaderElection`](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L914),
since `start_fle` is always `0` at that point. I'm not sure if this `if` statement is needed
in the first place.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

Mime
View raw message