zookeeper-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3399) Remove logging in getGlobalOutstandingLimit for optimal performance.
Date Sat, 25 May 2019 20:28:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848277#comment-16848277 ]

Hudson commented on ZOOKEEPER-3399:

SUCCESS: Integrated in Jenkins build Zookeeper-trunk-single-thread #373 (See [https://builds.apache.org/job/Zookeeper-trunk-single-thread/373/])
ZOOKEEPER-3399: Remove logging in getGlobalOutstandingLimit for optimal (eolivelli: rev 968f5f365e53d0bcbbe0225cc382327badbd8380)
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/LeaderZooKeeperServer.java
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FollowerZooKeeperServer.java

> Remove logging in getGlobalOutstandingLimit for optimal performance.
> --------------------------------------------------------------------
>                 Key: ZOOKEEPER-3399
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3399
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.0
>            Reporter: Michael Han
>            Assignee: Michael Han
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.6.0
>          Time Spent: 1h
>  Remaining Estimate: 0h
> Recently we moved some of our production clusters to the top of trunk. One issue
we found is a regression in read and write latency on clusters where the quorum
is also serving traffic: average read latency increased by 50x and p99 read latency
increased by 300x.
> The root cause is a log statement introduced in ZOOKEEPER-3177 (PR711), which added
a LOG.info statement to getGlobalOutstandingLimit. getGlobalOutstandingLimit is on the critical
code path for request processing and is called twice for each request (once when processing
the packet, once when finalizing the request response). When the QPS of a server is high,
this not only degrades server performance but also bloats the log file.
> This only impacts clusters where the quorum (leader + follower) is serving traffic. For
clusters where only observers are serving traffic, no impact is observed.
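
The effect described above can be illustrated with a minimal Java sketch. This is not the ZooKeeper code: apart from getGlobalOutstandingLimit, every class, method, and field name here is hypothetical, the limit value 1000 is made up, and java.util.logging stands in for whatever logger the server uses. The sketch only models a getter that is called twice per request and counts how many log statements run when the getter logs versus when it does not. Logging the value once at construction is this sketch's own choice; the change in this issue is described only as removing the log statement.

```java
import java.util.function.IntSupplier;
import java.util.logging.Logger;

// Illustrative sketch only. All names except getGlobalOutstandingLimit are
// hypothetical and are not taken from the ZooKeeper source.
final class HotPathLoggingSketch {
    private static final Logger LOG = Logger.getLogger("HotPathLoggingSketch");

    // Counts executed log statements (single-threaded demo, so a plain int is enough).
    static int logCalls = 0;

    static void info(String msg) {
        logCalls++;
        LOG.info(msg);
    }

    // Variant 1: the getter logs on every call, so logging sits on the hot path.
    static final class LoggingServer {
        private final int limit;

        LoggingServer(int limit) {
            this.limit = limit;
        }

        int getGlobalOutstandingLimit() {
            info("globalOutstandingLimit=" + limit);
            return limit;
        }
    }

    // Variant 2: the getter only returns the value. Logging once at
    // construction is an assumption of this sketch, not the actual patch.
    static final class QuietServer {
        private final int limit;

        QuietServer(int limit) {
            this.limit = limit;
            info("globalOutstandingLimit=" + limit);
        }

        int getGlobalOutstandingLimit() {
            return limit;
        }
    }

    // Models the two calls per request described in the issue.
    static void serve(IntSupplier limitGetter, int requests) {
        for (int i = 0; i < requests; i++) {
            limitGetter.getAsInt(); // when processing the packet
            limitGetter.getAsInt(); // when finalizing the request response
        }
    }

    public static void main(String[] args) {
        LoggingServer noisy = new LoggingServer(1000); // 1000 is a made-up value
        serve(noisy::getGlobalOutstandingLimit, 3);
        int noisyCalls = logCalls;

        logCalls = 0;
        QuietServer quiet = new QuietServer(1000);
        serve(quiet::getGlobalOutstandingLimit, 3);

        System.out.println("log statements, logging getter: " + noisyCalls);
        System.out.println("log statements, quiet getter: " + logCalls);
    }
}
```

In the first variant the number of log statements grows as twice the number of requests, while in the second it stays constant regardless of traffic, which is why the cost only becomes visible on servers with high QPS.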

This message was sent by Atlassian JIRA