zookeeper-dev mailing list archives

From anmolnar <...@git.apache.org>
Subject [GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log
Date Mon, 28 May 2018 14:11:30 GMT
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r191204646
  
    --- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
    @@ -148,9 +174,46 @@ synchronized public void resetRequestCounters(){
             packetsReceived = 0;
             packetsSent = 0;
         }
    +    synchronized public void resetNumRequestsAboveThresholdTime() {
    +        numRequestsAboveThresholdTime = 0;
    +    }
         synchronized public void reset() {
             resetLatency();
             resetRequestCounters();
    +        resetNumRequestsAboveThresholdTime();
    +    }
    +
    +    public void checkLatency(final ZooKeeperServer zks, Request request) {
    +        long requestLatency = Time.currentElapsedTime() - request.createTime;
    +        boolean enabledAndAboveThreshold = (requestWarnThresholdMs == 0) ||
    +                (requestWarnThresholdMs > -1 && requestLatency > requestWarnThresholdMs);
    +        if (enabledAndAboveThreshold) {
    +            zks.serverStats().incNumRequestsAboveThresholdTime();
    +
    +            // Try acquiring lock only if not waiting
    +            boolean success = waitForLoggingWarnThresholdMsg.compareAndSet(Boolean.FALSE, Boolean.TRUE);
    +            if (success) {
    +                LOG.warn("Request {} exceeded threshold. Took {} ms", request, requestLatency);
    +                startCount = zks.serverStats().getNumRequestsAboveThresholdTime();
    +                timer.schedule(new TimerTask() {
    +                    @Override
    +                    public void run() {
    +                        long count = zks.serverStats().getNumRequestsAboveThresholdTime() - startCount;
    --- End diff --
    
    > it is fine to say that 0 requests had longer times since the last bad request was logged. The total count can be seen using stat at any point of time. What do you suggest?
    
    Makes sense to me. What do you think of _not_ logging anything in the task if the actual counter equals `startCount`?
    
    As I stated above, please use `ScheduledExecutorService` here.
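
    A rough sketch of what both suggestions could look like together, written as a standalone class rather than against the real `ServerStats`. Everything in it is an assumption for illustration: the class name `SlowRequestLogger`, the `summaryDelayMs` parameter, the `close()` helper, and the use of `System.out` in place of `LOG.warn`. The delayed task runs on a `ScheduledExecutorService` instead of a `Timer`, and it stays silent when no further slow requests arrived after the first warning.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the check in ServerStats; every name here is illustrative.
class SlowRequestLogger {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "slow-request-logger");
                t.setDaemon(true); // do not keep the JVM alive for a pending summary
                return t;
            });
    private final AtomicLong numAboveThreshold = new AtomicLong();
    private final AtomicLong summaryCount = new AtomicLong();
    private final AtomicBoolean waitingToLog = new AtomicBoolean(false);
    private final long thresholdMs;    // -1 disables the check, 0 flags every request
    private final long summaryDelayMs; // assumed delay before the follow-up summary

    SlowRequestLogger(long thresholdMs, long summaryDelayMs) {
        this.thresholdMs = thresholdMs;
        this.summaryDelayMs = summaryDelayMs;
    }

    void checkLatency(String request, long latencyMs) {
        boolean aboveThreshold = thresholdMs == 0
                || (thresholdMs > -1 && latencyMs > thresholdMs);
        if (!aboveThreshold) {
            return;
        }
        final long startCount = numAboveThreshold.incrementAndGet();
        // Only one warning plus follow-up summary may be in flight at a time.
        if (waitingToLog.compareAndSet(false, true)) {
            System.out.printf("Request %s exceeded threshold. Took %d ms%n", request, latencyMs);
            scheduler.schedule(() -> {
                long extra = numAboveThreshold.get() - startCount;
                if (extra > 0) { // stay silent when no other request was slow
                    System.out.printf("%d more requests exceeded threshold%n", extra);
                    summaryCount.incrementAndGet();
                }
                waitingToLog.set(false);
            }, summaryDelayMs, TimeUnit.MILLISECONDS);
        }
    }

    long aboveThresholdCount() {
        return numAboveThreshold.get();
    }

    long summaryCount() {
        return summaryCount.get();
    }

    // Stops accepting new summaries and waits for a pending one to run.
    boolean close() {
        scheduler.shutdown();
        try {
            return scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

    Capturing `startCount` in a local variable instead of a field keeps the task from reading a value that a later warning could overwrite. Calling `checkLatency` after `close()` is not handled in this sketch.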

