zookeeper-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3037) Add JvmPauseMonitor to ZooKeeper
Date Thu, 18 Apr 2019 20:19:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821463#comment-16821463 ]

Hudson commented on ZOOKEEPER-3037:

FAILURE: Integrated in Jenkins build ZooKeeper-trunk #484 (See [https://builds.apache.org/job/ZooKeeper-trunk/484/])
ZOOKEEPER-3037: Add JVMPauseMonitor (andor: rev e9adf6ee09ef18258653d65c851fa84c3cd1a51d)
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerConfig.java
* (edit) zookeeper-server/src/test/java/org/apache/zookeeper/ServerConfigTest.java
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServerMain.java
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeerMain.java
* (add) zookeeper-server/src/main/java/org/apache/zookeeper/server/util/JvmPauseMonitor.java
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
* (edit) zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/QuorumPeerConfigTest.java
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java
* (add) zookeeper-server/src/test/java/org/apache/zookeeper/server/util/JvmPauseMonitorTest.java

> Add JvmPauseMonitor to ZooKeeper
> --------------------------------
>                 Key: ZOOKEEPER-3037
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3037
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: contrib
>    Affects Versions: 3.5.3, 3.4.12
>            Reporter: Norbert Kalmar
>            Assignee: Norbert Kalmar
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.6.0
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
> After a ZK crash or client timeout, it is sometimes hard to determine from the logs what
> happened. Knowing whether ZK was responsive at the time would help a lot. For example, ZK might
> spend a lot of time waiting on GC (there is still a misconception that ZK is a storage system).

> To help detect this, HADOOP already has a great tool called JVM Pause Monitor. (As the
> name suggests, it can also be used for monitoring, but it also helps post-mortem in a lot of
> cases.) Basically, it has a daemon that sleeps for one second, and if the actual sleep time exceeds
> 1s by more than a threshold (1s: INFO, 10s: WARN by default; this can be made configurable
> in our case, see below), it writes a log entry. It can also monitor the time GC took.
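
The sleep-and-measure idea described above can be sketched roughly as follows. This is only an illustration, not the Hadoop class or the code committed for this issue: the class name PauseSketch, the Level enum and all method names are hypothetical. The 1s sleep and the 1s/10s thresholds are the defaults quoted in the description.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the sleep-and-measure idea; not the Hadoop or ZooKeeper class. */
class PauseSketch {
    enum Level { NONE, INFO, WARN }

    /** Maps the extra time spent beyond the requested sleep to a log level. */
    static Level classify(long extraMs, long infoThresholdMs, long warnThresholdMs) {
        if (extraMs > warnThresholdMs) {
            return Level.WARN;
        }
        if (extraMs > infoThresholdMs) {
            return Level.INFO;
        }
        return Level.NONE;
    }

    /** Sleeps once and reports how far the actual sleep overshot the requested one. */
    static Level sleepOnce(long sleepMs, long infoThresholdMs, long warnThresholdMs)
            throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(sleepMs);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return classify(elapsedMs - sleepMs, infoThresholdMs, warnThresholdMs);
    }

    public static void main(String[] args) throws InterruptedException {
        // Defaults taken from the description: sleep 1s, INFO above 1s extra, WARN above 10s extra.
        Thread monitor = new Thread(() -> {
            try {
                while (true) {
                    Level level = sleepOnce(1000, 1000, 10000);
                    if (level != Level.NONE) {
                        System.out.println(level + ": detected a pause in the JVM or host");
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly when interrupted
            }
        }, "pause-sketch");
        monitor.setDaemon(true); // must not keep the JVM alive on its own
        monitor.start();
        Thread.sleep(2500); // let the sketch run for a couple of iterations
    }
}
```

A long overshoot only shows that the thread was not scheduled in time. It could be GC, but also swapping or a stalled host, which is presumably why the description also mentions monitoring GC time.
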
> The class implementing this is in HADOOP-common, but ZK should not depend on this package.
> Since this is a straightforward implementation, and the few commits it has had in the past five
> years were nothing serious, I think we could just copy this class into ZooKeeper and
> introduce it as a configurable feature that is off by default.
> The class:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
> Task:
> - Create a class in ZK (under zookeeper/server/util/) called JvmPauseMonitor. 
> - Make feature configurable, by default: OFF
> - Make sleep time and threshold time configurable
> - Update documentation
> - Add [current size of the heap OR % of heap used] to the log entry whenever the sleep threshold
> has been exceeded by a lot (10s)
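
For the last task item, one possible way to obtain the heap size and the percentage used is the standard java.lang.management API. The following is a sketch under that assumption; the class HeapUsageSketch, its method names and the message format are hypothetical and are not taken from the committed patch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Locale;

/** Hypothetical helper that formats heap usage for a pause log entry. */
class HeapUsageSketch {

    /** Returns the used share of the heap in percent, or -1 if the maximum is undefined. */
    static double percentUsed(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0; // MemoryUsage.getMax() returns -1 when no maximum is defined
        }
        return 100.0 * usedBytes / maxBytes;
    }

    /** Builds a short text that could be appended to the log entry. */
    static String describeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double pct = percentUsed(heap.getUsed(), heap.getMax());
        if (pct < 0) {
            return String.format(Locale.ROOT, "heap used=%d bytes (max undefined)", heap.getUsed());
        }
        return String.format(Locale.ROOT, "heap used=%d bytes of %d (%.1f%%)",
                heap.getUsed(), heap.getMax(), pct);
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```

The values are read after the pause has ended, so they describe the heap at logging time, not necessarily at the moment the pause started.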

This message was sent by Atlassian JIRA
