zookeeper-dev mailing list archives

From "Norbert Kalmar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ZOOKEEPER-3037) Add JvmPauseMonitor to ZooKeeper
Date Wed, 09 May 2018 14:58:00 GMT
Norbert Kalmar created ZOOKEEPER-3037:

             Summary: Add JvmPauseMonitor to ZooKeeper
                 Key: ZOOKEEPER-3037
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3037
             Project: ZooKeeper
          Issue Type: Improvement
          Components: contrib
    Affects Versions: 3.4.12, 3.5.3
            Reporter: Norbert Kalmar

After a ZK crash or a client timeout, it is sometimes hard to determine from the logs what happened.
Knowing whether ZK was responsive at the time would help a lot. For example, ZK might have spent
a lot of time waiting on GC (there is still a misconception that ZK is a storage system).

To help detect this, Hadoop already has a great tool called JVM Pause Monitor. (As the name
suggests, it can be used for monitoring, but it also helps with post-mortem analysis in many cases.)
Basically, it runs a daemon thread that sleeps for one second, and if the actual sleep time exceeds
the requested 1s by more than a threshold (1s: INFO, 10s: WARN by default - this could be
configurable in our case, see below), it raises an alert by writing a log entry. It can also
report the time GC took.
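
The mechanism described above can be sketched roughly as follows. This is a minimal illustration written for this message, not the Hadoop class itself: the class name `PauseMonitorSketch`, its method names, and the log wording are assumptions, the sleep interval and thresholds are simply the values quoted above, and GC time reporting is left out.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; not the Hadoop JvmPauseMonitor implementation. */
class PauseMonitorSketch {
    // Values quoted in the description above; a real feature would make them configurable.
    static final long SLEEP_MS = 1000;
    static final long INFO_THRESHOLD_MS = 1000;
    static final long WARN_THRESHOLD_MS = 10000;

    /** Maps the extra time beyond the requested sleep to a log level name. */
    static String classify(long extraMs, long infoMs, long warnMs) {
        if (extraMs > warnMs) {
            return "WARN";
        }
        if (extraMs > infoMs) {
            return "INFO";
        }
        return "NONE";
    }

    /** Sleeps once and returns how much longer than requested the sleep took. */
    static long measureExtraSleepMs(long sleepMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(sleepMs);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return Math.max(0, elapsedMs - sleepMs);
    }

    /** Starts a daemon thread that keeps sleeping and reports long pauses. */
    static Thread start() {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    long extra = measureExtraSleepMs(SLEEP_MS);
                    String level = classify(extra, INFO_THRESHOLD_MS, WARN_THRESHOLD_MS);
                    if (!level.equals("NONE")) {
                        // A real implementation would use the server's logger here.
                        System.err.println(level + ": detected pause of approximately "
                                + extra + "ms in JVM or host machine");
                    }
                } catch (InterruptedException e) {
                    // Restore the interrupt flag so the loop exits.
                    Thread.currentThread().interrupt();
                }
            }
        }, "PauseMonitorSketch");
        t.setDaemon(true); // must not keep the JVM alive on shutdown
        t.start();
        return t;
    }
}
```

In this sketch a server would call `PauseMonitorSketch.start()` once during startup and interrupt the returned thread on shutdown. A sleep that overshoots by more than a threshold means the whole JVM (or the host) was stalled, which is exactly the evidence wanted after a client timeout.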

The class implementing this is in hadoop-common, but ZK should not depend on that package.
Since the implementation is straightforward, and the few commits it has received in the past five
years were nothing really serious, I think we could just copy this class into ZooKeeper and
introduce it as a configurable feature that is off by default.

The class:

- Create a class in ZK under contrib called JvmPauseMonitor. 
- Make feature configurable, by default: OFF
- Possibly make sleep time and threshold time configurable (open question)
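
As a rough idea of what the configuration items in the list above could look like, here is a small sketch that reads the settings from `java.util.Properties`. The class name `PauseMonitorConfigSketch` and all property keys are hypothetical and chosen only for illustration; nothing here is an existing ZooKeeper option. The defaults mirror the values mentioned earlier in this message, with the feature off unless explicitly enabled.

```java
import java.util.Properties;

/** Hypothetical settings holder; the property keys are illustrative, not real ZooKeeper options. */
class PauseMonitorConfigSketch {
    final boolean enabled;
    final long sleepMs;
    final long infoThresholdMs;
    final long warnThresholdMs;

    PauseMonitorConfigSketch(Properties p) {
        // Off by default, as proposed in the list above.
        enabled = Boolean.parseBoolean(p.getProperty("pauseMonitor.enabled", "false"));
        // Defaults follow the values quoted in the description (1s sleep, 1s INFO, 10s WARN).
        sleepMs = Long.parseLong(p.getProperty("pauseMonitor.sleepMs", "1000"));
        infoThresholdMs = Long.parseLong(p.getProperty("pauseMonitor.infoThresholdMs", "1000"));
        warnThresholdMs = Long.parseLong(p.getProperty("pauseMonitor.warnThresholdMs", "10000"));
    }
}
```

With an empty `Properties` object the monitor stays disabled, so existing deployments would see no change in behavior unless they opt in.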

This message was sent by Atlassian JIRA
