zookeeper-dev mailing list archives

From Norbert Kalmar <nkal...@cloudera.com>
Subject [SUGGESTION] JvmPauseMonitor in ZooKeeper
Date Wed, 09 May 2018 14:14:55 GMT

I just got a tip that we could improve the logging in ZooKeeper. After a
ZK crash or a client timeout, it is sometimes hard to determine from the
logs what happened. Knowing whether ZK was responsive at the time would
help a lot. For example, ZK might spend a lot of time waiting on GC (there
is still a misconception that ZK is a storage system).

To help detect this, Hadoop already has a great tool called JVM Pause
Monitor. (As the name suggests, it can also be used for monitoring, but it
also helps with post-mortem analysis in a lot of cases.) Basically, it has
a daemon thread that sleeps for one second, and if the actual sleep time
exceeds 1s by more than a threshold (1s: INFO, 10s: WARN by default; this
can be made configurable in our case, see below), it writes a log entry.
It can also report the time spent in GC.
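
To make the mechanism concrete, here is a minimal sketch of the sleep-and-
compare loop described above. It is not the Hadoop class: the class name
PauseMonitorSketch, the method names, and the use of java.util.logging are
assumptions made for illustration, and GC time reporting is left out.

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical sketch of a JVM pause monitor; names are illustrative only.
class PauseMonitorSketch implements Runnable {
    enum Level { NONE, INFO, WARN }

    private static final Logger LOG =
            Logger.getLogger(PauseMonitorSketch.class.getName());

    private final long sleepMs;
    private final long infoThresholdMs;
    private final long warnThresholdMs;
    private volatile boolean running = true;

    PauseMonitorSketch(long sleepMs, long infoThresholdMs, long warnThresholdMs) {
        this.sleepMs = sleepMs;
        this.infoThresholdMs = infoThresholdMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    // Decide how to report a pause, given how far the sleep overshot.
    static Level classify(long extraMs, long infoThresholdMs, long warnThresholdMs) {
        if (extraMs > warnThresholdMs) {
            return Level.WARN;
        }
        if (extraMs > infoThresholdMs) {
            return Level.INFO;
        }
        return Level.NONE;
    }

    @Override
    public void run() {
        while (running) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            // Time beyond the requested sleep is treated as a JVM or host pause.
            long extraMs = elapsedMs - sleepMs;
            switch (classify(extraMs, infoThresholdMs, warnThresholdMs)) {
                case WARN:
                    LOG.warning("Detected pause of approximately " + extraMs + " ms");
                    break;
                case INFO:
                    LOG.info("Detected pause of approximately " + extraMs + " ms");
                    break;
                default:
                    break;
            }
        }
    }

    // Start the monitor on a daemon thread so it never blocks JVM shutdown.
    Thread start() {
        Thread t = new Thread(this, "pause-monitor-sketch");
        t.setDaemon(true);
        t.start();
        return t;
    }

    void stop() {
        running = false;
    }
}
```

With the defaults mentioned above, this would be started as
new PauseMonitorSketch(1000, 1000, 10000).start(). Keeping the threshold
check in a separate static method makes it testable without sleeping.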

Now, this class is in hadoop-common. I wouldn't want to depend on
hadoop-common just for this one feature (it is actually a single class).
Since the implementation is straightforward, and the few commits it has
received in the past five years were nothing serious, I think we could
just copy this class into ZooKeeper and introduce it as a configurable
feature that is off by default.
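
As a rough illustration of what "configurable, off by default" could look
like, here is a small sketch that reads the settings from a Properties
object. The property names and the class PauseMonitorConfigSketch are
invented for this example; they are not existing ZooKeeper or Hadoop
settings.

```java
import java.util.Properties;

// Hypothetical configuration holder; all keys below are made up for the sketch.
class PauseMonitorConfigSketch {
    static final String ENABLED_KEY = "jvm.pause.monitor.enabled";
    static final String INFO_KEY = "jvm.pause.info-threshold.ms";
    static final String WARN_KEY = "jvm.pause.warn-threshold.ms";

    final boolean enabled;
    final long infoThresholdMs;
    final long warnThresholdMs;

    PauseMonitorConfigSketch(Properties props) {
        // The feature stays off unless it is explicitly enabled.
        enabled = Boolean.parseBoolean(props.getProperty(ENABLED_KEY, "false"));
        // Defaults follow the thresholds mentioned above: 1s INFO, 10s WARN.
        infoThresholdMs = Long.parseLong(props.getProperty(INFO_KEY, "1000"));
        warnThresholdMs = Long.parseLong(props.getProperty(WARN_KEY, "10000"));
    }
}
```

A server would only start the monitor thread when enabled is true, so
existing deployments would see no change in behavior.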

The class:

What do you think?

