zookeeper-user mailing list archives

From Steve Francis <sfran...@logicmonitor.com>
Subject ZooKeeper JMX Monitoring - suggestion
Date Fri, 12 Aug 2011 18:14:49 GMT

I just wrote ZooKeeper monitoring for the SaaS monitoring company I
work for, at the request of one of our customers (sales-y brief view
of the ZooKeeper monitoring at
(Feel free to contact me directly if anyone is interested in anything

I have a few suggestions as to how the exposed JMX objects could be
- Instead of reporting average and max latency, which, as far as I can
tell from the source code, are computed since server start (or since
the MBean operation that resets the stats was last triggered), do the
same as Tomcat and other projects: report the total processing time as
one counter, and the total number of requests processed as another.
Then if you want the average latency since server start, it's easy to
calculate. More interestingly, it's also easy to calculate the average
latency for any time period (such as the last minute: sample total
requests and total processing time at the start and end of the minute,
subtract, and divide the time delta by the request delta). This lets
you graph and alert on latencies in a meaningful way.
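
As a rough sketch of that calculation (Python, purely illustrative: the
function name and the sample layout are my own, and the two counters
are the proposed ones, not attributes ZooKeeper exposes today):

```python
def interval_average_latency(start, end):
    """Average latency per request between two samples.

    Each sample is a hypothetical (total_requests, total_processing_ms)
    pair of cumulative counters read from the server.
    """
    requests = end[0] - start[0]
    processing_ms = end[1] - start[1]
    if requests <= 0 or processing_ms < 0:
        # No requests in the interval, or the counters went backwards
        # (for example after a restart or a stats reset): no valid average.
        return None
    return processing_ms / requests


# Two samples taken one minute apart: 600 requests took 3000 ms in total.
print(interval_average_latency((1000, 5000), (1600, 8000)))  # 5.0
```

Returning None when a counter goes backwards keeps a restart from
showing up as a bogus negative latency on a graph.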

- Having the MBean name change depending on whether the server is
Leader or Follower is odd. It's the first time I've seen that in any
JMX app (we monitor a lot more than we list on our website). It took a
bit of thought to get consistent graphs regardless of the role the
server is in. That is probably an obstacle for many other monitoring
systems, so it may be worth changing at some point.
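
One way a collector could work around this, sketched in Python. The
helper name, the role set, and the example MBean name are assumptions
for illustration, not taken from the ZooKeeper source: split the role
out of the name so the rest can serve as a stable graph key.

```python
ROLES = {"Leader", "Follower"}  # assumed role values


def split_role(mbean_name):
    """Return (role-independent name, role or None) for an MBean name."""
    domain, _, props = mbean_name.partition(":")
    kept, role = [], None
    for prop in props.split(","):
        _key, _, value = prop.partition("=")
        if value in ROLES:
            role = value       # remember the role, drop it from the key
        else:
            kept.append(prop)  # keep every other key property as-is
    return domain + ":" + ",".join(kept), role


# Illustrative name only; the real layout may differ between versions.
name = "org.apache.ZooKeeperService:name0=ReplicatedServer_id1,name1=replica.1,name2=Follower"
print(split_role(name))
```

The role then becomes just another value to record, and the graph key
stays the same across leader elections.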

- Exposing things like "synced" as an operation, rather than an
attribute, also seems odd. It would be nice if that was a simple

And finally - any chance someone can explain the
"pendingRevalidationCount"? I couldn't figure that one out well enough
to understand its significance.
