zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jtuple <...@git.apache.org>
Subject [GitHub] zookeeper issue #580: ZOOKEEPER-3098: Add additional server metrics
Date Sat, 08 Sep 2018 00:51:06 GMT
Github user jtuple commented on the issue:

    https://github.com/apache/zookeeper/pull/580
  
    Sorry for the delay, August was a super busy month.
    
    I've rebased onto latest master + added basic unit tests for `AvgMinMaxCounter` and `SimpleCounter`.
I also resolved the findbugs issue and the `testMonitor` unit tests. Jenkin still seems unhappy,
but I'm not sure what's up -- the linked test results show no failures.
    
    I've also addressed the majority of inline comments. Please let me know if there's anything
else I overlooked.
    
    ---
    
    As mentioned in a prior comment, this pull request is just the first of many. In this
PR we add the `AvgMinMaxCounter` and `SimpleCounter` metrics.
    
    We have additional metrics internally that we'll upstream in the future, including `AvgMinMaxPercentileCounter`
as well as set variants: `AvgMinMaxCounterSet` and `AvgMinMaxPercentileCounterSet`. I talked
about both of these in [a previous comment](https://github.com/apache/zookeeper/pull/580#issuecomment-410340039)
    
    Those are all the types we currently have internally. Other types like a Gauge could easily
be done in a future pull-request, although our experience has been that gauge's are less useful
in a polling based approach since you only see the most recent value. Something like an `AvgMinMaxCounter`
metric gives you more information when there are multiple events, and trivially reduces to
a gauge-equivalent when there's only a single value during the sampling interval.
    
    For things like commit processor queue sizes and the like, we simply query the queue size
every time we enqueue an item and report that instantaneous size as a value in an `AvgMinMaxCounter`.
When we periodically query metrics, we then know the min/max/avg queue size during the interval,
and as well as the population count which is equivalent to "number of enqueues" during the
interval.
    
    ---
    
    @hanm: For your example, you have a few options:
    
    1. Do as you mentioned and add a new metric to the `ServerMetric` enum for each request
type.
    
    2. Once we land the set-variants, you could add a single metric to the `ServerMetric`
enum and rely upon the set logic. Eg. add a `REQUEST_LATENCY(new AvgMinMaxCounterSet("request_latency"))`
metric and then do `ServerMetrics.REQUEST_LATENCY.add("create_session", latency)`, `ServerMetrics.REQUEST_LATENCY.add("set_data",
latency)`, etc. For your example, you wanted to track counts so you'd probably want a `SimpleCounterSet`
which we don't have, but would be easy to create.
    
    3. Use something custom. For example, the majority of our internal metrics build on `ServerMetrics`,
but we do have a handful that don't. One example that we'll be upstreaming soon is our so-called
"request state" metrics that track the count for requests at different stages in the request
pipeline. We maintain a count of requests queued, issued, and completed for different category
of requests (all, read, write, create_session, close_session, sync, auth, set_watches, other).
This matrix is also added to the output of `/metrics` but doesn't use any of the `ServerMetrics`
logic.


---

Mime
View raw message