zookeeper-dev mailing list archives

From jtuple <...@git.apache.org>
Subject [GitHub] zookeeper pull request #580: ZOOKEEPER-3098: Add additional server metrics
Date Fri, 20 Jul 2018 20:09:12 GMT
GitHub user jtuple opened a pull request:

    https://github.com/apache/zookeeper/pull/580

    ZOOKEEPER-3098: Add additional server metrics

    This patch adds several new server-side metrics and makes it easier to add new
    metrics in the future. It also includes a handful of other minor metrics-related changes.
    
    Here's a high-level summary of the changes.
    
    1. This patch extends the request latency tracked in `ServerStats` to
       track `read` and `update` latency separately. Updates are any requests
       that must be voted on and can change data; reads are all requests that
       can be handled locally and don't change data.
    
    2. This patch adds the `ServerMetrics` logic and the related `AvgMinMaxCounter`
       and `SimpleCounter` classes. This code is designed to make it incredibly easy to
       add new metrics. To add a new metric you just add one line to `ServerMetrics` and
       then directly reference that new metric anywhere in the code base. The `ServerMetrics`
       logic handles creating the metric, properly adding the metric to the JSON output of
       the `/monitor` admin command, and properly resetting the metric when necessary.
    
       The motivation behind `ServerMetrics` is to make things easy enough that it encourages
       new metrics to be added liberally. Lack of in-depth metrics/visibility is a long-standing
       ZooKeeper weakness. At Facebook, most of our internal changes build on `ServerMetrics`
       and we have nearly 100 internal metrics at this time -- all of which we'll be upstreaming
       in the coming months as we publish more internal patches.
    
    3. This patch adds 20 new metrics, 14 of which are handled by `ServerMetrics`.
    
    4. This patch replaces some uses of `synchronized` in `ServerStats` with atomic operations.
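
As a rough illustration of points 2 and 4 above, here is a minimal sketch of how a one-line-per-metric registry built on atomic operations could look. It is not the code from the patch: the names `MetricSketch`, `SimpleCounterSketch`, `AvgMinMaxCounterSketch` and `ServerMetricsSketch`, the `avg_`/`min_`/`max_` key prefixes, and the `getAllValues`/`resetAll` helpers are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical common shape for a metric; not taken from the patch.
interface MetricSketch {
    void add(long value);
    void reset();
    Map<String, Object> values();
}

// Plain counter backed by a single atomic, so no synchronized block is needed.
class SimpleCounterSketch implements MetricSketch {
    private final String name;
    private final AtomicLong count = new AtomicLong();

    SimpleCounterSketch(String name) { this.name = name; }

    public void add(long value) { count.addAndGet(value); }
    public void reset() { count.set(0); }
    public Map<String, Object> values() {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(name, count.get());
        return m;
    }
}

// Tracks avg/min/max. Each field is updated atomically, but a concurrent
// reader may see fields from slightly different moments.
class AvgMinMaxCounterSketch implements MetricSketch {
    private final String name;
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong min = new AtomicLong(Long.MAX_VALUE);
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    AvgMinMaxCounterSketch(String name) { this.name = name; }

    public void add(long value) {
        total.addAndGet(value);
        count.incrementAndGet();
        min.accumulateAndGet(value, Math::min);
        max.accumulateAndGet(value, Math::max);
    }

    public void reset() {
        total.set(0);
        count.set(0);
        min.set(Long.MAX_VALUE);
        max.set(Long.MIN_VALUE);
    }

    public Map<String, Object> values() {
        long n = count.get();
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("avg_" + name, n == 0 ? 0.0 : (double) total.get() / n);
        m.put("min_" + name, n == 0 ? 0L : min.get());
        m.put("max_" + name, n == 0 ? 0L : max.get());
        return m;
    }
}

// Registry: adding a metric means adding one enum constant.
enum ServerMetricsSketch {
    FSYNC_TIME(new AvgMinMaxCounterSketch("fsynctime")),
    LOOKING_COUNT(new SimpleCounterSketch("looking_count"));

    private final MetricSketch metric;

    ServerMetricsSketch(MetricSketch metric) { this.metric = metric; }

    void add(long value) { metric.add(value); }

    // Flattens every registered metric into one map, e.g. for JSON output.
    static Map<String, Object> getAllValues() {
        Map<String, Object> all = new LinkedHashMap<>();
        for (ServerMetricsSketch m : values()) {
            all.putAll(m.metric.values());
        }
        return all;
    }

    static void resetAll() {
        for (ServerMetricsSketch m : values()) {
            m.metric.reset();
        }
    }
}
```

Under these assumptions, a call site would only need something like `ServerMetricsSketch.FSYNC_TIME.add(elapsedMillis)`, and the reporting and reset paths would pick up the new metric without further changes.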
    
    Here's a list of new metrics added in this patch:
    
    - `uptime`: time that a peer has been in a stable leading/following/observing state
    - `leader_uptime`: uptime for peer in leading state
    - `global_sessions`: count of global sessions
    - `local_sessions`: count of local sessions
    - `quorum_size`: configured ensemble size
    - `synced_observers`: similar to existing `synced_followers` but for observers
    - `fsynctime`: time to fsync transaction log (avg/min/max)
    - `snapshottime`: time to write a snapshot (avg/min/max)
    - `dbinittime`: time to reload database -- read snapshot + apply transactions (avg/min/max)
    - `readlatency`: read request latency (avg/min/max)
    - `updatelatency`: update request latency (avg/min/max)
    - `propagation_latency`: end-to-end latency for updates, from proposal on leader to
      committed-to-datatree on a given host (avg/min/max)
    - `follower_sync_time`: time for follower to sync with leader (avg/min/max)
    - `election_time`: time between entering and leaving election (avg/min/max)
    - `looking_count`: number of transitions into looking state
    - `diff_count`: number of diff syncs performed
    - `snap_count`: number of snap syncs performed
    - `commit_count`: number of commits performed on leader
    - `connection_request_count`: number of incoming client connection requests
    - `bytes_received_count`: similar to existing `packets_received` but tracks bytes
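
To make the `readlatency` / `updatelatency` split in the list above concrete, the following sketch routes each request's latency to one of two avg/min/max accumulators depending on whether the request is an update. The `RequestLatencySketch` class, the `OpKind` enum and its members, and the millisecond timestamps are hypothetical and chosen only for illustration; the patch itself may classify requests differently.

```java
import java.util.concurrent.atomic.AtomicLong;

class RequestLatencySketch {
    // Hypothetical request kinds; "update" marks requests that must be voted on.
    enum OpKind {
        GET_DATA(false), EXISTS(false), CREATE(true), SET_DATA(true), DELETE(true);

        final boolean update;

        OpKind(boolean update) { this.update = update; }
    }

    // Small avg/min/max accumulator built on atomics rather than synchronized.
    static final class Stat {
        private final AtomicLong total = new AtomicLong();
        private final AtomicLong count = new AtomicLong();
        private final AtomicLong min = new AtomicLong(Long.MAX_VALUE);
        private final AtomicLong max = new AtomicLong(0);

        void add(long value) {
            total.addAndGet(value);
            count.incrementAndGet();
            min.accumulateAndGet(value, Math::min);
            max.accumulateAndGet(value, Math::max);
        }

        long getCount() { return count.get(); }
        long getMin() { return count.get() == 0 ? 0 : min.get(); }
        long getMax() { return max.get(); }
        double getAvg() {
            long n = count.get();
            return n == 0 ? 0.0 : (double) total.get() / n;
        }
    }

    final Stat readLatency = new Stat();
    final Stat updateLatency = new Stat();

    // Records one completed request; negative intervals are clamped to zero.
    void record(OpKind op, long startMillis, long endMillis) {
        long latency = Math.max(0, endMillis - startMillis);
        (op.update ? updateLatency : readLatency).add(latency);
    }
}
```

Keeping the two accumulators separate means a slow quorum write does not skew the numbers reported for locally served reads, which is the point of tracking them independently.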

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jtuple/zookeeper ZOOKEEPER-3098

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/580.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #580
    
----
commit e6935f8d99eace05d29c2d6659e68e8b90b9a633
Author: Joseph Blomstedt <jdb@...>
Date:   2018-07-19T19:47:15Z

    ZOOKEEPER-3098: Add additional server metrics
    
----

