zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Enrico Olivelli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3098) Add additional server metrics
Date Fri, 20 Jul 2018 20:40:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551244#comment-16551244

Enrico Olivelli commented on ZOOKEEPER-3098:

This sounds very interesting as we are discussing about the introduction of new metrics system.

We could start from this work and maybe abstract the way metrics are published, in order to
support various metrics providers like Prometheus

> Add additional server metrics
> -----------------------------
>                 Key: ZOOKEEPER-3098
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3098
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.6.0
>            Reporter: Joseph Blomstedt
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
> This patch adds several new server-side metrics as well as makes it easier to add new
metrics in the future. This patch also includes a handful of other minor metrics-related changes.
> Here's a high-level summary of the changes.
>  # This patch extends the request latency tracked in {{ServerStats}} to track {{read}}
and {{update}} latency separately. Updates are any request that must be voted on and can change
data, reads are all requests that can be handled locally and don't change data.
>  # This patch adds the {{ServerMetrics}} logic and the related {{AvgMinMaxCounter}} and
{{SimpleCounter}} classes. This code is designed to make it incredibly easy to add new metrics.
To add a new metric you just add one line to {{ServerMetrics}} and then directly reference
that new metric anywhere in the code base. The {{ServerMetrics}} logic handles creating the
metric, properly adding the metric to the JSON output of the {{/monitor}} admin command, and
properly resetting the metric when necessary. The motivation behind {{ServerMetrics}} is to
make things easy enough that it encourages new metrics to be added liberally. Lack of in-depth
metrics/visibility is a long-standing ZooKeeper weakness. At Facebook, most of our internal
changes build on {{ServerMetrics}} and we have nearly 100 internal metrics at this time –
all of which we'll be upstreaming in the coming months as we publish more internal patches.
>  # This patch adds 20 new metrics, 14 which are handled by {{ServerMetrics}}.
>  # This patch replaces some uses of {{synchronized}} in {{ServerStats}} with atomic operations.
> Here's a list of new metrics added in this patch:
>  - {{uptime}}: time that a peer has been in a stable leading/following/observing state
>  - {{leader_uptime}}: uptime for peer in leading state
>  - {{global_sessions}}: count of global sessions
>  - {{local_sessions}}: count of local sessions
>  - {{quorum_size}}: configured ensemble size
>  - {{synced_observers}}: similar to existing `synced_followers` but for observers
>  - {{fsynctime}}: time to fsync transaction log (avg/min/max)
>  - {{snapshottime}}: time to write a snapshot (avg/min/max)
>  - {{dbinittime}}: time to reload database – read snapshot + apply transactions (avg/min/max)
>  - {{readlatency}}: read request latency (avg/min/max)
>  - {{updatelatency}}: update request latency (avg/min/max)
>  - {{propagation_latency}}: end-to-end latency for updates, from proposal on leader to
committed-to-datatree on a given host (avg/min/max)
>  - {{follower_sync_time}}: time for follower to sync with leader (avg/min/max)
>  - {{election_time}}: time between entering and leaving election (avg/min/max)
>  - {{looking_count}}: number of transitions into looking state
>  - {{diff_count}}: number of diff syncs performed
>  - {{snap_count}}: number of snap syncs performed
>  - {{commit_count}}: number of commits performed on leader
>  - {{connection_request_count}}: number of incoming client connection requests
>  - {{bytes_received_count}}: similar to existing `packets_received` but tracks bytes

This message was sent by Atlassian JIRA

View raw message