zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joseph Blomstedt (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ZOOKEEPER-3098) Add additional server metrics
Date Fri, 20 Jul 2018 19:56:00 GMT
Joseph Blomstedt created ZOOKEEPER-3098:

             Summary: Add additional server metrics
                 Key: ZOOKEEPER-3098
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3098
             Project: ZooKeeper
          Issue Type: Improvement
          Components: server
    Affects Versions: 3.6.0
            Reporter: Joseph Blomstedt

This patch adds several new server-side metrics as well as makes it easier to add new metrics
in the future. This patch also includes a handful of other minor metrics-related changes.

Here's a high-level summary of the changes.
 # This patch extends the request latency tracked in {{ServerStats}} to track {{read}} and
{{update}} latency separately. Updates are any request that must be voted on and can change
data, reads are all requests that can be handled locally and don't change data.

 # This patch adds the {{ServerMetrics}} logic and the related {{AvgMinMaxCounter}} and {{SimpleCounter}}
classes. This code is designed to make it incredibly easy to add new metrics. To add a new
metric you just add one line to {{ServerMetrics}} and then directly reference that new metric
anywhere in the code base. The {{ServerMetrics}} logic handles creating the metric, properly
adding the metric to the JSON output of the {{/monitor}} admin command, and properly resetting
the metric when necessary.

The motivation behind {{ServerMetrics}} is to make things easy enough that it encourages new
metrics to be added liberally. Lack of in-depth metrics/visibility is a long-standing ZooKeeper
weakness. At Facebook, most of our internal changes build on {{ServerMetrics}} and we have
nearly 100 internal metrics at this time – all of which we'll be upstreaming in the coming
months as we publish more internal patches.

 # This patch adds 20 new metrics, 14 which are handled by {{ServerMetrics}}.

 # This patch replaces some uses of {{synchronized}} in {{ServerStats}} with atomic operations.

Here's a list of new metrics added in this patch:
 - {{uptime}}: time that a peer has been in a stable leading/following/observing state
 - {{leader_uptime}}: uptime for peer in leading state
 - {{global_sessions}}: count of global sessions
 - {{local_sessions}}: count of local sessions
 - {{quorum_size}}: configured ensemble size
 - {{synced_observers}}: similar to existing `synced_followers` but for observers
 - {{fsynctime}}: time to fsync transaction log (avg/min/max)
 - {{snapshottime}}: time to write a snapshot (avg/min/max)
 - {{dbinittime}}: time to reload database – read snapshot + apply transactions (avg/min/max)
 - {{readlatency}}: read request latency (avg/min/max)
 - {{updatelatency}}: update request latency (avg/min/max)
 - {{propagation_latency}}: end-to-end latency for updates, from proposal on leader to committed-to-datatree
on a given host (avg/min/max)
 - {{follower_sync_time}}: time for follower to sync with leader (avg/min/max)
 - {{election_time}}: time between entering and leaving election (avg/min/max)
 - {{looking_count}}: number of transitions into looking state
 - {{diff_count}}: number of diff syncs performed
 - {{snap_count}}: number of snap syncs performed
 - {{commit_count}}: number of commits performed on leader
 - {{connection_request_count}}: number of incoming client connection requests
 - {{bytes_received_count}}: similar to existing `packets_received` but tracks bytes

This message was sent by Atlassian JIRA

View raw message