zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jtuple <...@git.apache.org>
Subject [GitHub] zookeeper pull request #580: ZOOKEEPER-3098: Add additional server metrics
Date Sat, 08 Sep 2018 00:02:43 GMT
Github user jtuple commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/580#discussion_r216112301
  
    --- Diff: src/java/main/org/apache/zookeeper/server/ServerMetrics.java ---
    @@ -0,0 +1,103 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.zookeeper.server;
    +
    +import org.apache.zookeeper.server.metric.AvgMinMaxCounter;
    +import org.apache.zookeeper.server.metric.Metric;
    +import org.apache.zookeeper.server.metric.SimpleCounter;
    +
    +import java.util.LinkedHashMap;
    +import java.util.Map;
    +
    +public enum ServerMetrics {
    +    /**
    +     * Txnlog fsync time
    +     */
    +    FSYNC_TIME(new AvgMinMaxCounter("fsynctime")),
    +
    +    /**
    +     * Snapshot writing time
    +     */
    +    SNAPSHOT_TIME(new AvgMinMaxCounter("snapshottime")),
    +
    +    /**
    +     * Db init time (snapshot loading + txnlog replay)
    +     */
    +    DB_INIT_TIME(new AvgMinMaxCounter("dbinittime")),
    +
    +    /**
    +     * Stats for read request. The timing start from when the server see the
    +     * request until it leave final request processor.
    +     */
    +    READ_LATENCY(new AvgMinMaxCounter("readlatency")),
    +
    +    /**
    +     * Stats for request that need quorum voting. Timing is the same as read
    +     * request. We only keep track of stats for request that originated from
    +     * this machine only.
    +     */
    +    UPDATE_LATENCY(new AvgMinMaxCounter("updatelatency")),
    +
    +    /**
    +     * Stats for all quorum request. The timing start from when the leader
    +     * see the request until it reach the learner.
    +     */
    +    PROPAGATION_LATENCY(new AvgMinMaxCounter("propagation_latency")),
    +
    +    FOLLOWER_SYNC_TIME(new AvgMinMaxCounter("follower_sync_time")),
    +    ELECTION_TIME(new AvgMinMaxCounter("election_time")),
    --- End diff --
    
    If by gauge you're referring to a metric that only reports a single value, I don't think
that's a good fit for these metrics. If you're querying metrics from Zookeeper every minute
or so, you'd miss all but the last election that occurred in that interval. With an avg/min/max
counter, we can query periodically and have statistics that summarize all elections that occurred
in that interval. Same logic applies to fsync, init, etc.
    
    Tracking individual event times is probably better left for the "push metrics to external
system" approach that's orthogonal to this pull request.
    
    However, if we wanted to provide statistics + bounded history internally, we could consider
adding an extended version of say `AvgMinMaxCounter` that also keeps a fixed number of values
around that could be queried, dropping the oldest or doing some sort of reservoir sampling.
    
    Definitely something we can do in the future. If we do, it's unclear if we'd want to wire
that up to `/metrics` admin command though vs exposing through a different means.


---

Mime
View raw message