cassandra-commits mailing list archives

From "Chris Burroughs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11752) histograms/metrics in 2.2 do not appear recency biased
Date Wed, 11 May 2016 22:48:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280940#comment-15280940
] 

Chris Burroughs commented on CASSANDRA-11752:
---------------------------------------------

So the [point|http://metrics.dropwizard.io/3.1.0/] of the metrics library is to give "insight into
what your code does in production".  It is integrated into many projects.  Users expect to
be able to take those metrics and:
 * Draw a [line graph|http://www.datastax.com/dev/blog/pluggable-metrics-reporting-in-cassandra-2-0-2].
 * Alert on values so they know when there are problems with a cluster.
 * Use jconsole to inspect beans and determine what is happening Right Now.

I am aware that there are concerns about both the implementation and the assumptions (normal distribution)
of the metrics library.  They have been brought up both on [this bug tracker|https://issues.apache.org/jira/browse/CASSANDRA-6486]
and in other forums. However imperfect, jconsole, line graphs, and threshold-based alerts are
of critical practical use today.  All of these require *recent* data.  When my cluster is
failing to meet business needs I want to know as soon as possible.
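
As a rough illustration of why recency matters for alerting, the sketch below compares a lifetime percentile with an exponentially age-weighted one. It is not the metrics library's or Cassandra's reservoir code; the function name, the 30-second half-life, and the sample data are all made up for the example.

```python
def weighted_percentile(samples, q, now, half_life=None):
    """Return the q-quantile of (timestamp, value) samples.

    With half_life=None every sample counts equally (a lifetime view).
    Otherwise a sample's weight halves for every `half_life` seconds of age,
    which biases the result towards recent events.
    """
    weighted = []
    for ts, value in samples:
        age = now - ts
        weight = 1.0 if half_life is None else 0.5 ** (age / half_life)
        weighted.append((value, weight))
    weighted.sort()

    total = sum(weight for _, weight in weighted)
    running = 0.0
    for value, weight in weighted:
        running += weight
        if running >= q * total:
            return value
    return weighted[-1][0]


# Hypothetical load: one hour of 2 ms latencies, then one minute of 200 ms
# latencies, one sample per second.
samples = [(t, 2.0) for t in range(3600)] + [(t, 200.0) for t in range(3600, 3660)]
now = 3660

lifetime_p95 = weighted_percentile(samples, 0.95, now)
recent_p95 = weighted_percentile(samples, 0.95, now, half_life=30)
```

With this data the lifetime 95th percentile stays at 2 ms through the bad minute, while the age-weighted one moves to 200 ms, which is the behavior a threshold alert needs.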

If I understand your proposal correctly, you are saying it would be better to drop all of
that, and that it would be much more powerful (and mathematically sound!) to do an out-of-band export and merge
of all of the histograms and create a heatmap.  This would provide better insight into the
distribution of values (by showing the full distribution instead of a handful of percentiles)
and allow for cluster-wide aggregation.  This could be further augmented by using [hue and
saturation|https://docs.joyent.com/public-cloud/d-40-performance/cloud-analytics/use-of-color-in-cloud-analytics]
to call out latencies for individual nodes or column families.  I think that sounds fantastic,
but that is very much not where the industry is today.  Maybe Circonus can do that, but Graphite
definitely can't.
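
For concreteness, the export-and-merge idea could look roughly like the sketch below. It assumes every node exports per-interval bucket counts with identical bucket boundaries (merging by addition only works in that case); the function names, bucket bounds, and counts are hypothetical and do not come from Cassandra or any reporting tool.

```python
from collections import Counter


def merge_histograms(histograms):
    """Sum bucket counts across nodes; buckets are keyed by upper bound."""
    merged = Counter()
    for histogram in histograms:
        merged.update(histogram)  # Counter.update adds counts per key
    return merged


def heatmap(per_interval_node_histograms, bucket_bounds):
    """One row per export interval, one column per bucket.

    Each cell holds the cluster-wide count for that bucket in that interval.
    """
    rows = []
    for node_histograms in per_interval_node_histograms:
        merged = merge_histograms(node_histograms)
        rows.append([merged.get(bound, 0) for bound in bucket_bounds])
    return rows


# Hypothetical export: bucket upper bounds in ms, two nodes, two intervals.
bounds = [1, 10, 100]
intervals = [
    [{1: 90, 10: 10}, {1: 80, 10: 20}],           # interval 0: both nodes healthy
    [{1: 85, 10: 15}, {1: 10, 10: 30, 100: 60}],  # interval 1: second node slow
]
grid = heatmap(intervals, bounds)
```

Each row of `grid` is one time slice of the heatmap; the slow node shows up as counts in the 100 ms column of the second row.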

And however cool that future sounds, the NEWS entry makes no mention of this as an intentional
fundamental change. Nor does CASSANDRA-5657 discuss the consequences. Indeed, CASSANDRA-5657
hoped for improved accuracy and went out of its way to keep JMX functioning!

> histograms/metrics in 2.2 do not appear recency biased
> ------------------------------------------------------
>
>                 Key: CASSANDRA-11752
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11752
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Chris Burroughs
>              Labels: metrics
>         Attachments: boost-metrics.png, c-jconsole-comparison.png, c-metrics.png, default-histogram.png
>
>
> In addition to upgrading to metrics3, CASSANDRA-5657 switched to using a custom histogram
implementation.  After upgrading to Cassandra 2.2, histograms/timer metrics are now suspiciously
flat.  To be useful for graphing and alerting, metrics need to be biased towards recent events.
> I have attached images that I think illustrate this.
>  * The first two are a comparison between latency observed by a C* 2.2 (us) cluster showing
very flat lines and a client (using metrics 2.2.0, ms) showing server performance problems.
 We can't rule out with total certainty that something else is the cause (that's why we
measure from both the client & server) but they very rarely disagree.
>  * The 3rd image compares jconsole viewing of metrics on a 2.2 and 2.1 cluster over several
minutes.  Not a single digit changed on the 2.2 cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
