cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11117) ColUpdateTimeDeltaHistogram histogram overflow
Date Mon, 03 Oct 2016 14:35:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542557#comment-15542557 ]

Joel Knighton commented on CASSANDRA-11117:
-------------------------------------------

||branch||testall||dtest||
|[CASSANDRA-11117-2.2|https://github.com/jkni/cassandra/tree/CASSANDRA-11117-2.2]|[testall|http://cassci.datastax.com/view/Dev/view/jkni/job/jkni-CASSANDRA-11117-2.2-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/jkni/job/jkni-CASSANDRA-11117-2.2-dtest]|
|[CASSANDRA-11117-3.0|https://github.com/jkni/cassandra/tree/CASSANDRA-11117-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/jkni/job/jkni-CASSANDRA-11117-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/jkni/job/jkni-CASSANDRA-11117-3.0-dtest]|
|[CASSANDRA-11117-3.X|https://github.com/jkni/cassandra/tree/CASSANDRA-11117-3.X]|[testall|http://cassci.datastax.com/view/Dev/view/jkni/job/jkni-CASSANDRA-11117-3.X-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/jkni/job/jkni-CASSANDRA-11117-3.X-dtest]|
|[CASSANDRA-11117-trunk|https://github.com/jkni/cassandra/tree/CASSANDRA-11117-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/jkni/job/jkni-CASSANDRA-11117-trunk-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/jkni/job/jkni-CASSANDRA-11117-trunk-dtest]|

I've linked patches above. CI for the patches looks clean relative to upstream.

For 2.2, I found write paths in Thrift for accessing counters that reproduced this behavior.
In 3.0+, I found instances where UPDATE -> INSERT patterns reproduced this behavior, when
an update to a nonexistent row created an empty {{LivenessInfo}}. This behavior is also possible
to reproduce just by using client-specified timestamps. 

This leaves us with a few options for fixing the issue. We could look through the read/write
path and special-case the first two issues (and any others we would have to discover), but
that wouldn't solve the client-specified timestamp behavior. For that reason, I opted to
simply filter to values that wouldn't overflow the histogram. This limits the ColUpdateTimeDelta
histogram to reflecting updates under normal conditions that are under about 100 days. The
risk of the first approach is introducing additional complexity in the read/write path for
a fairly niche metric. The risk of the second approach is that it reduces the amount of
information in the ColUpdateTimeDelta histogram. I do not think this significantly reduces
the utility of the metric as proposed in [CASSANDRA-7979], which is to somehow quantify how
frequently columns are updated. A single decaying histogram measurement (as for the second
access in the counter and update/insert examples above) has limited value, and samples above
100 days don't impose a reasonable time skew constraint.
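
To illustrate the filtering option, here is a minimal sketch rather than the code in the linked patches. The class name, method names, and constant are hypothetical. It assumes deltas are measured in microseconds and uses the "about 100 days" figure above as the cutoff; the lower bound of zero is also an assumption, since only the upper bound is discussed here.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

// Hypothetical sketch: drop update-time deltas that would overflow the histogram.
final class DeltaFilterSketch {
    // Assumption: deltas are in microseconds and the cutoff is roughly 100 days.
    static final long MAX_RECORDED_DELTA_MICROS = TimeUnit.DAYS.toMicros(100);

    // True when the delta falls inside the range we are willing to record.
    // Rejecting negative deltas is an extra assumption of this sketch.
    static boolean shouldRecord(long deltaMicros) {
        return deltaMicros >= 0 && deltaMicros <= MAX_RECORDED_DELTA_MICROS;
    }

    // Forwards the delta to the histogram update only when it is in range;
    // out-of-range samples are silently skipped instead of being clamped.
    static void recordIfInRange(long deltaMicros, LongConsumer histogramUpdate) {
        if (shouldRecord(deltaMicros)) {
            histogramUpdate.accept(deltaMicros);
        }
    }
}
```

With a filter like this, a delta produced by an empty {{LivenessInfo}} or by an extreme client-specified timestamp never reaches the histogram, so the overflow bucket stays empty at the cost of not counting those samples.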

I'm not necessarily opposed to the first option. If someone does have a strong preference
for it, I think I'll defer on a patch here to someone more experienced with the read/write
path. I'd be happy to review.

> ColUpdateTimeDeltaHistogram histogram overflow
> ----------------------------------------------
>
>                 Key: CASSANDRA-11117
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11117
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Chris Lohfink
>            Assignee: Joel Knighton
>            Priority: Minor
>             Fix For: 2.2.x, 3.0.x, 3.x
>
>
> {code}
> getting attribute Mean of org.apache.cassandra.metrics:type=ColumnFamily,name=ColUpdateTimeDeltaHistogram
> threw an exceptionjavax.management.RuntimeMBeanException: java.lang.IllegalStateException:
> Unable to compute ceiling for max when histogram overflowed
> {code}
> Given that this histogram already has 164 buckets, I wonder if there is something
> weird with the computation that's causing this to be so large? It appears to be coming from
> updates to system.local
> {code}
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=local,name=ColUpdateTimeDeltaHistogram
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
