cassandra-commits mailing list archives

From "Jeff Griffith (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-11751) Histogram overflow in metrics
Date Wed, 11 May 2016 12:59:13 GMT
Jeff Griffith created CASSANDRA-11751:
-----------------------------------------

             Summary: Histogram overflow in metrics
                 Key: CASSANDRA-11751
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11751
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: Cassandra 2.2.6 on Linux
            Reporter: Jeff Griffith


One particular histogram in the Cassandra metrics seems to overflow, which prevents the mean
from being calculated on the Dropwizard "Snapshot". Here is the exception raised while the
metrics reporter runs:

{code}
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
        at org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:232) ~[apache-cassandra-2.2.6.jar:2.2.6-SNAPSHOT]
        at org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103) ~[apache-cassandra-2.2.6.jar:2.2.6-SNAPSHOT]
        at com.addthis.metrics3.reporter.config.SplunkReporter.reportHistogram(SplunkReporter.java:155) ~[reporter-config3-3.0.0.jar:3.0.0]
        at com.addthis.metrics3.reporter.config.SplunkReporter.report(SplunkReporter.java:101) ~[reporter-config3-3.0.0.jar:3.0.0]
        at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
        at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_72]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_72]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_72]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_72]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
{code}
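
The trace shows the exception propagating out of the reporter's scheduled run. As a stopgap until the root cause is understood, a reporter could guard the mean lookup so that one overflowed histogram does not abort the whole report cycle. The following is only a minimal sketch: `SafeMetricRead` and `meanOrFallback` are hypothetical names, not part of Dropwizard Metrics, reporter-config, or Cassandra, and the `DoubleSupplier` stands in for a call such as `snapshot.getMean()`.

```java
import java.util.function.DoubleSupplier;

final class SafeMetricRead {
    /** Returns the supplied value, or the fallback if reading it throws IllegalStateException. */
    static double meanOrFallback(DoubleSupplier mean, double fallback) {
        try {
            return mean.getAsDouble();
        } catch (IllegalStateException e) {
            // Raised, for example, when the underlying histogram has overflowed.
            return fallback;
        }
    }

    public static void main(String[] args) {
        DoubleSupplier healthy = () -> 42.0;
        DoubleSupplier overflowed = () -> {
            throw new IllegalStateException("histogram overflowed");
        };
        System.out.println(meanOrFallback(healthy, Double.NaN));
        System.out.println(meanOrFallback(overflowed, Double.NaN));
    }
}
```

Reporting NaN (or skipping the field) hides the symptom only; the overflowed histogram still needs to be explained.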

On deeper analysis, it seems like this is happening specifically on this metric:
{code}
ColUpdateTimeDeltaHistogram
{code}

I think this is where it is updated, in ColumnFamilyStore.java:
{code}
    public void apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer, OpOrder.Group opGroup, ReplayPosition replayPosition)
    {
        long start = System.nanoTime();
        Memtable mt = data.getMemtableFor(opGroup, replayPosition);
        final long timeDelta = mt.put(key, columnFamily, indexer, opGroup);
        maybeUpdateRowCache(key);
        metric.samplers.get(Sampler.WRITES).addSample(key.getKey(), key.hashCode(), 1);
        metric.writeLatency.addNano(System.nanoTime() - start);
        if(timeDelta < Long.MAX_VALUE)
            metric.colUpdateTimeDeltaHistogram.update(timeDelta);
    }
{code}

Considering that it's calculating a mean, I don't know whether a large sum might be overflowing.
But that "if (timeDelta < Long.MAX_VALUE)" looks suspect, doesn't it?
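
To make the suspicion concrete, here is a simplified model of a bucketed histogram with a trailing overflow bucket, which is what the exception message suggests is in play. It is a sketch under assumptions, not Cassandra's EstimatedHistogram: the class name `SimpleBucketHistogram` and the bucket bounds are made up for illustration. It shows that a single value just below Long.MAX_VALUE passes a "< Long.MAX_VALUE" guard, lands in the overflow bucket, and makes the mean unavailable without any sum overflowing.

```java
import java.util.Arrays;

/** Simplified fixed-bucket histogram; the last slot of counts is the overflow bucket. */
final class SimpleBucketHistogram {
    private final long[] upperBounds; // inclusive upper bound of each regular bucket, ascending
    private final long[] counts;      // counts[upperBounds.length] holds overflowed values

    SimpleBucketHistogram(long... upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new long[upperBounds.length + 1];
    }

    void update(long value) {
        int idx = Arrays.binarySearch(upperBounds, value);
        if (idx < 0)
            idx = -idx - 1;           // insertion point: first bound greater than value
        counts[idx]++;                // idx == upperBounds.length means overflow
    }

    boolean isOverflowed() {
        return counts[counts.length - 1] > 0;
    }

    /** Mean estimated from bucket upper bounds; refuses to answer once overflowed. */
    long mean() {
        if (isOverflowed())
            throw new IllegalStateException("histogram overflowed; mean is undefined");
        long elements = 0;
        long sum = 0;
        for (int i = 0; i < upperBounds.length; i++) {
            elements += counts[i];
            sum += counts[i] * upperBounds[i];
        }
        return elements == 0 ? 0 : (long) Math.ceil((double) sum / elements);
    }

    public static void main(String[] args) {
        SimpleBucketHistogram h = new SimpleBucketHistogram(10, 100, 1_000);
        h.update(5);
        h.update(50);
        System.out.println("mean before overflow: " + h.mean());

        long timeDelta = Long.MAX_VALUE - 1;  // passes a "< Long.MAX_VALUE" guard
        if (timeDelta < Long.MAX_VALUE)
            h.update(timeDelta);
        System.out.println("overflowed: " + h.isOverflowed());
        try {
            h.mean();
        } catch (IllegalStateException e) {
            System.out.println("mean failed: " + e.getMessage());
        }
    }
}
```

If the real histogram behaves like this model, the guard only filters out the exact sentinel value, so any other very large timeDelta would still overflow it.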




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
