cassandra-commits mailing list archives

From "Stefan Podkowinski (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting
Date Tue, 13 Dec 2016 10:01:10 GMT


Stefan Podkowinski commented on CASSANDRA-9625:

I think [~ruoranwang] is right to address the {{getEstimatedRemainingTasks}} call, as it
delegates to the {{LeveledManifest}} version, which is synchronized and causes the reporter
thread to block. At some point the reporter must get stuck after waiting too long. I'm not
certain about the exact reasons for this, but having the reporter thread compete for compaction
locks doesn't seem like a good idea in general to me, so I'd suggest using a cached value
of the remaining tasks count instead. This should also improve performance a bit by avoiding
continuous level size calculation on unchanged sets of sstables.
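
To make the suggestion concrete, here is a minimal sketch of the caching idea. It is not the actual patch: {{ManifestSketch}}, {{setLevelBytes}}, {{BYTES_PER_TASK}} and the task estimate formula are hypothetical stand-ins, not Cassandra code. The point is only that the estimate is recomputed under the lock when the level contents change, while the metrics reader gets a volatile field without ever taking that lock.

```java
// Hypothetical stand-in for a leveled manifest; names and formula are assumptions.
final class ManifestSketch {
    // Assumed amount of excess data handled by one compaction task.
    private static final long BYTES_PER_TASK = 100;

    private final long[] levelBytes;
    private final long[] levelCapacity;

    // Written only while holding the monitor, read without it.
    private volatile int cachedRemainingTasks = 0;

    ManifestSketch(long[] levelCapacity) {
        this.levelCapacity = levelCapacity.clone();
        this.levelBytes = new long[levelCapacity.length];
    }

    // Compaction-side mutation: recompute the estimate once, under the lock,
    // and only when the set of data in a level actually changes.
    synchronized void setLevelBytes(int level, long bytes) {
        levelBytes[level] = bytes;
        cachedRemainingTasks = computeRemainingTasks();
    }

    // Expensive calculation over all levels; callers must hold the monitor.
    private int computeRemainingTasks() {
        long tasks = 0;
        for (int i = 0; i < levelBytes.length; i++) {
            long excess = Math.max(0L, levelBytes[i] - levelCapacity[i]);
            tasks += (excess + BYTES_PER_TASK - 1) / BYTES_PER_TASK; // ceiling division
        }
        return (int) Math.min(Integer.MAX_VALUE, tasks);
    }

    // Reporter-side read: not synchronized, so it cannot block on compaction locks.
    int getEstimatedRemainingTasks() {
        return cachedRemainingTasks;
    }
}
```

The trade-off in this sketch is that the reporter may see a slightly stale value between updates, which seems acceptable for a gauge sampled at a fixed interval.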


Does anyone want to give this a try by running a patched node? Test results look ok except for
the failing 2.1 {{LeveledCompactionStrategyTest.testMutateLevel}}, which always times out
but works fine locally. Any idea what can be done about that?

> GraphiteReporter not reporting
> ------------------------------
>                 Key: CASSANDRA-9625
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>            Reporter: Eric Evans
>            Assignee: T Jake Luciani
>         Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, thread-dump.log
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops working.  The
usual startup is logged, and one batch of samples is sent, but the reporting interval comes
and goes, and no other samples are ever sent.  The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment on 2.1.6;
we are able to reproduce this on all 6 of our production nodes, but not on a 3 node (otherwise
identical) staging cluster (maybe it takes a certain level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.
