Date: Tue, 13 Dec 2016 10:01:10 +0000 (UTC)
From: "Stefan Podkowinski (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting

[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744732#comment-15744732 ]

Stefan Podkowinski commented on CASSANDRA-9625:
-----------------------------------------------

I think [~ruoranwang] is right to address the
{{getEstimatedRemainingTasks}} call, as it delegates to the {{LeveledManifest}} version, which is synchronized and causes the reporter thread to block. At some point the reporter must get stuck after waiting too long. I'm not certain about the exact reasons for this, but having the reporter thread compete for compaction locks doesn't seem like a good idea in general, so I'd suggest using a cached value of the remaining task count instead. This should also improve performance a bit by avoiding repeated level size calculations on unchanged sets of sstables.

||2.1||2.2||3.0||3.x||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9625-2.1]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9625-2.2]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9625-3.0]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9625-3.x]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-3.x-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-3.x-testall/]|

Does anyone want to give this a try by running a patched node?
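
For readers who have not looked at the branches, the cached-value idea suggested above can be sketched roughly as follows. This is only an illustration and is not taken from the linked patches: the class name {{CachedTaskEstimate}}, the {{onSstablesChanged}} hook, and the "sstables per task" estimate are all hypothetical stand-ins. The point it shows is that the expensive, lock-protected calculation runs only when the sstable set changes, while the reporter thread reads a volatile field and never waits on the compaction lock.

```java
// Minimal sketch, assuming a hypothetical manifest-like class.
// None of these names come from the Cassandra code base.
class CachedTaskEstimate {
    // Hypothetical constant used only to make the estimate concrete.
    private static final int SSTABLES_PER_TASK = 4;

    // Stands in for the lock that compaction code holds while it works.
    private final Object manifestLock = new Object();

    // Guarded by manifestLock.
    private int pendingSstables = 0;

    // Written under the lock, read without it; volatile makes the
    // latest written value visible to the reporter thread.
    private volatile int cachedRemainingTasks = 0;

    // Called by compaction code whenever the set of sstables changes.
    void onSstablesChanged(int newPendingSstables) {
        synchronized (manifestLock) {
            pendingSstables = newPendingSstables;
            // Recompute once per change instead of once per metrics read.
            cachedRemainingTasks = estimate(pendingSstables);
        }
    }

    // Called by the metrics reporter; takes no lock, so it cannot block
    // behind a long-running compaction that holds manifestLock.
    int getEstimatedRemainingTasks() {
        return cachedRemainingTasks;
    }

    // Hypothetical estimate: ceiling of pending / SSTABLES_PER_TASK.
    private static int estimate(int pending) {
        return (pending + SSTABLES_PER_TASK - 1) / SSTABLES_PER_TASK;
    }
}
```

The trade-off in a design like this is that the reporter may see a value that is slightly stale between updates, which seems acceptable for a metric that is only sampled once per reporting interval.
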
Test results look OK except for the failing 2.1 {{LeveledCompactionStrategyTest.testMutateLevel}}, which always times out but works fine locally. Any idea what can be done about that?

> GraphiteReporter not reporting
> ------------------------------
>
>                 Key: CASSANDRA-9625
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>            Reporter: Eric Evans
>            Assignee: T Jake Luciani
>         Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops working. The usual startup is logged, and one batch of samples is sent, but the reporting interval comes and goes, and no other samples are ever sent. The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment on 2.1.6; we are able to reproduce this on all 6 of our production nodes, but not on a 3 node (otherwise identical) staging cluster (maybe it takes a certain level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)