Date: Thu, 5 Jan 2017 09:59:58 +0000 (UTC)
From: "Stefan Podkowinski (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-13096) Snapshots slow down jmx scrapping

[ https://issues.apache.org/jira/browse/CASSANDRA-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800953#comment-15800953 ]

Stefan Podkowinski commented on CASSANDRA-13096:
------------------------------------------------

I think you should at least limit scraping to metric
MBeans by having a whitelist, as in [this|https://github.com/prometheus/jmx_exporter#configuration] example. Constantly polling every attribute of all MBeans sounds like trouble to me, as even reading non-metric MBeans may have side effects.

> Snapshots slow down jmx scrapping
> ---------------------------------
>
>                 Key: CASSANDRA-13096
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13096
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Maxime Fouilleul
>         Attachments: CPU Load.png, Clear Snapshots.png, JMX Scrape Duration.png
>
>
> Hello,
> We are scraping the JMX metrics through a Prometheus exporter, and we noticed that some nodes became very slow to respond (more than 20 seconds). After some investigation we did not find any hardware problems or overload issues on these "slow" nodes. It happens on different clusters, some with only a few gigabytes of data, and it does not seem to be related to a specific version either, as it happens on 2.1, 2.2 and 3.0 nodes.
> After some unsuccessful attempts, one of our ideas was to clear the snapshots remaining on one problematic node:
> {code}
> nodetool clearsnapshot
> {code}
> And the magic happened... As you can see in the attached diagrams, the moment we cleared the snapshots, CPU activity dropped immediately and the time to scrape the JMX metrics went from over 20 seconds to instantaneous.
> Can you enlighten us on this issue? Once again, it appears on all three of our versions (2.1, 2.2 and 3.0), with different data volumes, and it is not systematically linked to the snapshots, as we have some nodes with the same snapshot volume that are doing fine.
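For reference, a whitelist along the lines suggested in the comment might look roughly like the minimal jmx_exporter config sketch below. It assumes a jmx_exporter release that still uses the `whitelistObjectNames` key (later releases renamed these keys, so check the README of the version you actually run). The `org.apache.cassandra.metrics:*` pattern restricts the scrape to MBeans in Cassandra's metrics domain, so attributes of other MBeans are not read at all.

{code}
# Minimal jmx_exporter config sketch.
# Assumption: a jmx_exporter release that still accepts whitelistObjectNames
# (the key name differs in later releases).
# Only MBeans in the org.apache.cassandra.metrics domain are queried;
# with no "rules" section, they are exported in the default format.
whitelistObjectNames: ["org.apache.cassandra.metrics:*"]
{code}

The README linked in the comment also documents `blacklistObjectNames` for excluding specific beans that the whitelist would otherwise match.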