cassandra-commits mailing list archives

From "Maxime Fouilleul (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-13096) Snapshots slow down jmx scrapping
Date Wed, 04 Jan 2017 17:07:58 GMT


Maxime Fouilleul updated CASSANDRA-13096:
    Attachment:     (was: Capture d’écran 2017-01-04 à 15.53.23.png)

> Snapshots slow down jmx scrapping
> ---------------------------------
>                 Key: CASSANDRA-13096
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Maxime Fouilleul
>         Attachments: CPU Load.png, Clear Snapshots.png, JMX Scrape Duration.png
> Hello,
> We scrape the JMX metrics through a Prometheus exporter, and we noticed that some
nodes became very slow to respond (more than 20 seconds). After some investigation, we did
not find any hardware problems or overload issues on these "slow" nodes. It happens on different
clusters, some with only a few gigabytes of data, and it does not seem to be related to
a specific version either, as it happens on 2.1, 2.2 and 3.0 nodes.
> After several unsuccessful attempts, one of our ideas was to clear the snapshots remaining
on one problematic node:
> {code}
> nodetool clearsnapshot
> {code}
> And the magic happened: as you can see in the attached diagrams, the second we cleared
the snapshots, the CPU activity dropped immediately and the time to scrape the JMX metrics
went from more than 20 seconds to instantaneous.
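
To reproduce the before/after comparison on a single node, the scrape can be timed by hand. The following is a minimal Python sketch, not the reporter's tooling: the exporter URL in the comment is a hypothetical placeholder, and the 20-second threshold is simply the figure quoted above.

```python
import time
import urllib.request


def fetch_metrics(url, timeout=60.0):
    """Fetch the raw metrics page from an HTTP exporter endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def time_call(fn, *args, **kwargs):
    """Run fn and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - start, result


def is_slow(duration_s, threshold_s=20.0):
    """True when a scrape took longer than the threshold (20 s in the report)."""
    return duration_s > threshold_s


# Example usage (the URL is an assumption; use your exporter's address):
#   duration, body = time_call(fetch_metrics, "http://localhost:7070/metrics")
#   print(f"scrape took {duration:.2f}s, slow={is_slow(duration)}")
```

Running it once before and once after `nodetool clearsnapshot` on the same node gives a direct measure of the change.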
> Can you enlighten us on this issue? Once again, it appears on all three of our versions
(2.1, 2.2 and 3.0) and with different data volumes, and it is not systematically linked to the
snapshots, as we have some nodes with the same volume of snapshots that are running fine.
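
To compare slow and healthy nodes on more than total snapshot volume, it may help to count snapshot files as well as bytes. This is a rough Python sketch under assumptions: it assumes the usual on-disk layout in which each table directory contains a `snapshots` subdirectory, and the data directory path in the comment is only an example. It does not claim that file count is the cause of the slowdown.

```python
import os


def snapshot_usage(data_dir):
    """Return (file_count, total_bytes) for files under any 'snapshots' directory."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(data_dir):
        # Only count files whose path (relative to data_dir) passes through
        # a directory named 'snapshots'.
        parts = os.path.relpath(root, data_dir).split(os.sep)
        if "snapshots" not in parts:
            continue
        for name in files:
            try:
                total_bytes += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue  # file disappeared while walking
            file_count += 1
    return file_count, total_bytes


# Example usage (the path is an assumption; use your configured data directory):
#   count, size = snapshot_usage("/var/lib/cassandra/data")
#   print(f"{count} snapshot files, {size} bytes")
```

Because snapshot files are hard links to SSTables, the byte total here is the apparent size, not the extra disk space the snapshots consume.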

