cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
Date Mon, 29 Jun 2015 18:25:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606066#comment-14606066 ]

Benedict edited comment on CASSANDRA-9681 at 6/29/15 6:24 PM:
--------------------------------------------------------------

Thanks. Just to let you know, though, the log files will likely be insufficient to diagnose
this. Assuming that's the case, the best next step is to obtain a heap dump during one of
the spikes (it doesn't need to be at the peak, as long as it's well above where the heap settled
prior to the upgrade). In the meantime, I'll see if I can find a candidate by looking through recent
changes.



> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-9681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9681
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.7, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.x
>
>         Attachments: cassandra.yaml
>
>
> The C* 2.1.7 cluster starts behaving really badly after 1-2 days. {{gauges.cassandra.jmx.org.apache.cassandra.metrics.ColumnFamily.AllMemtablesHeapSize.Value}}
> jumps to 7 GB (https://www.dropbox.com/s/vraggy292erkzd2/Screenshot%202015-06-29%2019.12.53.png?dl=0)
> on 3/6 nodes in each data center, and then there are many long GC pauses. The cluster is using
> the default heap size values ({{-Xms8192M -Xmx8192M -Xmn2048M}}).
> Before C* 2.1.5, the memtable heap size was basically constant at ~500 MB (https://www.dropbox.com/s/fjdywik5lojstvn/Screenshot%202015-06-29%2019.30.00.png?dl=0).
> After restarting all nodes, the cluster behaves stably for 1-2 days. Today I did that, and the long
> GC pauses are gone (~18:00 https://www.dropbox.com/s/7vo3ynz505rsfq3/Screenshot%202015-06-29%2019.28.37.png?dl=0).
> The only pattern we've found so far is that the long GC pauses happen at basically the
> same time on all nodes in the same data center, even on the ones where the memtable heap size
> is not growing.
> The cliffs on the graphs are node restarts.
> Used memory on the boxes where {{AllMemtablesHeapSize}} grows stays at the same level:
> https://www.dropbox.com/s/tes9abykixs86rf/Screenshot%202015-06-29%2019.37.52.png?dl=0.
> Replication factor is set to 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
