cassandra-commits mailing list archives

From "Chris Lohfink (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-13687) Abnormal heap growth and long GC during repair.
Date Wed, 12 Jul 2017 06:28:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083511#comment-16083511 ]

Chris Lohfink edited comment on CASSANDRA-13687 at 7/12/17 6:27 AM:
--------------------------------------------------------------------

Can you include {{nodetool cfstats}} and {{nodetool netstats}} output from a node exhibiting this?
Large partitions (see the compacted partition maximum size in cfstats) and excessive streaming are
very expensive with this version. If you have either (both are environment/schema related), you
can resolve it with a larger heap or by addressing your data model. It could also be related to
incremental repairs being a bit behind if you haven't run one before, or haven't for a while.
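
As a rough aid for checking the large-partition case, here is a minimal sketch that scans saved {{nodetool cfstats}} text for the "Compacted partition maximum bytes" line of each table. The function names and the 100 MB threshold are illustrative assumptions, not anything defined by Cassandra or by this ticket, and the exact cfstats layout can vary between versions.

```python
import re


def max_partition_bytes(cfstats_text):
    """Map 'keyspace.table' to its 'Compacted partition maximum bytes' value.

    Assumes the usual cfstats layout: a 'Keyspace: <name>' line, then
    'Table: <name>' blocks that each contain the maximum-bytes line.
    """
    sizes = {}
    keyspace = table = None
    for raw in cfstats_text.splitlines():
        line = raw.strip()
        m = re.match(r"Keyspace\s*:\s*(\S+)", line)
        if m:
            keyspace, table = m.group(1), None
            continue
        m = re.match(r"Table(?: \(index\))?\s*:\s*(\S+)", line)
        if m:
            table = m.group(1)
            continue
        m = re.match(r"Compacted partition maximum bytes\s*:\s*(\d+)", line)
        if m and keyspace and table:
            sizes[f"{keyspace}.{table}"] = int(m.group(1))
    return sizes


def large_partitions(cfstats_text, threshold_bytes=100 * 1024 * 1024):
    """Return tables whose largest partition exceeds threshold_bytes.

    The 100 MB default is an assumed rule of thumb; tune it for your cluster.
    """
    sizes = max_partition_bytes(cfstats_text)
    return {name: size for name, size in sizes.items() if size > threshold_bytes}
```

The input could be produced with {{nodetool cfstats > cfstats.txt}} on the affected node and then read from that file; any table the sketch reports is a candidate for the data-model review mentioned above.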


was (Author: cnlwsu):
Can you include {{nodetool cfstats}} and {{nodetool netstats}} output from a node exhibiting this?
Large partitions (see the compacted partition maximum size in cfstats) and excessive streaming are
very expensive with this version. If you have either (both are environment/schema related), you
can resolve it with a larger heap or by addressing your data model.

> Abnormal heap growth and long GC during repair.
> -----------------------------------------------
>
>                 Key: CASSANDRA-13687
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13687
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stanislav Vishnevskiy
>         Attachments: 3.0.14.png, 3.0.9.png
>
>
> We recently upgraded from 3.0.9 to 3.0.14 to get the fix from CASSANDRA-13004.
> Sadly, 3 out of the last 7 nights we have had to wake up due to Cassandra dying on us. We
> currently don't have any data to help reproduce this, but since there aren't many commits
> between the 2 versions it might be obvious.
> Basically we trigger a parallel incremental repair from a single node every night at
> 1AM. That node will sometimes start allocating a lot, keeping the heap maxed and triggering
> GC. Some of these GCs can last up to 2 minutes. This effectively destroys the whole cluster
> due to timeouts to this node.
> The only solution we currently have is to drain the node and restart the repair; it has
> worked fine the second time every time.
> I attached heap charts from 3.0.9 and 3.0.14 during repair.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

