cassandra-commits mailing list archives

From "Thomas Steinmaurer (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18
Date Tue, 26 Sep 2017 13:19:00 GMT


Thomas Steinmaurer commented on CASSANDRA-13900:

[~jjordan], I was pointed to CASSANDRA-12269 a few minutes ago, which sounds a lot like what we are facing. It's a pity that this isn't considered for 3.0.x.

2.1 => 3.11 sounds like a huge step to me. I haven't checked whether this is even possible from an SSTable format perspective.
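One quick way to check what is actually on disk is the format-version prefix in the sstable file names. The mapping below reflects my understanding of the format versions (ka = 2.1, la = 2.2, ma/mb = 3.0, mc = 3.11), and the helper name is hypothetical, not something from this ticket:

```python
import re

# Format-version prefixes as I understand them; this mapping is an
# assumption, not taken from the ticket.
SSTABLE_VERSIONS = {
    "ka": "2.1",
    "la": "2.2",
    "ma": "3.0",
    "mb": "3.0",
    "mc": "3.11",
}

# Matches the two-letter version token followed by the generation number,
# covering both old-style (ks-cf-ka-1-Data.db) and new-style
# (mc-1-big-Data.db) file names.
_VERSION_RE = re.compile(r"(?:^|-)([a-z]{2})-\d+-")

def sstable_cassandra_line(filename):
    """Return the Cassandra release line that wrote this sstable file."""
    match = _VERSION_RE.search(filename)
    if match:
        return SSTABLE_VERSIONS.get(match.group(1), "unknown")
    return "unknown"
```

Running this over the Data.db files in the data directories would show whether any pre-3.0 (ka) sstables remain after the upgrade.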

> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> -------------------------------------------------------------------
>                 Key: CASSANDRA-13900
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Thomas Steinmaurer
>            Priority: Blocker
>         Attachments: cassandra2118_vs_3014.jpg, cassandra3014_jfr_5min.jpg
> In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process the same
incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7, testing our software with Cassandra as the backend. Both loadtest and production are hosted in AWS and have the same per-node spec on the Cassandra side:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> * Client requests are entirely CQL
> We have had a solid, constant baseline in loadtest at ~60% cluster-average CPU, with constant, simulated load running against our cluster, using Cassandra 2.1 for more than 2 years.
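For reference, the heap/GC spec above would look roughly like this in cassandra-env.sh (2.1) or jvm.options (3.x); the exact flags are a sketch derived from the listed spec, not copied from the cluster:

```shell
# Sketch of JVM settings matching the spec in this ticket; flags are
# assumptions based on "8G heap" and "CMS (400MB newgen)" above.
JVM_OPTS=""
JVM_OPTS="$JVM_OPTS -Xms8G -Xmx8G"            # fixed 8G heap
JVM_OPTS="$JVM_OPTS -Xmn400M"                 # 400MB new generation
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"  # CMS for the old generation
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"         # parallel young-gen collector
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"  # log GC suspension time
echo "$JVM_OPTS"
```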
> Recently we started upgrading to 3.0.14 in this 9-node loadtest environment, and basically 3.0.14 isn't able to cope with the load anymore. There are no special tweaks or memory settings/changes; everything is the same as in 2.1.18. We also haven't upgraded sstables yet, so the increase shown in the screenshot is not related to any manually triggered maintenance operation after upgrading to 3.0.14.
> According to our monitoring, with 3.0.14 we see a *GC suspension time increase by a factor of > 2*, directly correlating with a CPU increase to > 80%. See the attached screenshot "cassandra2118_vs_3014.jpg".
> This all means that 3.0.14 can't handle the incoming load that 2.1.18 could. We would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or scale out to handle the same load, which is not an option cost-wise.
> Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the mentioned load, but I can provide a JFR session for our current 3.0.14 setup. The attached 5-minute JFR memory-allocation view (cassandra3014_jfr_5min.jpg) shows compaction as the top contributor for the captured time frame. The window could by "accident" be covering a period where compaction dominated (although the simulated client load mentioned above was running), but according to the stack traces we see new classes from 3.0, e.g. BTreeRow.searchIterator(), popping up as top contributors, so new classes/data structures are possibly causing much more object churn now.
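A JFR session like the attached one can be captured on a running node with jcmd; the PID, recording name, and file paths below are placeholders. Note that on Oracle JDK 8 Flight Recorder also requires -XX:+UnlockCommercialFeatures -XX:+FlightRecorder on the Cassandra JVM:

```shell
# Start a 5-minute recording against the Cassandra JVM (PID is a placeholder)
jcmd <cassandra-pid> JFR.start name=c3014 duration=5m filename=/tmp/cassandra3014.jfr

# Check recording status, or dump the data collected so far
jcmd <cassandra-pid> JFR.check
jcmd <cassandra-pid> JFR.dump name=c3014 filename=/tmp/cassandra3014_partial.jfr
```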

This message was sent by Atlassian JIRA
