cassandra-commits mailing list archives

From "Thomas Steinmaurer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18
Date Mon, 25 Sep 2017 10:45:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Steinmaurer updated CASSANDRA-13900:
-------------------------------------------
    Description: 
In short: after upgrading from 2.1.18 to 3.0.14, we aren't able to process the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7, testing our software with Cassandra as the backend. Both loadtest and production are hosted in AWS and have the same spec on the Cassandra side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have had a solid, constant baseline in loadtest at ~60% average cluster CPU, with constant simulated load running against the cluster on Cassandra 2.1 for more than 2 years now.
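
For reference, heap/GC settings along these lines would normally live in conf/cassandra-env.sh (2.1) respectively conf/jvm.options (3.0). The snippet below is only a sketch of what we assume corresponds to "8G heap, CMS, 400MB newgen"; it is not a copy of our actual configuration files:

{noformat}
# conf/cassandra-env.sh (2.1-style) -- assumed equivalent of the spec above
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="400M"
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
{noformat}

The same values were carried over unchanged to the 3.0.14 nodes.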

Recently we started upgrading this 9-node loadtest environment to 3.0.14, and basically 3.0.14 isn't able to cope with the load anymore. No particular tweaks or memory settings/changes were made; everything is the same as with 2.1.18. We also haven't upgraded sstables yet, so the increase mentioned below is not related to any manually triggered maintenance operation after upgrading to 3.0.14.
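
To be explicit about which maintenance step has not been run: the per-node sstable rewrite that is usually done at some point after a major upgrade. Shown here only as a pointer to the command in question (with the optional keyspace/table arguments omitted), it has not been executed on any node yet:

{noformat}
# not yet run on any node after the 3.0.14 upgrade
nodetool upgradesstables
{noformat}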

According to our monitoring, with 3.0.14 we see a GC suspension time increase by a factor of > 2, directly correlating with a CPU increase to > 80%.
!cassandra2.1.18_vs_3.0.14.png|thumbnail!
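
"GC suspension" here means the accumulated JVM garbage-collection pause time per monitoring interval, as reported by our monitoring. As a rough way to cross-check the same signal without our tooling (a sketch, not what our monitoring actually runs), the per-node pause statistics and a GC log can be used:

{noformat}
# GC pause statistics since the last invocation of this command
nodetool gcstats

# or enable a GC log and compare total pause time per interval between 2.1.18 and 3.0.14
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/cassandra/gc.log"
{noformat}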

All this means that the incoming load which 2.1.18 has been handling for several weeks now is something 3.0.14 can't handle. So we would need to either scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.



> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-13900
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Thomas Steinmaurer
>            Priority: Blocker
>         Attachments: cassandra2.1.18_vs_3.0.14.png
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

