cassandra-user mailing list archives

From "Steinmaurer, Thomas" <thomas.steinmau...@dynatrace.com>
Subject RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)
Date Tue, 26 Sep 2017 09:36:47 GMT
Hi Alex,

we tested with larger new gen sizes, up to ¼ of max heap, but m4.xlarge instances appear to be
too weak to deal with a larger new gen. The result was that we got many more GCInspector-related
log entries, but perhaps we need to re-test.

Right, we are using batches extensively: unlogged/non-atomic. We are aware that multi-partition
batches should be avoided if possible. For test purposes we built a flag into our application
to switch from multi-partition batches to strictly one partition per batch.
We have not seen any measurable high-level improvement (e.g. decreased CPU, GC suspension
…) on the Cassandra side with single-partition batches. Naturally, this resulted in many
more requests executed by our application against the Cassandra cluster, with the effect
that we saw a significant GC/CPU increase on our application server, caused by
the DataStax driver now executing more requests by a factor of X. So, with no visible
gain on the Cassandra side and a negative impact on our application server, we do not strictly
execute single-partition batches.
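
To illustrate why the switch multiplies the request count, here is a minimal sketch in plain
Python (no driver involved). The row layout, the `split_by_partition` helper, and the tenant
names are hypothetical and are not taken from the application discussed in this thread.

```python
from collections import defaultdict


def split_by_partition(rows, partition_key):
    """Group rows so that each group touches exactly one partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_key(row)].append(row)
    return list(groups.values())


# Hypothetical rows: (partition key, clustering key, value)
rows = [
    ("tenant-a", 1, 0.5),
    ("tenant-b", 1, 0.7),
    ("tenant-a", 2, 0.9),
    ("tenant-c", 1, 0.1),
]

# One multi-partition batch carries all rows in a single request.
multi_partition_requests = 1

# Single-partition batching needs one request per distinct partition key.
single_partition_batches = split_by_partition(rows, lambda r: r[0])

print("multi-partition requests:", multi_partition_requests)
print("single-partition requests:", len(single_partition_batches))
```

With many distinct partition keys per flush, the number of requests approaches the number of
rows, which would be consistent with the extra driver-side GC/CPU load described above.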

As said on the ticket (https://issues.apache.org/jira/browse/CASSANDRA-13900), nothing except
the Cassandra binaries has changed in our loadtest environment.


Thanks,
Thomas



From: Alexander Dejanovski [mailto:alex@thelastpickle.com]
Sent: Tuesday, 26 September 2017 11:14
To: user@cassandra.apache.org
Subject: Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Hi Thomas,

I wouldn't move to G1GC with small heaps (<24GB) but just looking at your ticket I think
that your new gen is way too small.
I get that it worked better in 2.1 in your case though, which would suggest that the memory
footprint is different between 2.1 and 3.0. It looks like you're using batches extensively.
Hopefully you're aware that multi partition batches are discouraged because they indeed create
heap pressure and high coordination costs (on top of batchlog writes/deletions), leading to
more GC pauses.
With a 400MB new gen, you're very likely to have a lot of premature promotions (especially
with the default max tenuring threshold), which will fill the old gen faster than necessary
and is likely to trigger major GCs.

I'd suggest you re-run those tests with a 2GB new gen and compare results. Know that with
Cassandra you can easily go up to 40%-50% of your heap for the new gen.
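
As a rough sketch of the sizing arithmetic, the snippet below computes candidate new gen sizes
for the fractions mentioned in this thread. It assumes the 8G max heap reported for the
single-node test environment also applies here; the `new_gen_mb` helper is hypothetical.

```python
def new_gen_mb(max_heap_mb, fraction):
    """Return the new gen size in MB for a given fraction of the max heap."""
    return int(max_heap_mb * fraction)


MAX_HEAP_MB = 8 * 1024  # assumption: 8G max heap
CURRENT_NEW_GEN_MB = 400  # new gen size mentioned above

# Share of the heap currently given to the new gen, in percent.
current_share = CURRENT_NEW_GEN_MB / MAX_HEAP_MB * 100
print(f"current new gen: {CURRENT_NEW_GEN_MB} MB ({current_share:.1f}% of heap)")

for label, fraction in [("1/4 of heap", 0.25), ("40% of heap", 0.40), ("50% of heap", 0.50)]:
    print(f"{label}: {new_gen_mb(MAX_HEAP_MB, fraction)} MB")
```

The chosen value would typically be applied through HEAP_NEWSIZE in cassandra-env.sh or the
JVM's -Xmn flag; which one takes effect depends on how the node is configured.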

Cheers,


On Tue, Sep 26, 2017 at 10:58 AM Matope Ono <matope.ono@gmail.com> wrote:
Hi. We ran into a similar situation after upgrading from 2.1.14 to 3.11 in our production.

Have you already tried G1GC instead of CMS? Our timeouts were mitigated after replacing CMS
with G1GC.

Thanks.

2017-09-25 20:01 GMT+09:00 Steinmaurer, Thomas <thomas.steinmaurer@dynatrace.com>:
Hello,

I now have some concrete numbers from our 9-node loadtest cluster with constant load on the
same infrastructure, after upgrading to 3.0.14 from 2.1.18.

We see doubled GC suspension time + correlating CPU increase. In short, 3.0.14 is not able
to handle the same load.

I have created https://issues.apache.org/jira/browse/CASSANDRA-13900. Feel free to request
any further information on the ticket.

Unfortunately this is a real show-stopper for us upgrading to 3.0.

Thanks for your attention.

Thomas

From: Steinmaurer, Thomas [mailto:thomas.steinmaurer@dynatrace.com]
Sent: Friday, 15 September 2017 13:51
To: user@cassandra.apache.org
Subject: RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Hi Jeff,

we are using the native protocol (CQL3) via the DataStax Java driver (3.1). We also have
OpsCenter running (to be removed soon) via Thrift, if I remember correctly.

As said, the write request latency for our keyspace hasn’t really changed, so perhaps another
one (system related, OpsCenter …?) is affected or perhaps the JMX metric is reporting something
differently now. ☺ So not a real issue for now hopefully, just popping up in our monitoring,
wondering what this may be.

Regarding the drop in compression metadata memory usage: right, the storage engine rewrite
could be a reason. Thanks.

Still wondering about the GC/CPU increase.

Thanks!

Thomas



From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Friday, 15 September 2017 13:14
To: user@cassandra.apache.org
Subject: Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Most people find 3.0 slightly slower than 2.1. The only thing that really stands out in your
email is the huge change in 95% latency - that's atypical. Are you using thrift or native
(9042)? The decrease in compression metadata offheap usage is likely due to the increased
storage efficiency of the storage engine (see CASSANDRA-8099).


--
Jeff Jirsa


On Sep 15, 2017, at 2:37 AM, Steinmaurer, Thomas <thomas.steinmaurer@dynatrace.com> wrote:
Hello,

we have a test (regression) environment hosted in AWS, which is used for auto-deploying our
software on a daily basis and applying constant load across all deployments, basically to
allow us to detect any regressions in our software on a daily basis.

On the Cassandra side, this is a single node in AWS: m4.xlarge, EBS gp2, 8G heap, CMS. The
environment was upgraded from Cassandra 2.1.18 to 3.0.14 at a certain point in time, without
running upgradesstables so far. We have not made any additional JVM/GC configuration changes
of our own when going from 2.1.18 to 3.0.14; thus, any self-made configuration changes (e.g.
new gen heap size) for 2.1.18 are also in place with 3.0.14.

What we see after a time-frame of ~ 7 days (so, e.g. should not be caused by some sort of
spiky compaction pattern) is an AVG increase in GC/CPU (most likely correlating):

• CPU: ~ 12% => ~ 17%

• GC Suspension: ~ 1.7% => 3.29%

In this environment this is not a big deal, but relatively we have a CPU increase of ~ 50%
(with increased GC most likely contributing). This is something we have to deal with when
going into production (going into larger, multi-node loadtest environments first though).
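
The relative change can be checked with a short calculation. This is only a sketch using the
averages quoted above; the `relative_increase` helper is hypothetical.

```python
def relative_increase(before, after):
    """Return the relative change from `before` to `after` in percent."""
    return (after - before) / before * 100.0


# Averages quoted in this message (percent of CPU / percent of time suspended).
cpu = relative_increase(12.0, 17.0)
gc = relative_increase(1.7, 3.29)

print(f"CPU: +{cpu:.1f}% relative")
print(f"GC suspension: +{gc:.1f}% relative")
```

On these figures the CPU increase works out to roughly 42% relative (rounded in the message
to ~ 50%), while GC suspension time nearly doubles.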

Besides the CPU/GC shift, we also observe the following noticeable changes (don’t know if
they somehow correlate with the CPU/GC shift above):

• Increased AVG Write Client Requests Latency (95th Percentile),
org.apache.cassandra.metrics.ClientRequest.Latency.Write: 6.05 ms => 29.2 ms, but almost
constant (no change in) write client request latency for our particular keyspace only,
org.apache.cassandra.metrics.Keyspace.ruxitdb.WriteLatency

• Compression metadata memory usage drop,
org.apache.cassandra.metrics.Keyspace.XXX.CompressionMetadataOffHeapMemoryUsed:
~218MB => ~105MB => Good or bad? Known?

I know this all looks a bit vague, but perhaps someone else has seen something similar when
upgrading to 3.0.14 and can share their thoughts/ideas. Especially the (relative) CPU/GC
increase is something we are curious about.

Thanks a lot.

Thomas
The contents of this e-mail are intended for the named addressee only. It contains information
that may be confidential. Unless you are the named addressee or an authorized designee, you
may not copy or use it, or disclose it to anyone else. If you received it in error please
notify us immediately and then destroy it. Dynatrace Austria GmbH (registration number FN
91482h) is a company registered in Linz whose registered office is at 4040 Linz, Austria,
Freistädterstraße 313

--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com