cassandra-user mailing list archives

From "Thakrar, Jayesh" <>
Subject Re: Cassandra crashes....
Date Tue, 22 Aug 2017 15:39:28 GMT
Surbhi and Fay,

I agree we have plenty of RAM to spare.
However, our data load and compaction churn are so high (partially thanks to the SSDs!) that they cause
too much GC pressure.
And as you know, Eden and survivor space cleanup is a stop-the-world (STW) operation, hence a larger heap will
increase the GC pauses.

As for "what happens" during the crash - nothing.
It seems that the daemon just dies silently.

If you are interested, attached are the Cassandra system.log and the detailed gc log files.

system.log = Cassandra log (see line 424 - it’s the last line before the crash)

cassandra-gc.log.8.current = last GC log at the time of the crash
cassandra-gc.log.0 = GC log after startup

If you want to compare the GC pauses, grep the GC log files for the word "stopped"
(e.g. grep stopped cassandra-gc.log.*).
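
If you want a quick ranking of the worst pauses, something along these lines works (a rough
sketch; it assumes the usual -XX:+PrintGCApplicationStoppedTime line format, i.e. lines of the
form "Total time for which application threads were stopped: N seconds"):

    # Summarize the longest STW pauses across the attached GC logs (sketch).
    grep -h stopped cassandra-gc.log.* \
      | awk '{ for (i = 1; i <= NF; i++) if ($i == "stopped:") print $(i + 1) }' \
      | sort -rn | head -20    # the 20 longest pauses, in seconds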

Thanks for the quick replies!


From: Surbhi Gupta <>
Date: Tuesday, August 22, 2017 at 10:19 AM
To: "Thakrar, Jayesh" <>, "" <>
Subject: Re: Cassandra crashes....

A 16 GB heap is too small for G1GC. Try at least 32 GB of heap.
On Tue, Aug 22, 2017 at 7:58 AM Fay Hou [Storage Service] <> wrote:
What errors do you see?
16 GB out of 256 GB RAM? The heap is too small. I would give the heap at least 160 GB.

On Aug 22, 2017 7:42 AM, "Thakrar, Jayesh" <> wrote:

Hi All,

We are fairly new users of Cassandra 3.10 on Linux and wanted to ping the user group for
their experiences.

Our usage profile is batch jobs that load millions of rows into Cassandra every hour.

And there are similarly periodic batch jobs that read millions of rows, do some processing,
and output the results to HDFS (no issues with HDFS).

We often see the Cassandra daemons crash.

Key points of our environment are:

Pretty good servers: 54 cores (with hyperthreading), 256 GB RAM, 3.2 TB SSD drive

Compaction: TWCS with 7-day windows, as the data retention period is limited to about
120 days.

JDK: Java with G1 GC

Heap Size: 16 GB

Large SSTables: 50 GB to 300+ GB

We see the daemons crash after some back-to-back long GCs (1.5 to 3.5 seconds).

Note that we had set the target for GC pauses to 200 ms.
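
For reference, the relevant entries in conf/jvm.options look roughly like this (a sketch of the
settings described above, not the exact file):

    # conf/jvm.options (sketch; heap size and pause target as described above)
    -Xms16G
    -Xmx16G
    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=200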

We have been somewhat able to tame the crashes by updating the TWCS compaction properties
to have min/max compaction SSTables = 4, and by drastically reducing the size of the New/Eden
space (to 5% of the heap = 800 MB).
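
For reference, the compaction change was along these lines (a sketch; keyspace and table names
are placeholders, the window and threshold values are the ones described above):

    cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': '7',
      'min_threshold': '4',
      'max_threshold': '4'}"

The New/Eden reduction itself is a JVM-side change in jvm.options (e.g. via -Xmn or the G1
new-size percent flags); 800 MB is just 5% of the 16 GB heap.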

It's been about 12 hours and our stop-the-world GC pauses are under 90 ms.

Since the servers have more than sufficient resources, we are not seeing any noticeable performance impact.

Is this kind of tuning normal/expected?


