cassandra-user mailing list archives

From Jeff Jirsa <jeff.ji...@crowdstrike.com>
Subject Re: Lot of GC on two nodes out of 7
Date Wed, 02 Mar 2016 07:39:26 GMT
Compaction falling behind will likely cause additional work on reads (more sstables to merge),
but I’d be surprised if it manifested in super long GC. When you say twice as many sstables,
how many is that?
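
As a quick check, nodetool compactionstats on each node will show whether compaction really is
falling behind; a persistently large pending-task count on the two slow nodes would confirm it.
Something like:

# Run on each node; compare "pending tasks" between the slow nodes and the rest.
nodetool compactionstats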

In cfstats, does anything stand out? Is max row size on those nodes larger than on other nodes?
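
For example, something along these lines pulls the per-CF maximums out of cfstats on each node
(in 2.0.x the label should be "Compacted row maximum size"; newer versions call it
"Compacted partition maximum bytes"):

# Run on each node and compare the two slow nodes against the rest.
nodetool cfstats | grep -iE 'column family:|table:|maximum'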

What you don’t show in your JVM options is the new gen size – if you do have unusually
large partitions on those two nodes (especially likely if you have rf=2 – if you have rf=3,
then there’s probably a third node misbehaving you haven’t found yet), then raising new
gen size can help handle the garbage created by reading large partitions without having to
tolerate the promotion. Estimates for the amount of garbage vary, but it could be “gigabytes”
of garbage on a very wide partition (see https://issues.apache.org/jira/browse/CASSANDRA-9754
for work in progress to help mitigate that type of pain).
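
If you try that, new gen in cassandra-env.sh is controlled by HEAP_NEWSIZE (it ends up on the
JVM command line as -Xmn). The sizes below are purely illustrative, not a recommendation:

# In cassandra-env.sh -- illustrative sizes only; tune for your heap and core count.
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"   # becomes -Xmn2G; a larger new gen lets read garbage die young instead of being promoted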

- Jeff 

From:  Anishek Agarwal
Reply-To:  "user@cassandra.apache.org"
Date:  Tuesday, March 1, 2016 at 11:12 PM
To:  "user@cassandra.apache.org"
Subject:  Lot of GC on two nodes out of 7

Hello, 

We have a Cassandra cluster of 7 nodes, all with the same JVM GC configuration. All our
writes/reads use the TokenAware policy wrapping a DCAware policy, and all nodes are part of the
same datacenter.

We are seeing that two nodes have high GC collection times; they mostly seem to spend about
300-600 ms in GC. This also seems to result in higher CPU utilisation on those machines. The
other 5 nodes don't have this problem.
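
For anyone wanting to reproduce the observation on their own nodes, one way to watch the pauses
live is jstat against the Cassandra PID (illustrative command; assumes the JDK tools are on the
node and Cassandra runs as the "cassandra" user):

# Print young/old gen occupancy and cumulative GC time every second
# (adjust the user if Cassandra does not run as "cassandra").
jstat -gcutil $(pgrep -u cassandra -f CassandraDaemon) 1000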

There is no additional repair activity going on in the cluster, so we are not sure why this is happening.

We checked cfhistograms on the two CFs we have in the cluster, and the number of reads seems to
be almost the same.
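
For reference, the per-node comparison came from nodetool cfhistograms; the keyspace and column
family names below are placeholders:

# Read/write latency and sstables-per-read histograms for one CF on this node
# (replace my_keyspace / my_cf with the real names).
nodetool cfhistograms my_keyspace my_cf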

We also used cfstats to check the number of sstables on each node, and one of the nodes with
the above problem has twice as many sstables as the other nodes. This still does not explain
why two nodes have high GC overhead. Our GC config is as below:
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"

JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"

JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"

JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"

JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=50"

JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"

JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"

JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"

JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"

JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"

JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"

JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"

JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"

JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"

JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"

JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"

JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"

# earlier value 131072 = 32768 * 4

JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=131072"

JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"

JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32768"

JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32768"

#new 

JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"


We are using Cassandra 2.0.17. If anyone has any suggestions about what else we can look at to
understand why this is happening, please do reply.



Thanks
anishek



