incubator-cassandra-user mailing list archives

From Piavlo <lolitus...@gmail.com>
Subject heap issues - looking for advice on gc tuning
Date Wed, 30 Oct 2013 00:26:14 GMT
Hi,

Below I try to give a full picture of the problem I'm facing.

This is a 12-node cluster, running on EC2 with m2.xlarge instances
(17G RAM, 2 CPUs).
Cassandra version is 1.0.8.
The cluster normally handles between 1500 and 3000 reads per second
(depending on the time of day) and between 800 and 1700 writes per
second, according to OpsCenter.
RF=3, and no row caches are used.

Memory-relevant configs from cassandra.yaml:
flush_largest_memtables_at: 0.85
reduce_cache_sizes_at: 0.90
reduce_cache_capacity_to: 0.75
commitlog_total_space_in_mb: 4096

relevant JVM options used are:
-Xms8000M -Xmx8000M -Xmn400M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly

Now what happens is that with these settings, after a cassandra process
restart, GC works fine at the beginning and the used heap looks like a
saw with perfect teeth. Eventually the teeth start to shrink until they
are barely noticeable, and then cassandra starts to spend lots of CPU
time doing GC. One such cycle takes about 2 weeks, and then I need to
restart the cassandra process to restore performance.
During all this time there are no memory-related messages in cassandra
system.log, except a "GC for ParNew" a little above 200ms once in a
while.

Things I've already done trying to reduce this eventual heap pressure:
1) Reducing bloom_filter_fp_chance, resulting in a reduction from
~700MB to ~280MB total per node, based on all Filter.db files on the
node (see the small sketch right after this list).
2) Reducing key cache sizes, and dropping key caches for CFs which do
not have many reads.
3) Increasing the heap size from 7000M to 8000M.
None of these really helped; only the increase from 7000M to 8000M
stretched the cycle until excessive GC from ~9 days to ~14 days.
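
The Filter.db number in (1) was obtained along these lines - a minimal
sketch, assuming the default data directory (adjust DATA_DIR to match
data_file_directories in cassandra.yaml):

#!/usr/bin/env python
# Rough sketch: total on-disk size of all bloom filter files on a node.
import os

DATA_DIR = "/var/lib/cassandra/data"  # assumption - default location

total = 0
for root, dirs, files in os.walk(DATA_DIR):
    for name in files:
        if name.endswith("-Filter.db"):
            total += os.path.getsize(os.path.join(root, name))

print("total bloom filter size: %.1f MB" % (total / (1024.0 * 1024.0)))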

I've tried to graph, over time, the data that is supposed to be in the
heap vs the actual heap size, by summing up all CFs' bloom filter sizes
+ all CFs' key cache capacities multiplied by the average key size +
all CFs' reported memtable data sizes (I've overestimated the data size
a bit on purpose, to be on the safe side).
Here is a link to a graph showing the last 2 days of metrics for a node
which could not effectively do GC and whose cassandra process was then
restarted:
http://awesomescreenshot.com/0401w5y534
You can clearly see that before and after the restart, the size of the
data that is supposed to be in the heap is pretty much the same, which
makes me think that what I really need is GC tuning.
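
For reference, the estimate itself is put together roughly as in the
sketch below, by parsing nodetool cfstats output. The field names
("Bloom Filter Space Used", "Key cache capacity", "Memtable Data Size",
"Number of Keys (estimate)") are assumed to match the 1.0.x output, and
the average key size is just a hand-picked overestimate. The same
script also sums the per-CF key estimates mentioned further down.

#!/usr/bin/env python
# Rough sketch of the "data that should be in heap" estimate:
#   sum(bloom filter space) + sum(key cache capacity) * avg key size
#     + sum(memtable data size)
# parsed from nodetool cfstats output (field names assume 1.0.x).
import re
import subprocess

AVG_KEY_SIZE_BYTES = 64  # assumption - tune to your actual key sizes

out = subprocess.check_output(["nodetool", "cfstats"]).decode()

def total(field):
    # Sum the integer values of every per-CF occurrence of the field.
    return sum(int(v)
               for v in re.findall(re.escape(field) + r":\s+(\d+)", out))

bloom_bytes = total("Bloom Filter Space Used")      # bytes
keycache_keys = total("Key cache capacity")         # number of cached keys
memtable_bytes = total("Memtable Data Size")        # bytes
keys_estimate = total("Number of Keys (estimate)")  # per-CF key estimates

estimate = bloom_bytes + keycache_keys * AVG_KEY_SIZE_BYTES + memtable_bytes
print("expected in-heap data: %.1f MB" % (estimate / (1024.0 * 1024.0)))
print("summed key estimate:   %d" % keys_estimate)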

Also, I suppose that this is not due to the total number of keys each
node has, which is between 200 and 300 million keys per node, summing
the key estimates of all CFs.
The nodes have data sizes between 45G and 75G, in line with their
numbers of keys, and all nodes start having heavy GC load after about
14 days.
Also, the excessive GC and heap usage are not affected by the load,
which varies depending on the time of day (see the read/write rates at
the beginning of the mail).
So again, based on this, I assume this is not due to a large number of
keys or too much load on the cluster, but due to a pure GC
misconfiguration issue.

Things I remember that I've tried for GC tuning:
1) Changing -XX:MaxTenuringThreshold=1 to values like 8 - did not help.
2) Adding -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
-XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10
-XX:ParallelGCThreads=2 -XX:ParallelCMSThreads=1
- this actually made things worse.
3) Adding -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=8 - did not help.

Also, since it takes about 2 weeks to verify that a GC setting change
did not help, trying all the possibilities is a painfully slow process :)
I'd highly appreciate any help and hints on the GC tuning.

tnx
Alex