cassandra-user mailing list archives

From Anuj Wadehra <anujw_2...@yahoo.co.in>
Subject Re: Handle Write Heavy Loads in Cassandra 2.0.3
Date Mon, 20 Apr 2015 18:21:03 GMT
Small correction: we are writing to 5 CFs and reading from one at high rates. 



Thanks

Anuj Wadehra

Sent from Yahoo Mail on Android

From:"Anuj Wadehra" <anujw_2003@yahoo.co.in>
Date:Mon, 20 Apr, 2015 at 7:53 pm
Subject:Handle Write Heavy Loads in Cassandra 2.0.3

Hi, 
 
Recently, we discovered that millions of mutations were getting dropped on our cluster.
Eventually, we solved this problem by increasing the value of memtable_flush_writers from
1 to 3. We usually write to 3 CFs simultaneously, and one of them has 4 secondary indexes. 
 
New changes also include: 
concurrent_compactors: 12 (earlier it was the default) 
compaction_throughput_mb_per_sec: 32 (earlier it was the default) 
in_memory_compaction_limit_in_mb: 400 (earlier it was the default, 64) 
memtable_flush_writers: 3 (earlier 1) 
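As a side note, the usual way to confirm that memtable_flush_writers is the bottleneck is nodetool tpstats: a growing "All time blocked" count for the FlushWriter pool means writes stalled waiting for a flush slot. A minimal sketch (the heredoc is illustrative sample output, not data from our cluster; on a live node you would pipe real nodetool tpstats output through the same awk filter):

```shell
# Filter tpstats output down to the header plus the FlushWriter pool.
# On a real node: nodetool tpstats | awk 'NR==1 || /FlushWriter/'
cat <<'EOF' | awk 'NR==1 || /FlushWriter/'
Pool Name           Active  Pending  Completed  Blocked  All time blocked
MutationStage       12      340      98761234   0        0
FlushWriter         1       5        8123       1        213
EOF
```

Here the non-zero "All time blocked" value (213 in the sample) is the symptom that motivated raising memtable_flush_writers.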
 
After making the above changes, our write-heavy workload scenarios started giving "promotion
failed" exceptions in the GC logs. 
 
We have done JVM tuning and made Cassandra config changes to address this: 
 
MAX_HEAP_SIZE="12G" (increased the heap from 8G to 12G to reduce fragmentation) 
HEAP_NEWSIZE="3G" 
 
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2" (We observed that even at SurvivorRatio=4, our survivor
space was getting 100% utilized under heavy write load, and we suspected that minor collections
were promoting objects directly to the Tenured generation) 
 
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=20" (Lots of objects were moving from Eden to
Tenured on each minor collection; this may be related to medium-lived objects tied to memtables
and compactions, as suggested by a heap dump) 
 
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=20" 
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions" 
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity" 
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs" 
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768" 
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark" 
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=30000" 
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=2000" # though this is the default value 
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways" 
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled" 
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking" 
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70" (we reduced this value to avoid
concurrent mode failures) 
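For anyone trying to reproduce this, the failures are easy to spot in the CMS GC log. A minimal sketch (the two log lines in the heredoc are illustrative samples, not real output; on an actual node you would grep the file named by your -Xloggc flag):

```shell
# Count "promotion failed" events in a CMS GC log.
# On a real node: grep -c "promotion failed" /path/to/gc.log
cat <<'EOF' | grep -c "promotion failed"
2015-04-20T18:05:01.123+0000: [GC 3145728K->512000K(12582912K), 0.2100000 secs]
2015-04-20T18:09:44.456+0000: [GC-- (promotion failed) [CMS: 8912345K->6123456K(9437184K), 0.9876543 secs]
EOF
```

Correlating the timestamps of these events with write spikes is what pointed us at the memtable/compaction churn described above.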
 
Cassandra config: 
compaction_throughput_mb_per_sec: 24 
memtable_total_space_in_mb: 1000 (to make memtable flushes more frequent; the default is 1/4
of the heap, which creates more long-lived objects) 
 
Questions: 
1. Why did increasing memtable_flush_writers and in_memory_compaction_limit_in_mb cause promotion
failures in the JVM? Does more memtable_flush_writers mean more memtables in memory? 
2. Objects are still being promoted to the Tenured space at a high rate. CMS is running on the old
gen every 4-5 minutes under heavy write load. Around 750+ minor collections of up to 300 ms
each happened in 45 minutes. Do you see any problems with the new JVM tuning and Cassandra config? Does
the justification given for those changes sound logical? Any suggestions? 
3. What is the best practice for reducing heap fragmentation/promotion failures when allocation
and promotion rates are high? 
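To put question 2 in perspective, here is a back-of-the-envelope calculation of the minor-GC overhead, using the numbers above and assuming the worst case where every pause hit the 300 ms ceiling:

```shell
# 750 minor collections x 300 ms worst-case pause, over a 45-minute window.
awk 'BEGIN {
  gc_s  = 750 * 0.300   # worst-case total pause time, in seconds
  win_s = 45 * 60       # observation window, in seconds
  printf "worst-case minor-GC time: %.0f s (%.1f%% of the window)\n", gc_s, 100 * gc_s / win_s
}'
```

That is up to 225 s, i.e. roughly 8% of the window spent paused in minor GC alone, before counting CMS cycles.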
 
Thanks 
Anuj 
 
 


