cassandra-commits mailing list archives

From "jonathan lacefield (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
Date Thu, 18 Dec 2014 15:10:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251759#comment-14251759 ]

jonathan lacefield commented on CASSANDRA-8447:
-----------------------------------------------

The patch from CASSANDRA-8485 resolved this issue. An 8-hour stress test performed well and showed
a steady-state JVM (sawtooth heap pattern under GC) with compaction enabled. Will close this
as resolved.

> Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8447
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cluster size - 4 nodes
> Node size - 12 CPUs (hyper-threaded to 24 cores), 192 GB RAM, 2 RAID 0 arrays (data: 10 disks, spinning 10k drives | commit log: 2 disks, spinning 10k drives)
> OS - RHEL 6.5
> JVM - Oracle 1.7.0_71
> Cassandra version 2.0.11
>            Reporter: jonathan lacefield
>         Attachments: Node_with_compaction.png, Node_without_compaction.png, cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, output.1.svg, output.2.svg, output.svg, results.tar.gz, visualvm_screenshot
>
>
> Behavior - If autocompaction is enabled, nodes become unresponsive because the Old Gen heap fills up and is not cleared by CMS GC.
> Test methodology - Disabled autocompaction on 3 nodes and left it enabled on 1 node. Executed several cassandra-stress loads using write-only operations. Monitored heap pressure with VisualVM and JConsole. Captured iostat and dstat for most tests. Captured a heap dump from the 50-thread load. Hints were disabled on all nodes during testing to avoid GC noise from hints backing up.
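The report does not say which commands were used to disable autocompaction and hints. As a minimal sketch, assuming the `nodetool disableautocompaction` and `nodetool disablehandoff` commands and hypothetical host names `node1` to `node4`, the setup with a single compaction-enabled node could be scripted like this (dry-run by default):

```shell
#!/usr/bin/env bash
# Sketch only: host names are hypothetical placeholders, and the report does not
# state which commands were actually used to disable autocompaction or hints.

NODES=(node1 node2 node3 node4)   # hypothetical hosts
COMPACTION_NODE="node4"           # the one node left with autocompaction enabled

# Print the command when DRY_RUN=1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

prepare_nodes() {
  local node
  for node in "${NODES[@]}"; do
    # Hints disabled on every node to keep hint replay out of the GC picture.
    run nodetool -h "$node" disablehandoff
    # Autocompaction disabled everywhere except the comparison node.
    if [ "$node" != "$COMPACTION_NODE" ]; then
      run nodetool -h "$node" disableautocompaction
    fi
  done
}
```

Running `prepare_nodes` only prints the seven commands; `DRY_RUN=0 prepare_nodes` would execute them.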
> Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=1900000000 -rate threads=<different threads tested> -schema replication\(factor=3\) keyspace="Keyspace1" -node <all nodes listed>
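To repeat that invocation for each thread count listed in the results, a small wrapper can build the command per count. This is a sketch: `STRESS_NODES` is a hypothetical placeholder for the elided node list, and the backslash-escaped `replication\(factor=3\)` becomes a quoted array element instead.

```shell
#!/usr/bin/env bash
# Sketch: rerun the stress command from the report once per thread count.
# STRESS_NODES is a hypothetical placeholder for the elided "<all nodes listed>".

STRESS_NODES="node1,node2,node3,node4"
THREAD_COUNTS=(1 5 10 50 100 200)

# Build the arguments as an array so "replication(factor=3)" needs no escaping;
# print the command when DRY_RUN=1 (the default), otherwise run it.
run_stress() {
  local threads="$1"
  local -a cmd
  cmd=(/usr/bin/cassandra-stress write n=1900000000
       -rate "threads=${threads}"
       -schema "replication(factor=3)" keyspace=Keyspace1
       -node "$STRESS_NODES")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# One run per thread count, in the order the report lists them.
sweep() {
  local t
  for t in "${THREAD_COUNTS[@]}"; do
    run_stress "$t"
  done
}
```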
> Data load thread count and results:
> * 1 thread - Still running, but the node appears able to sustain this load (approx. 500 writes per second per node)
> * 5 threads - Nodes become unresponsive due to a full Old Gen heap. CMS collections measured in the 60-second range (approx. 2k writes per second per node)
> * 10 threads - Nodes become unresponsive due to a full Old Gen heap. CMS collections measured in the 60-second range
> * 50 threads - Nodes become unresponsive due to a full Old Gen heap. CMS collections measured in the 60-second range (approx. 10k writes per second per node)
> * 100 threads - Nodes become unresponsive due to a full Old Gen heap. CMS collections measured in the 60-second range (approx. 20k writes per second per node)
> * 200 threads - Nodes become unresponsive due to a full Old Gen heap. CMS collections measured in the 60-second range (approx. 25k writes per second per node)
> Note - the observed behavior was the same for all tests except the single-threaded test, which does not appear to show this behavior.
> Tested different GC and Linux OS settings with a focus on the 50- and 200-thread loads.
> JVM settings tested:
> # Default, out-of-the-box cassandra-env.sh settings
> # 10 G Max | 1 G New - otherwise default cassandra-env.sh settings
> # 10 G Max | 1 G New - otherwise default cassandra-env.sh settings, plus:
> #* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
> #   20 G Max | 10 G New 
>    JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>    JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
>    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>    JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>    JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>    JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000"
>    JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=30000"
>    JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
>    JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
>    JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>    JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>    JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>    JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
>    JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
> # 20 G Max | 1 G New 
>    JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>    JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
>    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>    JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>    JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>    JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000"
>    JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=30000"
>    JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
>    JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
>    JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>    JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>    JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>    JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
>    JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
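The two 20 G profiles differ only in young-generation size, which changes how much Old Gen exists and where CMS starts. A rough sketch of that arithmetic, in integer MB and ignoring JVM alignment (so real region sizes will differ slightly), using the `SurvivorRatio=8` and `CMSInitiatingOccupancyFraction` values above; `gc_layout` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Rough heap-layout arithmetic for the profiles above (integer MB, ignores
# JVM alignment). Arguments: max heap, new size, CMS initiating %, SurvivorRatio.
gc_layout() {
  local max_mb="$1" new_mb="$2" cms_pct="$3" survivor_ratio="${4:-8}"
  # Old Gen is whatever the young generation does not take.
  local old_mb=$(( max_mb - new_mb ))
  # With UseCMSInitiatingOccupancyOnly, CMS starts at this Old Gen occupancy.
  local cms_trigger_mb=$(( old_mb * cms_pct / 100 ))
  # Young gen = eden + 2 survivors, with eden:survivor = ratio:1.
  local survivor_mb=$(( new_mb / (survivor_ratio + 2) ))
  local eden_mb=$(( new_mb - 2 * survivor_mb ))
  echo "old=${old_mb} cms_trigger=${cms_trigger_mb} eden=${eden_mb} survivor=${survivor_mb}"
}
```

For example, `gc_layout 20480 10240 75` gives an Old Gen of 10240 MB with CMS starting near 7680 MB, while `gc_layout 20480 1024 75` gives 19456 MB with CMS starting near 14592 MB.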
> Linux OS settings tested:
> # Disabled Transparent Huge Pages
> echo never > /sys/kernel/mm/transparent_hugepage/enabled
> echo never > /sys/kernel/mm/transparent_hugepage/defrag
> # Enabled Huge Pages
> echo 21500000000 > /proc/sys/kernel/shmmax (over 20GB for heap)
> echo 1536 > /proc/sys/vm/nr_hugepages (20GB/2MB page size)
> # Disabled NUMA
> numa=off in /etc/grub.conf
> # Verified all settings documented here were implemented
>   http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html
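For the huge page settings, the page count is the heap size divided by the huge page size. At 2 MB pages, a 20 GB (20480 MB) heap works out to 10240 pages, whereas the 1536 pages configured above cover 3072 MB. The JVM also only backs the heap with huge pages when started with `-XX:+UseLargePages`, which does not appear in the option lists above. A small sketch of the sizing check and of reading back the active THP mode (the helper names are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the huge page and THP settings above.

# Pages needed to cover a heap of heap_mb MB with pages of page_kb kB (rounded up).
hugepages_needed() {
  local heap_mb="$1" page_kb="$2"
  local heap_kb=$(( heap_mb * 1024 ))
  echo $(( (heap_kb + page_kb - 1) / page_kb ))
}

# Huge page size in kB, from a /proc/meminfo line like "Hugepagesize:  2048 kB".
hugepage_size_kb() {
  awk '/^Hugepagesize:/ {print $2}' "${1:-/proc/meminfo}"
}

# Active THP mode: the bracketed word in e.g. "always madvise [never]".
thp_mode() {
  sed -n 's/.*\[\(.*\)].*/\1/p' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}
```

On a host with 2 MB huge pages, `hugepages_needed 20480 "$(hugepage_size_kb)"` yields the page count for a 20 GB heap.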
> Attachments:
> # cassandra.yaml
> # fio output - results.tar.gz
> # 50-thread heap dump - https://drive.google.com/a/datastax.com/file/d/0B4Imdpu2YrEbMGpCZW5ta2liQ2c/view?usp=sharing
> # 100-thread load - VisualVM anonymous screenshot - visualvm_screenshot
> # dstat screenshot with compaction - Node_with_compaction.png
> # dstat screenshot without compaction - Node_without_compaction.png
> # GCInspector messages from system.log - gcinspector_messages.txt
> # gc.log output - gc.logs.tar.gz
> Observations:
> # Even though this is a spinning-disk deployment, disk I/O looks good.
> #* output from the jshook perf monitor (https://github.com/jshook/perfscripts) is attached
> #* note: we used direct I/O for all tests by adding direct=1 to the global section of the fio config files
> #  cpu usage is moderate until large GC events occur
> # Once the Old Gen heap fills up and cannot be cleaned, the MemtablePostFlusher pool starts to back up (many pending tasks in tpstats)
> # The node itself (e.g. ssh) is still responsive, but the Cassandra instance becomes unresponsive
> # Once the Old Gen heap fills up, cassandra-stress starts to throw CL ONE errors stating there aren't enough replicas to satisfy....
> # The heap dump from the 50-thread load (JVM scenario 1) is attached
> #* appears to show a compaction thread consuming a lot of memory
> # Sample system.log output for GC issues
> # strace -e futex -p $PID -f -c output during the 100-thread load, while Old Gen was "filling", just before it was full
> % time    seconds  usecs/call    calls    errors syscall
> 100.00  244.886766        4992    49052      7507 futex
> 100.00  244.886766                49052      7507 total
> #  htop during full gc cycle  - https://s3.amazonaws.com/uploads.hipchat.com/6528/480117/4ZlgcoNScb6kRM2/upload.png
> # tpstats shows nothing blocked on these nodes
> # Compaction does have pending tasks, upwards of 20, on the nodes
> # Nodes without compaction achieved approximately 20k writes per second per node without errors or drops
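The MemtablePostFlusher backlog noted above was observed via tpstats. A minimal sketch for tracking one pool's Pending column over time, assuming the Cassandra 2.0 `nodetool tpstats` layout (`Pool Name`, `Active`, `Pending`, `Completed`, ...); `pending_for_pool` and `watch_pending` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Print the Pending column (3rd field) for one thread pool from tpstats output on stdin.
# Assumes single-word pool names such as MemtablePostFlusher or CompactionExecutor.
pending_for_pool() {
  local pool="$1"
  awk -v pool="$pool" '$1 == pool {print $3}'
}

# Poll one node every few seconds and log the MemtablePostFlusher backlog.
watch_pending() {
  local host="$1" interval="${2:-10}"
  while true; do
    printf '%s MemtablePostFlusher pending=%s\n' "$(date +%T)" \
      "$(nodetool -h "$host" tpstats | pending_for_pool MemtablePostFlusher)"
    sleep "$interval"
  done
}
```

Comparing this log against the GC log timestamps would show whether the flush backlog starts before or after Old Gen fills.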
> Next Steps:
> # Will try to create a flame graph (see http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html) and upload it here
> #  Will try to recreate in another environment
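For the flame graph step, one low-tech approach (not necessarily the one in the linked post) is to sample `jstack` periodically and fold the stacks with the `stackcollapse-jstack.pl` and `flamegraph.pl` scripts from Brendan Gregg's FlameGraph repository, assumed here to be on `PATH`. The helper names, sample count, and interval are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: collect periodic jstack samples, then render them as a flame graph.
JSTACK="${JSTACK:-jstack}"   # overridable, e.g. to point at a specific JDK

# Append `samples` thread dumps of process `pid` to file `out`, `interval` seconds apart.
sample_stacks() {
  local pid="$1" samples="$2" interval="$3" out="$4"
  local i
  : > "$out"   # start with an empty file
  for (( i = 0; i < samples; i++ )); do
    "$JSTACK" "$pid" >> "$out"
    sleep "$interval"
  done
}

# Fold the collected dumps and write an SVG (requires the FlameGraph scripts).
render_flame_graph() {
  local in="$1" svg="$2"
  stackcollapse-jstack.pl "$in" | flamegraph.pl > "$svg"
}
```

For example, `sample_stacks "$PID" 60 1 stacks.txt && render_flame_graph stacks.txt cassandra.svg`.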



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
