Think of the GB you leave to the OS as memory for file caching.  The right amount is whatever suits your usage.  If your workload is almost exclusively reads, file cache memory doesn’t matter that much when your storage is the NVMe SSDs that the i3s come with.  C* already has a chunk cache that you should be tuning instead, and with compressed SSTables, feeding it quickly from the O/S file cache may turn out to be less of a concern.

 

If you have moderate write activity then your situation changes, because that same file cache is where dirty pages sit until they are flushed to disk in the background, so you have to watch for read stalls when the I/O queue fills with write requests.  You might not see this so obviously on NVMe drives, but that could depend a lot on the distro, the kernel, and how the filesystem is mounted.
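
As a starting point for watching this yourself, here is a minimal sketch that summarizes the page-cache numbers Linux exposes in /proc/meminfo (the `Cached`, `Dirty` and `Writeback` fields, reported in kB). The function names and the sample text are made up for illustration; on a real node you would pass in the contents of /proc/meminfo and sample it repeatedly while your workload runs.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def page_cache_summary(text):
    """Return cached, dirty and in-flight writeback memory in MB."""
    info = parse_meminfo(text)
    return {
        "cached_mb": info.get("Cached", 0) / 1024,
        "dirty_mb": info.get("Dirty", 0) / 1024,
        "writeback_mb": info.get("Writeback", 0) / 1024,
    }


# Hand-written sample, NOT captured from a real machine.
SAMPLE = "MemTotal: 31457280 kB\nCached: 2048 kB\nDirty: 512 kB\nWriteback: 0 kB\n"
print(page_cache_summary(SAMPLE))
```

A dirty figure that keeps climbing while read latency gets worse would be one hint that writeback is competing with your reads, but treat that as something to investigate, not a diagnosis.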

 

My super strong advice on issues like this is to not cargo-cult other people’s tunings.  Look at them for ideas, sure. But learn how to do your own investigations, and budget the time for it into your project.  Budget a LOT of time for it if your measure of “good performance” is based on latency; when “good” is defined in terms of throughput your life is easier.  Also, everything is always a little different in virtualization, and lord knows you can have screwball things appear in AWS. The good news is you don’t need a perfect configuration out of the gate; you need a configuration you understand and can refine; understanding comes from knowing how to do your own performance monitoring.

 

 

From: Sergio <lapostadisergio@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 21, 2019 at 1:16 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: [EXTERNAL] Re: GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

 

Message from External Sender

Thanks, guys!
I just copied and pasted what I found on our test machines, but I can confirm that we have the same settings in production, except that production uses 8GB.
I didn't select these settings, and I need to find out why they are there.
If any of you are willing to share your flags for a read-heavy workload, I would appreciate it; I would swap them in and test them with tlp-stress.
I am thinking about different approaches (G1GC vs ParNew + CMS).
How much RAM do you dedicate to the OS, as a percentage or as an exact number of GB?
Can you share flags for ParNew + CMS that I can experiment with and test?

Best,
Sergio

 

On Mon, Oct 21, 2019 at 09:27 Reid Pinchback <rpinchback@tripadvisor.com> wrote:

Since the instance size is < 32gb, hopefully swap isn’t being used, so it should be moot.

 

Sergio, also be aware that -XX:+CMSClassUnloadingEnabled probably doesn’t do anything for you.  I believe it only applies to CMS, not G1GC.  I also wouldn’t take it as gospel truth that -XX:+UseNUMA is a good thing on AWS (or anything virtualized); you’d have to run your own tests and find out.
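
A quick way to spot leftovers like this is to scan the argument list for CMS-named options whenever G1 is selected. The sketch below is only a name-prefix heuristic (it assumes any `-XX:` option whose name starts with `CMS` is CMS-specific), not an authoritative list of what G1 ignores, and the function name is hypothetical.

```python
def cms_flags_under_g1(jvm_args):
    """Return -XX options named CMS* when -XX:+UseG1GC is also present.

    Heuristic only: it looks at option names and does not ask the JVM
    whether the option has any effect under G1.
    """
    if "-XX:+UseG1GC" not in jvm_args:
        return []
    suspects = []
    for arg in jvm_args:
        if not arg.startswith("-XX:"):
            continue
        # Strip the "-XX:" prefix, a leading +/- toggle, and any "=value".
        name = arg[4:].lstrip("+-").split("=", 1)[0]
        if name.startswith("CMS"):
            suspects.append(arg)
    return suspects


# A few of the flags from the command line quoted in this thread.
args = [
    "-XX:+UseNUMA",
    "-XX:+UseG1GC",
    "-XX:MaxGCPauseMillis=200",
    "-XX:+CMSClassUnloadingEnabled",
]
print(cms_flags_under_g1(args))
```

Anything it reports is a candidate for removal or at least for a comment explaining why it is there.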

 

R

From: Jon Haddad <jon@jonhaddad.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 21, 2019 at 12:06 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: [EXTERNAL] Re: GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

 


One thing to note, if you're going to use a big heap, cap it at 31GB, not 32.  Once you go to 32GB, you don't get to use compressed pointers [1], so you get less addressable space than at 31GB.
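
If you want to confirm this on your own JVM instead of trusting the rule of thumb, HotSpot can print its final flag values with `java -Xmx31g -XX:+PrintFlagsFinal -version`, and `UseCompressedOops` is the flag to look for. The sketch below only parses that kind of output; the sample lines are hand-written to resemble it and were not captured from a real run, and the helper name is hypothetical.

```python
def flag_value(print_flags_output, flag_name):
    """Return the value of one flag from -XX:+PrintFlagsFinal output, or None.

    Assumes each flag line looks like: "<type> <name> = <value> {kind}"
    (with ":=" instead of "=" when the value was changed from the default).
    """
    for line in print_flags_output.splitlines():
        tokens = line.split()
        if len(tokens) >= 4 and tokens[1] == flag_name and tokens[2] in ("=", ":="):
            return tokens[3]
    return None


# Illustrative sample only, written by hand.
SAMPLE = """\
     bool UseCompressedClassPointers               := true                                {lp64_product}
     bool UseCompressedOops                        := true                                {lp64_product}
"""
print(flag_value(SAMPLE, "UseCompressedOops"))
```

Running the real command once with your intended -Xmx and once with a value at or above 32g should show whether the flag flips on your build.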

 

 

On Mon, Oct 21, 2019 at 11:39 AM Durity, Sean R <SEAN_R_DURITY@homedepot.com> wrote:

I don’t disagree with Jon, who has all kinds of performance tuning experience. But for ease of operation, we only use G1GC (on Java 8), because the tuning of ParNew+CMS requires a high degree of knowledge and very repeatable testing harnesses. It isn’t worth our time. As a previous writer mentioned, there is usually better return on our time tuning the schema (aka helping developers understand Cassandra’s strengths).

 

We use 16 – 32 GB heaps, nothing smaller than that.

 

Sean Durity

 

From: Jon Haddad <jon@jonhaddad.com>
Sent: Monday, October 21, 2019 10:43 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

 

I still use ParNew + CMS over G1GC with Java 8.  I haven't done a comparison with JDK 11 yet, so I'm not sure if it's any better.  I've heard it is, but I like to verify first.  The pause times with ParNew + CMS are generally lower than G1 when tuned right, but as Chris said it can be tricky.  If you aren't willing to spend the time understanding how it works and why each setting matters, G1 is a better option.  

 

I wouldn't run Cassandra in production on less than 8GB of heap - I consider it the absolute minimum.  For G1 I'd use 16GB, and never 4GB with Cassandra unless you're rarely querying it.  

 

I typically use the following as a starting point now:

 

ParNew + CMS

16GB heap

10GB new gen

2GB memtable cap, otherwise you'll spend a bunch of time copying around memtables (cassandra.yaml)

Max tenuring threshold: 2

survivor ratio 6
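
For anyone who wants to try the starting point above, here is a rough sketch of how those numbers could map onto Java 8 HotSpot options, plus the young-generation layout they imply. Setting -Xms equal to -Xmx is my assumption (it matches the command line quoted further down the thread), and the function names are made up. The memtable cap is a cassandra.yaml setting, not a JVM flag (in 3.11 probably memtable_heap_space_in_mb, though the post doesn't name the key), so it is left out.

```python
def parnew_cms_flags(heap_gb=16, new_gen_gb=10, max_tenuring=2, survivor_ratio=6):
    """Build Java 8 options for the ParNew + CMS starting point listed above."""
    return [
        "-XX:+UseParNewGC",
        "-XX:+UseConcMarkSweepGC",
        f"-Xms{heap_gb}G",  # assumption: fixed-size heap, min == max
        f"-Xmx{heap_gb}G",
        f"-Xmn{new_gen_gb}G",
        f"-XX:MaxTenuringThreshold={max_tenuring}",
        f"-XX:SurvivorRatio={survivor_ratio}",
    ]


def young_gen_layout_gb(new_gen_gb, survivor_ratio):
    """Split the new gen into eden and two survivor spaces.

    SurvivorRatio is eden size divided by ONE survivor space, and there
    are two survivor spaces, so new gen = (ratio + 2) * survivor.
    """
    survivor = new_gen_gb / (survivor_ratio + 2)
    return {"eden_gb": survivor * survivor_ratio, "survivor_gb_each": survivor}


print(" ".join(parnew_cms_flags()))
print(young_gen_layout_gb(10, 6))
```

With a 10GB new gen and a survivor ratio of 6, that works out to 7.5GB of eden and two 1.25GB survivor spaces.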

 

I've also done some tests with a 30GB heap, 24 GB of which was new gen.  This worked surprisingly well in my tests since it essentially keeps everything out of the old gen.  New gen allocations are just a pointer bump and are pretty fast, so in my (limited) tests of this I was seeing really good p99 times.  I was seeing a 200-400 ms pause roughly once a minute running a workload that deliberately wasn't hitting a resource limit (testing real world looking stress vs overwhelming the cluster).

 

We built tlp-cluster [1] and tlp-stress [2] to help figure these things out. 

 

[2] http://thelastpickle.com/tlp-stress

 

Jon

 

 

 

 

On Mon, Oct 21, 2019 at 10:24 AM Reid Pinchback <rpinchback@tripadvisor.com> wrote:

An i3.xlarge has 30.5 GB of RAM but you’re using less than 4 GB for C*.  So minus room for other uses of JVM memory and for kernel activity, that’s about 25 GB for file cache.  You’ll have to see whether you want a bigger heap to allow for less frequent GC cycles, or whether you could save money on the instance size.  C* generates a lot of medium-length lifetime objects which can easily end up in old gen.  A larger heap will reduce the burn of more old-gen collections.  There are no magic numbers to just give because it’ll depend on your usage patterns.
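
The arithmetic behind that "about 25 GB" can be sketched as below. The 30.5 GB and the 3821M heap come from this thread; the 1.5 GB allowance for off-heap JVM memory and kernel use is a placeholder I picked for illustration, so substitute what you actually measure on your nodes.

```python
def file_cache_estimate_gb(ram_gb, heap_mb, other_overhead_gb):
    """Rough memory left for the OS file cache, in GB.

    other_overhead_gb covers non-heap JVM memory and kernel activity;
    it is a guess you should replace with a measured value.
    """
    return ram_gb - heap_mb / 1024 - other_overhead_gb


# 30.5 GB instance, -Xmx3821M heap, assumed 1.5 GB of other overhead.
estimate = file_cache_estimate_gb(30.5, 3821, 1.5)
print(round(estimate, 1))
```

Rerunning it with a 16 GB heap (16384 MB) shows how much file cache you would be trading away for the larger heap.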

 

 


Thanks for the answer.

This is the JVM version that I have right now.

openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)

These are the current flags. Would you change anything on an i3.xlarge AWS node?

java -Xloggc:/var/log/cassandra/gc.log -Dcassandra.max_queued_native_transport_requests=4096 -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=45 -XX:G1HeapRegionSize=0 -XX:-ParallelRefProcEnabled -Xms3821M -Xmx3821M -XX:CompileCommandFile=/etc/cassandra/conf/hotspot_compiler -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/conf/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/etc/cassandra/conf/jmxremote.access -Djava.library.path=/usr/share/cassandra/lib/sigar-bin -Djava.rmi.server.hostname=172.24.150.141 -XX:+CMSClassUnloadingEnabled -javaagent:/usr/share/cassandra/lib/jmx_prometheus_javaagent-0.3.1.jar=10100:/etc/cassandra/default.conf/jmx-export.yml -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -Dcassandra-foreground=yes -cp 
/etc/cassandra/conf:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/asm-5.0.4.jar:/usr/share/cassandra/lib/caffeine-2.2.6.jar:/usr/share/cassandra/lib/cassandra-driver-core-3.0.1-shaded.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.9.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/concurrent-trees-2.4.0.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/ecj-4.4.2.jar:/usr/share/cassandra/lib/guava-18.0.jar:/usr/share/cassandra/lib/HdrHistogram-2.1.9.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/hppc-0.5.4.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.13.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.13.jar:/usr/share/cassandra/lib/jamm-0.3.0.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jcl-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/jctools-core-1.2.1.jar:/usr/share/cassandra/lib/jflex-1.6.0.jar:/usr/share/cassandra/lib/jmx_prometheus_javaagent-0.3.1.jar:/usr/share/cassandra/lib/jna-4.2.2.jar:/usr/share/cassandra/lib/joda-time-2.4.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/jstackjunit-0.0.1.jar:/usr/share/cassandra/lib/libthrift-0.9.2.jar:/usr/share/cassandra/lib/log4j-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/logback-classic-1.1.3.jar:/usr/share/cassandra/lib/logback-core-1.1.3.jar:/usr/share/cassandra/lib/lz4-1.3.0.jar:/usr/share/cassandra/lib/metrics-core-3.1.5.jar:/usr/share/cassandra/lib/metrics-jvm-3.1.5.jar:/usr/share/cassandra/lib/metrics-logback-3.1.5.jar:/usr/share/cassandra/lib/netty-all-4.0.44.Final.jar:/usr/share/cassandra/lib/ohc-core-0.4.4.jar:/usr/share/cassandra/lib/ohc-core-j8-0.4.4.jar:/usr/share/cassandra/lib/reporter-config3-3.0.3.jar:/usr/share/cassandra/lib/reporter-config-base-3.0.3.jar:/usr/share/cassandra/lib/sigar-1.6.4.jar:/usr/share/cassandra/lib/slf4j-api-1.7.7.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.1.1.7.jar:/usr/share/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:/usr/share/cassandra/lib/ST4-4.0.8.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-3.11.3.jar:/usr/share/cassandra/apache-cassandra-thrift-3.11.3.jar:/usr/share/cassandra/stress.jar: org.apache.cassandra.service.CassandraDaemon

Best,

Sergio

 

On Sat, Oct 19, 2019 at 14:30 Chris Lohfink <clohfink85@gmail.com> wrote:

"It depends" on your version and heap size, but G1 is easier to get right, so you probably want to stick with that unless you are using small heaps or are really interested in tuning it (likely for massively smaller gains than tuning your data model). There is no GC algorithm that is strictly better than the others in all scenarios, unfortunately. If your JVM supports it, ZGC or Shenandoah are likely going to give you the best latencies.

 

Chris

 

On Fri, Oct 18, 2019 at 8:41 PM Sergio Bilello <lapostadisergio@gmail.com> wrote:

Hello!

Is it still better to use ParNew + CMS than G1GC these days?

Any recommendations for a read-heavy workload on i3.xlarge nodes?


Thanks,

Sergio
