cassandra-user mailing list archives

From Jon Haddad <...@jonhaddad.com>
Subject Re: [EXTERNAL] Re: GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html
Date Mon, 21 Oct 2019 22:32:38 GMT
Others have mentioned that the data model is going to have the biggest
impact and I agree with them.  Before messing around with a bunch of OS
settings you need to have the right table structure.  There's no substitute
for it.

The biggest thing I've found that matters when tuning the OS is dialing
back read-ahead (I use 4KB).  I did a few load tests for ApacheCon; here's
a before and after:

[image: image.png]

[image: image.png]

tlp-cluster comes with a bunch of dashboards that include those graphs;
they've even been improved a bit since the conference.
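
For readers who want to try the read-ahead change, here is a minimal sketch. It assumes a Linux node and uses `blockdev`, which takes read-ahead in 512-byte sectors, so 4KB is 8 sectors. The device name is hypothetical, and the script only prints the commands because running them needs root and the right data volume.

```shell
#!/bin/sh
# Convert a read-ahead size in KB to the 512-byte sector count that
# `blockdev --setra` expects.
kb_to_sectors() {
  echo $(( $1 * 1024 / 512 ))
}

# Hypothetical device name (an assumption); substitute your data volume.
DEVICE="${DEVICE:-/dev/nvme0n1}"
SECTORS="$(kb_to_sectors 4)"

# Print the commands instead of running them, since they require root.
echo "blockdev --getra ${DEVICE}"              # show the current read-ahead
echo "blockdev --setra ${SECTORS} ${DEVICE}"   # set read-ahead to 4KB
```

A value set this way does not survive a reboot, so it usually ends up in whatever provisioning the node already uses.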

I put together this blog post that has quick summaries of some other things
to think about when setting up a new cluster:
https://thelastpickle.com/blog/2019/01/30/new-cluster-recommendations.html

Jon


On Mon, Oct 21, 2019 at 6:22 PM Sergio <lapostadisergio@gmail.com> wrote:

> Thanks Jon!
>
> I used that tool and I did a test to compare LCS and STCS and it works
> great. However, I was referring to the JVM flags that you use since there
> are a lot of flags that I found as default and I would like to exclude the
> unused or wrong ones from the current configuration.
>
> I also have another thread open where I am trying to figure out kernel
> settings for TCP:
> https://lists.apache.org/thread.html/7708c22a1d95882598cbcc29bc34fa54c01fcb33c40bb616dcd3956d@%3Cuser.cassandra.apache.org%3E
>
> Do you have anything to add to that?
>
> Thanks,
>
> Sergio
>
> On Mon, Oct 21, 2019 at 3:09 PM Jon Haddad <jon@jonhaddad.com> wrote:
>
>> tlp-stress comes with workloads pre-baked, so there's not much
>> configuration to do.  The main flags you'll want are going to be:
>>
>> -d : duration, I highly recommend running your test for a few days
>> --compaction
>> --compression
>> -p: number of partitions
>> -r: % of reads, 0-1
>>
>> For example, you might run:
>>
>> tlp-stress run KeyValue -d 24h --compaction lcs -p 10m -r .9
>>
>> for a basic key value table, running for 24 hours, using LCS, 10 million
>> partitions, 90% reads.
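
As a sketch of how the example above might be reused for an LCS versus STCS comparison, the snippet below prints one command per strategy. It assumes `stcs` is accepted as a `--compaction` shorthand the same way `lcs` is; check the tlp-stress manual before relying on that. The helper name `build_stress_cmd` is made up for illustration.

```shell
#!/bin/sh
# Build a tlp-stress command line for a given compaction shorthand and duration.
# Only flags quoted in this thread are used; "stcs" is an assumed shorthand.
build_stress_cmd() {
  # $1 = compaction shorthand, $2 = duration
  echo "tlp-stress run KeyValue -d $2 --compaction $1 -p 10m -r .9"
}

# Print the commands so they can be reviewed, then run one after another
# against a test cluster.
for strategy in lcs stcs; do
  build_stress_cmd "$strategy" 24h
done
```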
>>
>> There's a lot of options. I won't list them all here, it's why I wrote
>> the manual :)
>>
>> Jon
>>
>>
>> On Mon, Oct 21, 2019 at 1:16 PM Sergio <lapostadisergio@gmail.com> wrote:
>>
>>> Thanks, guys!
>>> I just copied and pasted what I found on our test machines, but I can
>>> confirm that we have the same settings except for 8GB in production.
>>> I didn't select these settings and I need to verify why these settings
>>> are there.
>>> If any of you want to share your flags for a read-heavy workload, it
>>> would be appreciated; I would then swap those flags in and test them with
>>> tlp-stress.
>>> I am thinking about different approaches (G1GC vs ParNew + CMS).
>>> How much RAM do you dedicate to the OS, as a percentage or as an
>>> exact number?
>>> Can you share the flags for ParNew + CMS so that I can play with them and
>>> run a test?
>>>
>>> Best,
>>> Sergio
>>>
>>>
>>> On Mon, Oct 21, 2019 at 9:27 AM Reid Pinchback <
>>> rpinchback@tripadvisor.com> wrote:
>>>
>>>> Since the instance size is < 32gb, hopefully swap isn’t being used, so
>>>> it should be moot.
>>>>
>>>>
>>>>
>>>> Sergio, also be aware that -XX:+CMSClassUnloadingEnabled probably
>>>> doesn’t do anything for you.  I believe that only applies to CMS, not
>>>> G1GC.  I also wouldn’t take it as gospel truth that -XX:+UseNUMA is a good
>>>> thing on AWS (or anything virtualized); you’d have to run your own tests
>>>> and find out.
>>>>
>>>>
>>>>
>>>> R
>>>>
>>>> *From: *Jon Haddad <jon@jonhaddad.com>
>>>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> *Date: *Monday, October 21, 2019 at 12:06 PM
>>>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> *Subject: *Re: [EXTERNAL] Re: GC Tuning
>>>> https://thelastpickle.com/blog/2018/04/11/gc-tuning.html
>>>>
>>>>
>>>>
>>>> One thing to note, if you're going to use a big heap, cap it at 31GB,
>>>> not 32.  Once you go to 32GB, you don't get to use compressed pointers [1],
>>>> so you get less addressable space than at 31GB.
>>>>
>>>>
>>>>
>>>> [1]
>>>> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
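
One way to confirm the compressed-pointer cutoff on a given JVM is to ask it directly. The sketch below assumes a HotSpot `java` on the PATH and uses `-XX:+PrintFlagsFinal` to print the resolved value of `UseCompressedOops` at two heap sizes. The function name is illustrative, and the exact cutoff can vary with JVM version and settings.

```shell
#!/bin/sh
# Report whether the local JVM keeps compressed ordinary object pointers
# (oops) enabled at a given maximum heap size, e.g. 31g or 32g.
check_compressed_oops() {
  java -Xmx"$1" -XX:+PrintFlagsFinal -version 2>/dev/null | grep -w UseCompressedOops
}

if command -v java >/dev/null 2>&1; then
  check_compressed_oops 31g
  check_compressed_oops 32g
else
  echo "java not found on PATH; skipping check"
fi
```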
>>>>
>>>>
>>>>
>>>> On Mon, Oct 21, 2019 at 11:39 AM Durity, Sean R <
>>>> SEAN_R_DURITY@homedepot.com> wrote:
>>>>
>>>> I don’t disagree with Jon, who has all kinds of performance tuning
>>>> experience. But for ease of operation, we only use G1GC (on Java 8),
>>>> because the tuning of ParNew+CMS requires a high degree of knowledge and
>>>> very repeatable testing harnesses. It isn’t worth our time. As a previous
>>>> writer mentioned, there is usually better return on our time tuning the
>>>> schema (aka helping developers understand Cassandra’s strengths).
>>>>
>>>>
>>>>
>>>> We use 16 – 32 GB heaps, nothing smaller than that.
>>>>
>>>>
>>>>
>>>> Sean Durity
>>>>
>>>>
>>>>
>>>> *From:* Jon Haddad <jon@jonhaddad.com>
>>>> *Sent:* Monday, October 21, 2019 10:43 AM
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* [EXTERNAL] Re: GC Tuning
>>>> https://thelastpickle.com/blog/2018/04/11/gc-tuning.html
>>>>
>>>>
>>>>
>>>> I still use ParNew + CMS over G1GC with Java 8.  I haven't done a
>>>> comparison with JDK 11 yet, so I'm not sure if it's any better.  I've heard
>>>> it is, but I like to verify first.  The pause times with ParNew + CMS are
>>>> generally lower than G1 when tuned right, but as Chris said it can be
>>>> tricky.  If you aren't willing to spend the time understanding how it works
>>>> and why each setting matters, G1 is a better option.
>>>>
>>>>
>>>>
>>>> I wouldn't run Cassandra in production on less than 8GB of heap - I
>>>> consider it the absolute minimum.  For G1 I'd use 16GB, and never 4GB with
>>>> Cassandra unless you're rarely querying it.
>>>>
>>>>
>>>>
>>>> I typically use the following as a starting point now:
>>>>
>>>>
>>>>
>>>> ParNew + CMS
>>>>
>>>> 16GB heap
>>>>
>>>> 10GB new gen
>>>>
>>>> 2GB memtable cap, otherwise you'll spend a bunch of time copying around
>>>> memtables (cassandra.yaml)
>>>>
>>>> Max tenuring threshold: 2
>>>>
>>>> survivor ratio 6
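
The starting point above could be expressed roughly as follows, in the `JVM_OPTS` append style that `cassandra-env.sh` uses. This is a sketch for Java 8 and not a tested configuration; the mapping of "2GB memtable cap" to a specific `cassandra.yaml` key is an assumption noted in the comments.

```shell
#!/bin/sh
# Sketch of the ParNew + CMS starting point from the message above (Java 8).
JVM_OPTS="${JVM_OPTS:-}"
JVM_OPTS="$JVM_OPTS -Xms16G -Xmx16G"                           # 16GB heap
JVM_OPTS="$JVM_OPTS -Xmn10G"                                   # 10GB new gen
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"  # ParNew + CMS
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"
# SurvivorRatio=6 means eden is 6x one survivor space; with a 10GB new gen
# that is 7.5GB of eden and two survivor spaces of 1.25GB each.
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=6"

# Assumption: the 2GB memtable cap is set in cassandra.yaml, for example
# memtable_heap_space_in_mb: 2048 (the message does not name the key).
echo "$JVM_OPTS"
```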
>>>>
>>>>
>>>>
>>>> I've also done some tests with a 30GB heap, 24 GB of which was new
>>>> gen.  This worked surprisingly well in my tests since it essentially keeps
>>>> everything out of the old gen.  New gen allocations are just a pointer bump
>>>> and are pretty fast, so in my (limited) tests of this I was seeing really
>>>> good p99 times.  I was seeing a 200-400 ms pause roughly once a minute
>>>> running a workload that deliberately wasn't hitting a resource limit
>>>> (testing real world looking stress vs overwhelming the cluster).
>>>>
>>>>
>>>>
>>>> We built tlp-cluster [1] and tlp-stress [2] to help figure these things
>>>> out.
>>>>
>>>>
>>>>
>>>> [1] https://thelastpickle.com/tlp-cluster/
>>>>
>>>> [2] http://thelastpickle.com/tlp-stress
>>>>
>>>>
>>>>
>>>> Jon
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Oct 21, 2019 at 10:24 AM Reid Pinchback <
>>>> rpinchback@tripadvisor.com> wrote:
>>>>
>>>> An i3.xlarge has 30.5 GB of RAM but you’re using less than 4 GB for C*.
>>>> So minus room for other uses of jvm memory and for kernel activity, that’s
>>>> about 25 gb for file cache.  You’ll have to see if you either want a bigger
>>>> heap to allow for less frequent gc cycles, or you could save money on the
>>>> instance size.  C* generates a lot of medium-length lifetime objects which
>>>> can easily end up in old gen.  A larger heap will reduce the burn of more
>>>> old-gen collections.  There are no magic numbers to just give because it’ll
>>>> depend on your usage patterns.
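
The file-cache estimate above can be reproduced with simple arithmetic. In this sketch the heap size comes from the `-Xmx3821M` flag posted in the thread, while the 2048 MB allowance for other JVM memory and kernel activity is an assumed figure chosen only for illustration.

```shell
#!/bin/sh
# Rough page-cache budget for one node, in MB.
TOTAL_MB=31232   # 30.5 GB of instance RAM (30.5 * 1024)
HEAP_MB=3821     # -Xms/-Xmx from the flags posted in the thread
OTHER_MB=2048    # assumption: off-heap JVM memory plus kernel overhead

CACHE_MB=$(( TOTAL_MB - HEAP_MB - OTHER_MB ))
echo "approx file cache: ${CACHE_MB} MB"
```

That leaves a little under 25 GB, in line with the estimate in the message.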
>>>>
>>>>
>>>>
>>>> *From: *Sergio <lapostadisergio@gmail.com>
>>>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> *Date: *Sunday, October 20, 2019 at 2:51 PM
>>>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> *Subject: *Re: GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html
>>>>
>>>>
>>>>
>>>> Thanks for the answer.
>>>>
>>>> This is the JVM version that I have right now.
>>>>
>>>> openjdk version "1.8.0_161"
>>>> OpenJDK Runtime Environment (build 1.8.0_161-b14)
>>>> OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
>>>>
>>>> These are the current flags. Would you change anything on an i3.xlarge
>>>> AWS node?
>>>>
>>>> java -Xloggc:/var/log/cassandra/gc.log
>>>> -Dcassandra.max_queued_native_transport_requests=4096 -ea
>>>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
>>>> -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003
>>>> -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB
>>>> -XX:+UseNUMA -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true
>>>> -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:+UseG1GC
>>>> -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=200
>>>> -XX:InitiatingHeapOccupancyPercent=45 -XX:G1HeapRegionSize=0
>>>> -XX:-ParallelRefProcEnabled -Xms3821M -Xmx3821M
>>>> -XX:CompileCommandFile=/etc/cassandra/conf/hotspot_compiler
>>>> -Dcom.sun.management.jmxremote.port=7199
>>>> -Dcom.sun.management.jmxremote.rmi.port=7199
>>>> -Dcom.sun.management.jmxremote.ssl=false
>>>> -Dcom.sun.management.jmxremote.authenticate=false
>>>> -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/conf/jmxremote.password
>>>> -Dcom.sun.management.jmxremote.access.file=/etc/cassandra/conf/jmxremote.access
>>>> -Djava.library.path=/usr/share/cassandra/lib/sigar-bin
>>>> -Djava.rmi.server.hostname=172.24.150.141 -XX:+CMSClassUnloadingEnabled
>>>> -javaagent:/usr/share/cassandra/lib/jmx_prometheus_javaagent-0.3.1.jar=10100:/etc/cassandra/default.conf/jmx-export.yml
>>>> -Dlogback.configurationFile=logback.xml
>>>> -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir=
>>>> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid
>>>> -Dcassandra-foreground=yes -cp
>>>> /etc/cassandra/conf:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/asm-5.0.4.jar:/usr/share/cassandra/lib/caffeine-2.2.6.jar:/usr/share/cassandra/lib/cassandra-driver-core-3.0.1-shaded.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.9.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/concurrent-trees-2.4.0.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/ecj-4.4.2.jar:/usr/share/cassandra/lib/guava-18.0.jar:/usr/share/cassandra/lib/HdrHistogram-2.1.9.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/hppc-0.5.4.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.13.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.13.jar:/usr/share/cassandra/lib/jamm-0.3.0.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jcl-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/jctools-core-1.2.1.jar:/usr/share/cassandra/lib/jflex-1.6.0.jar:/usr/share/cassandra/lib/jmx_prometheus_javaagent-0.3.1.jar:/usr/share/cassandra/lib/jna-4.2.2.jar:/usr/share/cassandra/lib/joda-time-2.4.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/jstackjunit-0.0.1.jar:/usr/share/cassandra/lib/libthrift-0.9.2.jar:/usr/share/cassandra/lib/log4j-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/logback-classic-1.1.3.jar:/usr/share/cassandra/lib/logback-core-1.1.3.jar:/usr/share/cassandra/lib/lz4-1.3.0.jar:/usr/share/cassandra/lib/metrics-core-3.1.5.jar:/usr/share/cassandra/lib/metrics-jvm-3.1.5.jar:/usr/share/cassandra/lib/metrics-logback-3.1.5.jar:/usr/share/cassandra/lib/netty-all-4.0.44.Final.jar:/usr/share/cassandra/lib/ohc-core-0.4.4.jar:/usr/share/cassandra/lib/ohc-core-j8-0.4.4.jar:
/usr/share/cassandra/lib/reporter-config3-3.0.3.jar:/usr/share/cassandra/lib/reporter-config-base-3.0.3.jar:/usr/share/cassandra/lib/sigar-1.6.4.jar:/usr/share/cassandra/lib/slf4j-api-1.7.7.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.1.1.7.jar:/usr/share/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:/usr/share/cassandra/lib/ST4-4.0.8.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-3.11.3.jar:/usr/share/cassandra/apache-cassandra-thrift-3.11.3.jar:/usr/share/cassandra/stress.jar:
>>>> org.apache.cassandra.service.CassandraDaemon
>>>>
>>>> Best,
>>>>
>>>> Sergio
>>>>
>>>>
>>>>
>>>> On Sat, Oct 19, 2019 at 2:30 PM Chris Lohfink <
>>>> clohfink85@gmail.com> wrote:
>>>>
>>>> "It depends" on your version and heap size, but G1 is easier to get
>>>> right, so you probably want to stick with that unless you are using small
>>>> heaps or are really interested in tuning it (likely for massively smaller
>>>> gains than tuning your data model). There is no GC algorithm that is
>>>> strictly better than the others in all scenarios, unfortunately. If your
>>>> JVM supports it, ZGC or Shenandoah are likely going to give you the best
>>>> latencies.
>>>>
>>>>
>>>>
>>>> Chris
>>>>
>>>>
>>>>
>>>> On Fri, Oct 18, 2019 at 8:41 PM Sergio Bilello <
>>>> lapostadisergio@gmail.com> wrote:
>>>>
>>>> Hello!
>>>>
>>>> Is it still better to use ParNew + CMS than G1GC these days?
>>>>
>>>> Any recommendations for a read-heavy workload on i3.xlarge nodes?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Sergio
>>>>
