cassandra-user mailing list archives

From "Peng Xiao" <2535...@qq.com>
Subject Re: gc causes C* node hang
Date Thu, 30 Nov 2017 00:38:33 GMT
It looks like we are not able to enable –XX:PrintSafepointStatisticsCount=1
in cassandra-env.sh.
Could anyone please advise?


CompilerOracle: inline org/apache/cassandra/io/util/Memory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/io/util/SafeMemory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
Error: Could not find or load main class –XX:PrintSafepointStatisticsCount=1
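
One possible reading of the error above: the option starts with an en dash (–) rather than an ASCII hyphen (-), so the java launcher does not recognize it as an option and treats it as the main class name. A minimal sketch for spotting such characters in a config file follows. `find_unicode_dashes` is a hypothetical helper name, and the path in the usage comment is an assumption about where cassandra-env.sh lives.

```shell
# Hypothetical helper: print (with line numbers) every line of a file that
# contains an en dash (U+2013) or em dash (U+2014), which are easy to paste
# in by accident where an ASCII '-' is required.
find_unicode_dashes() {
    # $1: file to check. The dashes are matched as raw UTF-8 byte sequences,
    # so the result does not depend on the current locale.
    LC_ALL=C grep -a -n \
        -e "$(printf '\342\200\223')" \
        -e "$(printf '\342\200\224')" \
        "$1"
}

# Usage (path is an assumption):
#   find_unicode_dashes /etc/cassandra/conf/cassandra-env.sh
```

If the helper prints a line, retyping the leading dash of that option by hand as a plain `-` may be enough to get past the "Could not find or load main class" error.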



Thanks
------------------ Original Message ------------------
From: "My own mailbox";<2535053@qq.com>;
Sent: Friday, November 24, 2017, 6:17 AM
To: "user"<user@cassandra.apache.org>;

Subject: Re: gc causes C* node hang



Thanks Chris for the thorough explanation. Actually we are using SSDs; we will try to check the
hardware. All 7 VM nodes are on the same machine, but we did not find any errors in the
VMware logs.




------------------ Original Message ------------------
From: "clohfink85";<clohfink85@gmail.com>;
Sent: Friday, November 24, 2017, 5:21 AM
To: "user"<user@cassandra.apache.org>;

Subject: Re: gc causes C* node hang



Sorry, also: -XX:PrintSafepointStatisticsCount=1



On Thu, Nov 23, 2017 at 3:20 PM, Chris Lohfink <clohfink85@gmail.com> wrote:
The only pause over 1s you had was:
2017-11-23T17:50:14.573+0800: 1378060.385: Total time for which application threads were stopped: 37.0282783 seconds, Stopping threads took: 36.9420759 seconds

This is not actually a GC pause; it's likely that it was revoking bias or something
completely unrelated to GC. "Stopping threads took: 36.9420759 seconds" means it took
37 seconds for the threads to reach a safepoint once the JVM wanted to stop the world. My
knee-jerk reaction to this is "hardware", as I've mostly seen it when fsyncing the hprof statistics
or something and blocking on a slow disk. If you want to know more specifics you can enable
some safepoint logging, but I would recommend checking your disks or just replacing the host
if you are able to. You can do the analysis once it's no longer impacting you.


For info, safepoint logging (might not be super helpful, but useful if you really need this hardware
and need to dig into what's causing the JVM to hang up):


-XX:+UnlockDiagnosticVMOptions
-XX:+PrintSafepointStatistics
-XX:+LogVMOutput
-XX:LogFile=somelocation.log
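
In cassandra-env.sh these options could be appended in the same style as the existing JVM_OPTS lines. This is a minimal sketch, assuming a Java 8 era HotSpot JVM; the log path is a placeholder and not a value from this thread. Every option begins with a plain ASCII hyphen.

```shell
# Make sure JVM_OPTS exists when this fragment is run outside cassandra-env.sh.
: "${JVM_OPTS:=}"

# Diagnostic options must be unlocked before -XX:+LogVMOutput is accepted.
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
# Print statistics about safepoints.
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
# Report after every safepoint instead of batching several together.
JVM_OPTS="$JVM_OPTS -XX:PrintSafepointStatisticsCount=1"
# Send VM output to a file; the path below is a placeholder.
JVM_OPTS="$JVM_OPTS -XX:+LogVMOutput"
JVM_OPTS="$JVM_OPTS -XX:LogFile=/var/log/cassandra/vm.log"
```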



Chris




On Thu, Nov 23, 2017 at 3:09 PM, Peng Xiao <2535053@qq.com> wrote:


Hi Chris,


I found a GC log on another node where we have GC logging enabled.
Could you please take a look?


Thanks,
Peng Xiao






------------------ Original Message ------------------
From: "Chris Lohfink";<clohfink85@gmail.com>;
Sent: Friday, November 24, 2017, 4:57 AM
To: "user"<user@cassandra.apache.org>;

Subject: Re: gc causes C* node hang





If it sets it to 8, you shouldn't override it to 5.

You should enable GC logging, for what it's worth. It's very, very cheap and provides a lot of
useful information when you need it.


Chris


On Thu, Nov 23, 2017 at 2:54 PM, Peng Xiao <2535053@qq.com> wrote:
We only have 7 cores per node.
For -XX:ParallelGCThreads, it looks like by default HotSpot caps GC threads at 8;
maybe we need to remove this setting?




------------------ Original Message ------------------
From: "My own mailbox";<2535053@qq.com>;
Sent: Friday, November 24, 2017, 4:51 AM
To: "user"<user@cassandra.apache.org>;

Subject: Re: gc causes C* node hang



Thanks Chris. We don't have GC logs for this node. We will try adding -XX:G1ReservePercent=25.




------------------ Original Message ------------------
From: "Chris Lohfink";<clohfink85@gmail.com>;
Sent: Friday, November 24, 2017, 4:46 AM
To: "user"<user@cassandra.apache.org>;

Subject: Re: gc causes C* node hang



Can you include output from the GC logs for the 30ms pause? If you don't have GC logs, enable
them and collect one. G1 provides good details and can catch some edge cases with your use case.

I would guess, since it's so long, that you didn't have enough to-space. You can try adding -XX:G1ReservePercent=25
(or -XX:-G1UseAdaptiveIHOP and -XX:InitiatingHeapOccupancyPercent) and increasing heap space
if you can.

JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=5"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=5"

How many CPU cores do you have? Make sure you're not setting these lower than the default (check
with `java -XX:+PrintFlagsFinal -version 2>&1 | grep Threads`).
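
To compare a default against the value set in cassandra-env.sh, the flag dump can be narrowed to a single flag. A rough sketch, assuming the Java 8 style output layout of `type name = value {kind}`; `flag_value` is a hypothetical helper name, and `-version` is added only so the launcher starts a JVM without needing a main class.

```shell
# Hypothetical helper: read -XX:+PrintFlagsFinal output on stdin and print
# the value of the flag named in $1.
flag_value() {
    # Fields: $1 = type, $2 = flag name, $3 = "=" or ":=", $4 = value.
    awk -v name="$1" '$2 == name && ($3 == "=" || $3 == ":=") { print $4 }'
}

# Usage (requires a JVM on the PATH):
#   java -XX:+UseG1GC -XX:+PrintFlagsFinal -version 2>&1 | flag_value ParallelGCThreads
#   java -XX:+UseG1GC -XX:+PrintFlagsFinal -version 2>&1 | flag_value ConcGCThreads
```

If the printed default is higher than the value set in JVM_OPTS, the override is lowering the thread count.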
 

Looks like a 16 GB heap? How much space is available on the host (how big can you set it)? Is swap
disabled?

If it's not a to-space exhaustion issue, GC logs will help.

Chris






On Thu, Nov 23, 2017 at 12:49 AM, Peng Xiao <2535053@qq.com> wrote:
Hi there,


We have a cluster with two DCs running 2.1.13. Sometimes GC causes one node to hang, and the
application rt jumps to 15s; yet even when we have one node down, the rt does not fluctuate
violently.
We are using Cassandra with G1 and the following configuration:


JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=5"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=5"
JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"


JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"



Could anyone please advise?





Thanks,
Peng Xiao
















