cassandra-commits mailing list archives

From "Tommy Stendahl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13886) OOM put node in limbo
Date Tue, 26 Sep 2017 11:47:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180640#comment-16180640
] 

Tommy Stendahl commented on CASSANDRA-13886:
--------------------------------------------

I have done some work on this issue. Even though it happens very seldom, it is very bad when
it does: since the JVM doesn’t die properly, our monitoring system doesn’t restart Cassandra
on this node, and manual intervention is required. The workaround with {{-XX:+ExitOnOutOfMemoryError}}
works fine, and you can also use {{-XX:+CrashOnOutOfMemoryError}} if you want core dumps.
But as I understand it, these options are only available from Java 8u92, so they might not be
an option for everyone. I think an alternative is to improve {{HeapUtils.generateHeapDump()}}
so that it catches {{Throwable}}, preventing any exception or error from leaking out of it.
This would allow execution to continue in {{JVMStabilityInspector.inspectThrowable()}} until
we reach {{killer.killCurrentJVM(t)}}, which will properly kill the JVM.
I have prepared a patch for this on the 2.2 branch, but it should merge fine to all branches.
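
To illustrate the idea, here is a minimal, self-contained sketch. It is not the actual patch and does not use the Cassandra classes. The class {{StabilitySketch}}, the {{Killer}} interface, and the {{generateHeapDumpUnsafe()}} / {{generateHeapDumpSafely()}} methods are hypothetical stand-ins for {{JVMStabilityInspector}}, its killer, and {{HeapUtils.generateHeapDump()}}. The simulated failure mirrors the {{NoClassDefFoundError}} from the reported stack trace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JVMStabilityInspector plus HeapUtils; names are illustrative only.
class StabilitySketch {

    // Hypothetical stand-in for the killer used by inspectThrowable().
    interface Killer {
        void killCurrentJVM(Throwable t);
    }

    // Collects log messages so the behaviour can be observed without a logger.
    static final List<String> events = new ArrayList<>();

    // Simulates a heap dump attempt that fails the way the reported stack trace did.
    static void generateHeapDumpUnsafe() {
        throw new NoClassDefFoundError("Could not initialize class java.lang.UNIXProcess");
    }

    // Catches Throwable so that neither exceptions nor errors escape the heap dump step.
    static void generateHeapDumpSafely() {
        try {
            generateHeapDumpUnsafe();
        } catch (Throwable e) {
            events.add("heap dump failed: " + e);
        }
    }

    // On OOM: try to dump the heap, then always proceed to the kill step.
    static void inspectThrowable(Throwable t, Killer killer) {
        if (t instanceof OutOfMemoryError) {
            generateHeapDumpSafely();
            killer.killCurrentJVM(t);
        }
    }

    public static void main(String[] args) {
        // A real killer would halt the JVM; here we only record the request.
        inspectThrowable(new OutOfMemoryError("simulated"),
                         t -> events.add("kill requested for: " + t));
        events.forEach(System.out::println);
    }
}
```

Because the failure in the heap dump step is swallowed and recorded, the kill step is still reached, which is the behaviour the proposed change aims for.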

> OOM put node in limbo
> ---------------------
>
>                 Key: CASSANDRA-13886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13886
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra version 2.2.10
>            Reporter: Marcus Olsson
>            Assignee: Tommy Stendahl
>            Priority: Minor
>              Labels: lhf
>
> In one of our test clusters we have had some issues with OOM. While working on fixing
> this it was discovered that one of the nodes that got OOM actually wasn't shut down properly.
> Instead it went into a half-up-state where the affected node considered itself up while all
> other nodes considered it as down.
> The following stacktrace was observed which seems to be the cause of this:
> {noformat}
> java.lang.NoClassDefFoundError: Could not initialize class java.lang.UNIXProcess
>         at java.lang.ProcessImpl.start(ProcessImpl.java:130) ~[na:1.8.0_131]
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[na:1.8.0_131]
>         at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_131]
>         at java.lang.Runtime.exec(Runtime.java:485) ~[na:1.8.0_131]
>         at org.apache.cassandra.utils.HeapUtils.generateHeapDump(HeapUtils.java:88) ~[apache-cassandra-2.2.10.jar:2.2.10]
>         at org.apache.cassandra.utils.JVMStabilityInspector.inspectThrowable(JVMStabilityInspector.java:56) ~[apache-cassandra-2.2.10.jar:2.2.10]
>         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:168) ~[apache-cassandra-2.2.10.jar:2.2.10]
>         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) ~[apache-cassandra-2.2.10.jar:2.2.10]
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) ~[apache-cassandra-2.2.10.jar:2.2.10]
>         at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
> {noformat}
> It seems that if an unexpected exception/error is thrown inside JVMStabilityInspector.inspectThrowable
> the JVM is not actually shut down but instead keeps on running. My expectation is that the
> JVM should shut down in case OOM is thrown.
> Potential workaround is to add:
> {noformat}
> JVM_OPTS="$JVM_OPTS -XX:+ExitOnOutOfMemoryError"
> {noformat}
> to cassandra-env.sh.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

