cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13006) Disable automatic heap dumps on OOM error
Date Fri, 06 Oct 2017 19:47:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195149#comment-16195149 ]

Benjamin Lerer commented on CASSANDRA-13006:
--------------------------------------------


[~urandom], [~brandon.williams], [~tjake]

Sorry for the delay.

I pushed some patches for [2.2|https://github.com/apache/cassandra/compare/cassandra-2.2...blerer:13006-2.2],
[3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...blerer:13006-3.0], [3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...blerer:13006-3.11]
and [trunk|https://github.com/apache/cassandra/compare/trunk...blerer:13006-trunk].

The branches differ only in the configuration files ({{cassandra-env.sh}} and
{{cassandra-env.ps1}}).

The patches let the JVM handle {{OutOfMemoryErrors}} through the JVM {{OnOutOfMemoryError}},
{{ExitOnOutOfMemoryError}} or {{CrashOnOutOfMemoryError}} options.
As the {{ExitOnOutOfMemoryError}} and {{CrashOnOutOfMemoryError}} options are only supported
since Oracle JDK 7 update 101 and JDK 8 update 92, Cassandra uses the {{OnOutOfMemoryError}}
option by default.
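The version cut-offs above can be expressed as a small check. The sketch below is a hypothetical helper ({{OomFlagSupport}} and its method are illustrative names, not code from the patches, which simply default to {{OnOutOfMemoryError}}). It assumes legacy {{java.version}} strings of the form {{1.8.0_92}} and treats any JDK 9 or later as supporting both flags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the patches: tells whether a Java version string is
// recent enough for -XX:+ExitOnOutOfMemoryError / -XX:+CrashOnOutOfMemoryError, using
// the cut-offs quoted in the comment (JDK 7 update 101, JDK 8 update 92).
class OomFlagSupport {
    // Legacy version strings such as "1.8.0_92" or "1.7.0_101-b14".
    private static final Pattern LEGACY = Pattern.compile("^1\\.(\\d+)\\.\\d+_(\\d+).*");
    // Leading major number of newer version strings such as "9", "11.0.2" or "17".
    private static final Pattern MODERN = Pattern.compile("^(\\d+).*");

    static boolean supportsExitOrCrashOnOom(String javaVersion) {
        Matcher legacy = LEGACY.matcher(javaVersion);
        if (legacy.matches()) {
            int major = Integer.parseInt(legacy.group(1));
            int update = Integer.parseInt(legacy.group(2));
            if (major == 7) return update >= 101;
            if (major == 8) return update >= 92;
            return major > 8;
        }
        // Assumption: every JDK 9+ build ships both flags. Strings like "1.8.0"
        // (no update number) fall through here with major 1 and are rejected.
        Matcher modern = MODERN.matcher(javaVersion);
        return modern.matches() && Integer.parseInt(modern.group(1)) >= 9;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println(version + " supports Exit/CrashOnOutOfMemoryError: "
                + supportsExitOrCrashOnOom(version));
    }
}
```

Falling back to {{OnOutOfMemoryError}} whenever this kind of check fails is what makes it the safe default across all supported JDKs.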

A startup check emits a warning if none of these options is used. This check is there to
ensure that {{OOM}} errors are properly handled, so that C* cannot continue to run in an unstable
state that could cause data corruption.
The patches add no check for the {{HeapDumpOnOutOfMemoryError}} option, as in some cases administrators
prefer to disable heap dumps.
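A minimal sketch of such a startup check, assuming the JVM arguments are read through {{RuntimeMXBean.getInputArguments()}}. The {{OomHandlingCheck}} class and its warning text are hypothetical and not taken from the patches.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical startup check modelled on the behaviour described above.
class OomHandlingCheck {
    // True if at least one option makes the JVM itself react to an OutOfMemoryError.
    // Simplification: a later "-XX:-ExitOnOutOfMemoryError" override is not detected.
    static boolean hasOomHandling(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:OnOutOfMemoryError=")
                    || arg.equals("-XX:+ExitOnOutOfMemoryError")
                    || arg.equals("-XX:+CrashOnOutOfMemoryError")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The options the running JVM was actually started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        // HeapDumpOnOutOfMemoryError is deliberately not part of the check.
        if (!hasOomHandling(jvmArgs)) {
            System.err.println("WARN: none of -XX:OnOutOfMemoryError, -XX:+ExitOnOutOfMemoryError or "
                    + "-XX:+CrashOnOutOfMemoryError is set; an OOM could leave the node in an unstable state");
        }
    }
}
```

A correctly shell-quoted value such as {{-XX:OnOutOfMemoryError="kill -9 %p"}} reaches the JVM as the single argument {{-XX:OnOutOfMemoryError=kill -9 %p}}, so a prefix match is enough for that option.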

{{cassandra-env.sh}} has a new variable, {{JVM_ON_OUT_OF_MEMORY_ERROR_OPT}}, which should
be used to specify the {{OnOutOfMemoryError}} option. As bash splits the expanded value of {{JVM_OPTS}}
on whitespace without taking embedded quotes into account, specifying {{OnOutOfMemoryError}} as part of
the {{JVM_OPTS}} variable cannot work for an option value containing spaces, such as {{kill -9 %p}}.
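The word-splitting problem can be illustrated with a simplified simulation of how bash splits an unquoted {{$JVM_OPTS}} expansion with the default {{IFS}}. {{WordSplitDemo}} is a hypothetical name, and the sketch ignores globbing and the other shell expansions.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of bash word splitting on an unquoted expansion such as $JVM_OPTS
// with the default IFS: the value is cut on runs of spaces, tabs and newlines, and
// quote characters inside the value stay ordinary characters (no quote removal).
class WordSplitDemo {
    static List<String> splitLikeUnquotedExpansion(String value) {
        List<String> words = new ArrayList<>();
        for (String word : value.split("[ \\t\\n]+")) {
            // String.split yields an empty first token when the value starts with a separator.
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String jvmOpts = "-Xmx1G -XX:OnOutOfMemoryError='kill -9 %p'";
        // Prints [-Xmx1G, -XX:OnOutOfMemoryError='kill, -9, %p']: the JVM would get a
        // truncated option value (with a stray quote) plus two unrelated arguments.
        System.out.println(splitLikeUnquotedExpansion(jvmOpts));
    }
}
```

A dedicated variable avoids this, assuming the script expands it in double quotes so that its value reaches {{java}} as one argument.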

Before generating a heap dump, C* used to log a heap histogram using {{jmap}}. If the heap
size was large, reading the heap dump could take a few hours, whereas the heap histogram can help
to debug the problem much faster. The patches keep the ability to print a heap histogram
on OOM error but disable it by default. To enable it, the {{cassandra.printHeapHistogramOnOutOfMemoryError}}
system property must be set to {{true}}.
As generating the histogram for live objects only (using {{jmap -histo:live}}) would trigger
a garbage collection before generating the histogram, I preferred to stick with {{jmap -histo}}
to minimize the risks.
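A sketch of the opt-in behavior, assuming the histogram is produced by running {{jmap -histo <pid>}} as an external process. {{HeapHistogramOnOom}} and its methods are hypothetical, and {{ProcessHandle}} requires Java 9 or later.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: decide whether to log a heap histogram on OOM and build the jmap command.
class HeapHistogramOnOom {
    static final String PROPERTY = "cassandra.printHeapHistogramOnOutOfMemoryError";

    // Boolean.getBoolean returns true only if the system property equals "true"
    // (ignoring case), so the feature stays off unless started with
    // -Dcassandra.printHeapHistogramOnOutOfMemoryError=true
    static boolean histogramRequested() {
        return Boolean.getBoolean(PROPERTY);
    }

    // "-histo" rather than "-histo:live": the live variant forces a full GC first.
    static List<String> jmapCommand(long pid) {
        return Arrays.asList("jmap", "-histo", Long.toString(pid));
    }

    public static void main(String[] args) throws Exception {
        if (histogramRequested()) {
            long pid = ProcessHandle.current().pid(); // Java 9+
            Process jmap = new ProcessBuilder(jmapCommand(pid)).inheritIO().start();
            jmap.waitFor();
        } else {
            System.out.println("heap histogram disabled (default)");
        }
    }
}
```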

The previous implementation suffered from 2 problems:
* If several OOM errors were thrown in a short time span, each of them would trigger a heap
histogram and a heap dump (see this [comment|https://issues.apache.org/jira/browse/CASSANDRA-13006?focusedCommentId=16118421&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16118421])
* If an exception was thrown while C* was trying to generate the heap dump, C* would not be
shut down and would continue running in an unstable state (see CASSANDRA-13886)

The patches fix those problems for the case where a heap histogram needs to be logged. In the
case where the histogram is not requested, those problems no longer exist.
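Both fixes for the histogram case can be sketched with an {{AtomicBoolean}} guard and a {{try}}/{{finally}} block. This is a hypothetical illustration ({{OomGuard}} and the injected {{printHistogram}} and {{exit}} actions are made-up names), not the code of the patches.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the two fixes: one histogram per burst of OOMs, and a shutdown
// that happens even if producing the histogram fails.
class OomGuard {
    private final AtomicBoolean handling = new AtomicBoolean(false);
    private final Runnable printHistogram; // e.g. runs "jmap -histo <pid>"
    private final Runnable exit;           // e.g. () -> Runtime.getRuntime().halt(100)

    OomGuard(Runnable printHistogram, Runnable exit) {
        this.printHistogram = printHistogram;
        this.exit = exit;
    }

    void onOutOfMemory() {
        // Problem 1: only the first OOM does the expensive work; later ones return at once.
        if (!handling.compareAndSet(false, true)) {
            return;
        }
        try {
            printHistogram.run();
        } finally {
            // Problem 2: shut down even if logging the histogram throws.
            exit.run();
        }
    }

    public static void main(String[] args) {
        AtomicInteger histograms = new AtomicInteger();
        AtomicInteger exits = new AtomicInteger();
        OomGuard guard = new OomGuard(
                () -> { histograms.incrementAndGet(); throw new IllegalStateException("jmap failed"); },
                exits::incrementAndGet);
        for (int i = 0; i < 3; i++) {
            try {
                guard.onOutOfMemory();
            } catch (IllegalStateException e) {
                System.out.println("histogram failed: " + e.getMessage());
            }
        }
        // Prints "histograms=1 exits=1"
        System.out.println("histograms=" + histograms.get() + " exits=" + exits.get());
    }
}
```

Letting later OOMs return immediately is only acceptable because the first handler always ends by stopping the process.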

CI looks good for the unit tests.
The changes to the {{cassandra}} startup script break the DTests, but before changing the {{DTests}}
framework I would prefer to have a first review of the patches.

[~JoshuaMcKenzie] could you review the patches? 

> Disable automatic heap dumps on OOM error
> -----------------------------------------
>
>                 Key: CASSANDRA-13006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13006
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Configuration
>            Reporter: anmols
>            Assignee: Benjamin Lerer
>            Priority: Minor
>             Fix For: 3.0.15
>
>         Attachments: 13006-3.0.9.txt
>
>
> With CASSANDRA-9861, a change was added to enable collecting heap dumps by default if
the process encountered an OOM error. These heap dumps are stored in the Apache Cassandra
home directory unless configured otherwise (see [Cassandra Support Document|https://support.datastax.com/hc/en-us/articles/204225959-Generating-and-Analyzing-Heap-Dumps]
for this feature).
>  
> The creation and storage of heap dumps aids debugging and investigative workflows, but
is not desirable for a production environment where these heap dumps may occupy a large
amount of disk space and require manual intervention for cleanups.
>  
> Managing heap dumps on out-of-memory errors and configuring the paths for these heap
dumps are available as JVM options. The current behavior conflicts with the Boolean
JVM flag HeapDumpOnOutOfMemoryError.
>  
> A patch can be proposed here that would make the heap dump on OOM error honor the HeapDumpOnOutOfMemoryError flag.
Users who still want to generate heap dumps on OOM errors can set the -XX:+HeapDumpOnOutOfMemoryError JVM
option.
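To check which heap-dump settings are actually in effect on a running node, the HotSpot diagnostic MXBean can read both flags mentioned in the description. This assumes a HotSpot-based JVM ({{com.sun.management}} is not part of the Java SE API), and {{HeapDumpFlags}} is an illustrative name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Reads HotSpot flags from inside the running JVM (HotSpot/OpenJDK only).
class HeapDumpFlags {
    static String flag(String name) {
        HotSpotDiagnosticMXBean hotspot = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = hotspot.getVMOption(name); // throws IllegalArgumentException for unknown flags
        return option.getValue();
    }

    public static void main(String[] args) {
        // "false" by default; "true" once -XX:+HeapDumpOnOutOfMemoryError is in effect.
        System.out.println("HeapDumpOnOutOfMemoryError=" + flag("HeapDumpOnOutOfMemoryError"));
        // Empty unless -XX:HeapDumpPath=... is set; HotSpot then writes
        // java_pid<pid>.hprof to the JVM's working directory.
        System.out.println("HeapDumpPath=" + flag("HeapDumpPath"));
    }
}
```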



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


