cassandra-commits mailing list archives

From "Yuki Morishita (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10449) OOM on bootstrap due to long GC pause
Date Wed, 07 Oct 2015 23:23:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947793#comment-14947793 ]

Yuki Morishita edited comment on CASSANDRA-10449 at 10/7/15 11:23 PM:
----------------------------------------------------------------------

There are a couple of things going on.

{code}
ERROR [StreamReceiveTask:29] 2015-10-05 14:46:17,090 CassandraDaemon.java:223 - Exception in thread Thread[StreamReceiveTask:29,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
{code}

While rebuilding the secondary index after receiving files, the bootstrapping node hits a
TombstoneOverwhelmingException.
This can make streaming hang, because the receiving task never completes.
Streaming should tolerate a secondary index build failure: instead of failing the entire stream
session, it should just warn the user and continue, so that the user can manually trigger a
secondary index rebuild later.
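
A minimal sketch of that proposal, for illustration only. It does not use Cassandra's real streaming classes: `TolerantReceiveSketch`, `Result`, and `finishReceive` are hypothetical names, and the index rebuild is modelled as a plain `Runnable`. The point is only that a failed index build becomes a warning while the receive task still completes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Cassandra's actual classes.
final class TolerantReceiveSketch {

    /** Outcome of finishing one receive task. */
    static final class Result {
        final boolean taskCompleted;
        final List<String> warnings;

        Result(boolean taskCompleted, List<String> warnings) {
            this.taskCompleted = taskCompleted;
            this.warnings = warnings;
        }
    }

    /**
     * Runs the index rebuild step, but records a failure as a warning
     * instead of letting it abort the receive task.
     */
    static Result finishReceive(String table, Runnable rebuildIndex) {
        List<String> warnings = new ArrayList<>();
        try {
            rebuildIndex.run();
        } catch (RuntimeException e) {
            // Only RuntimeException is caught here; an Error such as
            // OutOfMemoryError still propagates to the caller.
            warnings.add("Secondary index rebuild failed for " + table
                         + " (" + e + "); rebuild it manually later");
        }
        // The task is reported complete whether or not the index was built,
        // so the stream session is not left waiting on it.
        return new Result(true, warnings);
    }
}
```

With this shape, the operator would see the warning and could rebuild the index afterwards, for example with `nodetool rebuild_index`.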

I'm not sure the above is related to the OOM. According to StatusLogger, the FlushWriter tasks
are growing, and that is the cause of the OOM.
If you can capture a stack trace using jstack, that would be a great help.

-I will create a separate JIRA for the former.- Created CASSANDRA-10474.


was (Author: yukim):
There are a couple of things going on.

{code}
ERROR [StreamReceiveTask:29] 2015-10-05 14:46:17,090 CassandraDaemon.java:223 - Exception in thread Thread[StreamReceiveTask:29,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
{code}

While rebuilding the secondary index after receiving files, the bootstrapping node hits a
TombstoneOverwhelmingException.
This can make streaming hang, because the receiving task never completes.
Streaming should tolerate a secondary index build failure: instead of failing the entire stream
session, it should just warn the user and continue, so that the user can manually trigger a
secondary index rebuild later.

I'm not sure the above is related to the OOM. According to StatusLogger, the FlushWriter tasks
are growing, and that is the cause of the OOM.
If you can capture a stack trace using jstack, that would be a great help.

I will create a separate JIRA for the former.

> OOM on bootstrap due to long GC pause
> -------------------------------------
>
>                 Key: CASSANDRA-10449
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu 14.04, AWS
>            Reporter: Robbie Strickland
>              Labels: gc
>             Fix For: 2.1.x
>
>         Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to provision additional nodes, but bootstrapping OOMs every time after about 10 hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 CassandraDaemon.java:223 - Exception in thread Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to no avail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
