Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Tue, 30 Dec 2014 19:37:13 +0000 (UTC)
From: "Donald Smith (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12752436.1415025713000.116785.1419968233770@Atlassian.JIRA>
In-Reply-To: <JIRA.12752436.1415025713000@Atlassian.JIRA>
References: <JIRA.12752436.1415025713000@Atlassian.JIRA>
 <JIRA.12752436.1415025713091@arcas>
Subject: [jira] [Comment Edited] (CASSANDRA-8245) Cassandra nodes
 periodically die in 2-DC configuration
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261238#comment-14261238 ] 

Donald Smith edited comment on CASSANDRA-8245 at 12/30/14 7:36 PM:
-------------------------------------------------------------------

We're seeing a similar increase in the number of pending Gossip stage tasks, followed by OutOfMemory. This happens about once a day on some node of our 38-node DC. Other nodes also see increases in pending Gossip stage tasks, but they recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on all nodes. But all nodes in one DC are down now.

What's odd is that the Cassandra process keeps running despite the OutOfMemoryError. You'd expect it to exit.

Before the OutOfMemoryError occurs, I notice that such nodes are slow to respond to commands and queries (e.g., over JMX).
{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip stage has 2695 pending tasks; skipping status check (no nodes will be marked down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space
....
ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}
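
Since the backlog warning appears hours before the OutOfMemoryError in the excerpt above, it may be worth watching for it. The following is only a rough sketch, not anything shipped with Cassandra: it scans log lines for the Gossiper warning format shown above and reports samples at or over a threshold. The function names and the threshold of 1000 are hypothetical choices for illustration.

```python
import re
from datetime import datetime

# Matches the Gossiper warning format shown in the log excerpt above.
PENDING_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} Gossiper\.java .*"
    r"Gossip stage has (\d+) pending tasks"
)


def pending_gossip_tasks(lines):
    """Yield (timestamp, pending_count) for each Gossiper backlog warning."""
    for line in lines:
        match = PENDING_RE.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            yield when, int(match.group(2))


def backlog_alerts(lines, threshold=1000):
    """Return samples whose backlog is at or above the (hypothetical) threshold."""
    return [
        (when, count)
        for when, count in pending_gossip_tasks(lines)
        if count >= threshold
    ]


# Two lines copied from the excerpt above; only the first is a backlog warning.
sample = [
    "WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) "
    "Gossip stage has 2695 pending tasks; skipping status check "
    "(no nodes will be marked down)",
    "ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) "
    "Exception in thread Thread[Thread-49234,5,main]",
]

for when, count in backlog_alerts(sample):
    print(when.isoformat(), count)
```

In practice the lines would come from an open log file rather than a list, and a sensible threshold would depend on the cluster.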


was (Author: thinkerfeeler):
We're seeing a similar increase in the number of pending Gossip stage tasks, followed by OutOfMemory. This happens about once a day on some node of our 38-node DC. Other nodes also see increases in pending Gossip stage tasks, but they recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on all nodes. But all nodes in one DC are down now.

What's odd is that the Cassandra process keeps running despite the OutOfMemoryError. You'd expect it to exit.
{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip stage has 2695 pending tasks; skipping status check (no nodes will be marked down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space
....
ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}

> Cassandra nodes periodically die in 2-DC configuration
> ------------------------------------------------------
>
>                 Key: CASSANDRA-8245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8245
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Scientific Linux release 6.5
> java version "1.7.0_51"
> Cassandra 2.0.9
>            Reporter: Oleg Poleshuk
>            Assignee: Brandon Williams
>            Priority: Minor
>         Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, stack5.txt
>
>
> We have 2 DCs with 3 nodes in each.
> Second DC periodically has 1-2 nodes down.
> It looks like it loses connectivity with the other nodes, and then the Gossiper starts to accumulate tasks until Cassandra dies with OOM.
> WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting live ratio to maximum of 64.0 instead of Infinity
>  WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
>  WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip stage has 4 pending tasks; skipping status check (no nodes will be marked down)
>  WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip stage has 8 pending tasks; skipping status check (no nodes will be marked down)
>  WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip stage has 11 pending tasks; skipping status check (no nodes will be marked down)
> ...
> WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip stage has 1014764 pending tasks; skipping status check (no nodes will be marked down)
>  WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
> java.lang.OutOfMemoryError: Java heap space
> There are also these lines, though I'm not sure they are relevant:
> DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) Ignoring interval time of 2085963047

