cassandra-commits mailing list archives

From "Donald Smith (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
Date Tue, 30 Dec 2014 19:37:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261238#comment-14261238 ]

Donald Smith edited comment on CASSANDRA-8245 at 12/30/14 7:36 PM:
-------------------------------------------------------------------

We're seeing a similar increase in the number of pending Gossip stage tasks, followed by
OutOfMemory. This happens about once a day on some node of our 38-node DC. Other nodes
also see increases in pending Gossip stage tasks, but they recover. This is with C* 2.0.11.
We have two other DCs. ntpd is running on all nodes. However, all nodes in one DC are down now.

What's odd is that the Cassandra process continues running despite the OutOfMemory exception.
You'd expect it to exit.
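
For context on why a JVM can outlive an OutOfMemoryError: an Error that escapes a thread's run() method terminates only that thread, and the process keeps going as long as other non-daemon threads are alive and nothing calls System.exit. The sketch below illustrates that general JVM behavior with a simulated error. The class name OomThreadDemo and the handler are hypothetical and are not taken from Cassandra's code; what Cassandra 2.0.x's own handler does beyond the logging seen in this ticket is not shown here.

```java
public class OomThreadDemo {
    /**
     * Starts a worker thread that throws a (simulated) OutOfMemoryError and
     * reports whether the calling thread is still able to continue afterwards.
     */
    static boolean survivesWorkerOom() {
        final boolean[] handlerSawOom = {false};

        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                // Assumption for illustration: a real OOM is raised by the
                // allocator; here we throw one directly so the demo is cheap.
                throw new OutOfMemoryError("Java heap space (simulated)");
            }
        }, "Thread-demo");

        // A handler that only logs, with no System.exit(), so the JVM carries on.
        worker.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            @Override
            public void uncaughtException(Thread t, Throwable e) {
                handlerSawOom[0] = e instanceof OutOfMemoryError;
                System.err.println("Exception in thread " + t + ": " + e);
            }
        });

        worker.start();
        try {
            worker.join(); // join() makes the handler's write visible here
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
        // The worker is dead, but this thread (and the JVM) is still running.
        return handlerSawOom[0] && !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("still running after worker OOM: " + survivesWorkerOom());
    }
}
```

HotSpot's -XX:OnOutOfMemoryError option is one way to force an exit in this situation, at the cost of losing whatever the surviving threads were doing.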

Before the OutOfMemory error occurs, I notice that such nodes are slow to respond to commands
and queries (e.g., over JMX).
{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip stage has 2695 pending tasks; skipping status check (no nodes will be marked down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space
....
ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}
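
Since the backlog shows up in the log hours before the OutOfMemoryError, one way to spot affected nodes early is to pull the pending-task count out of the Gossiper warning. This is a minimal sketch that assumes the message format shown in the log excerpt in this ticket; the class name GossipBacklog and the method pendingTasks are hypothetical and not part of Cassandra.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GossipBacklog {
    // Matches the Gossiper warning text as it appears in the log excerpt in this ticket.
    private static final Pattern PENDING =
            Pattern.compile("Gossip stage has (\\d+) pending tasks");

    /** Returns the pending-task count found in the log line, or -1 if the line does not match. */
    static long pendingTasks(String logLine) {
        Matcher m = PENDING.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        String line = "WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) "
                + "Gossip stage has 2695 pending tasks; skipping status check "
                + "(no nodes will be marked down)";
        System.out.println(pendingTasks(line)); // prints 2695
    }
}
```

Feeding each new system.log line through pendingTasks and alerting when the count keeps growing would be one option; the threshold to use is a judgment call that this ticket does not settle.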


was (Author: thinkerfeeler):
We're seeing a similar increase in the number of pending Gossip stage tasks, followed by
OutOfMemory. This happens about once a day on some node of our 38-node DC. Other nodes
also see increases in pending Gossip stage tasks, but they recover. This is with C* 2.0.11.
We have two other DCs. ntpd is running on all nodes. However, all nodes in one DC are down now.

What's odd is that the Cassandra process continues running despite the OutOfMemory exception.
You'd expect it to exit.
{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip stage has 2695 pending tasks; skipping status check (no nodes will be marked down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space
....
ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}

> Cassandra nodes periodically die in 2-DC configuration
> ------------------------------------------------------
>
>                 Key: CASSANDRA-8245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8245
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Scientific Linux release 6.5
> java version "1.7.0_51"
> Cassandra 2.0.9
>            Reporter: Oleg Poleshuk
>            Assignee: Brandon Williams
>            Priority: Minor
>         Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, stack5.txt
>
>
> We have 2 DCs with 3 nodes in each.
> Second DC periodically has 1-2 nodes down.
> It looks like it loses connectivity with the other nodes, and then Gossiper starts to accumulate tasks until Cassandra dies with OOM.
> WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting live ratio to maximum of 64.0 instead of Infinity
> WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
> WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip stage has 4 pending tasks; skipping status check (no nodes will be marked down)
> WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip stage has 8 pending tasks; skipping status check (no nodes will be marked down)
> WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip stage has 11 pending tasks; skipping status check (no nodes will be marked down)
> ...
> WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip stage has 1014764 pending tasks; skipping status check (no nodes will be marked down)
> WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
> java.lang.OutOfMemoryError: Java heap space
> There are also these lines, though I'm not sure they are relevant:
> DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) Ignoring interval time of 2085963047



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
