cassandra-commits mailing list archives

From "Michael Shuler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10787) OutOfMemoryError after few hours from node restart
Date Mon, 30 Nov 2015 18:31:10 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032219#comment-15032219 ]

Michael Shuler commented on CASSANDRA-10787:
--------------------------------------------

m1.large instances are pretty small for Cassandra. In general, I would guess that when starting
out, your cluster was not as busy as it is now, and you are now running into the limits of that
instance size.

I only looked very quickly at a couple of your logs, and I spotted a few pending mutations and
a good number of pending compactions being repeated in the metrics logging. I don't think
(please correct me if I'm wrong) that m1.large offers any SSD drives, so compactions piling
up, combined with slow I/O to get those operations completed, is going to overtax a limited
instance like the m1.large.
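
The trend is easy to confirm from the debug logs themselves. Below is a rough Python sketch
(mine, not part of the original report) that scans a debug log for the StatusLogger pool lines
and prints how the Active/Pending counts for MutationStage and CompactionExecutor evolve over
time. The exact column layout of the StatusLogger output is an assumption here and may need
adjusting for your Cassandra version.

    #!/usr/bin/env python
    # Hypothetical helper: track MutationStage / CompactionExecutor backlog in a
    # Cassandra debug log. Assumes StatusLogger lines shaped roughly like:
    #   INFO [Service Thread] 2015-11-26 17:03:46,774 StatusLogger.java:70 - MutationStage  0  12  345  0  0
    # where the first two numeric columns are Active and Pending.
    import re
    import sys

    LINE = re.compile(
        r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*StatusLogger\.java[^-]*- '
        r'(MutationStage|CompactionExecutor)\s+(\d+)\s+(\d+)')

    def main(path):
        for raw in open(path):
            m = LINE.search(raw)
            if m:
                timestamp, pool, active, pending = m.groups()
                print('%s  %-20s active=%s pending=%s' % (timestamp, pool, active, pending))

    if __name__ == '__main__':
        main(sys.argv[1])

Running it against case2_debuglog_tail.txt and case4_debuglog_tail.txt should show whether the
CompactionExecutor pending count keeps climbing right up to the OutOfMemoryError.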

> OutOfMemoryError after few hours from node restart
> --------------------------------------------------
>
>                 Key: CASSANDRA-10787
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10787
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Amazon DataStax Auto-Clustering AMI 2.6.3-1404-pv
> on 2x m1.large instances (2 vCPU, 64-bit, 7.5GB RAM, Raid0 2x420GB Disk)
> [cqlsh 5.0.1 | Cassandra 2.2.3 | CQL spec 3.3.1 | Native protocol v4]
> RF=3
>            Reporter: Piotr Westfalewicz
>            Priority: Critical
>         Attachments: case2_debuglog_head.txt, case2_debuglog_tail.txt, case2_systemlog.txt,
> case3_debuglog_tail.txt, case3_systemlog_tail.txt, case4_debuglog_tail.txt, case4_systemlog.txt
>
>
> The Cassandra cluster was operating flawlessly for around 3 months. Lately I've got a critical
> problem with it - after a few hours of running, clients are disconnected permanently (that may
> be a DataStax C# Driver problem, though); however, a few more hours later (with a smaller load),
> an exception is thrown on both nodes (details in files):
> bq. java.lang.OutOfMemoryError: Java heap space
> Case descriptions:
>     Case 2 (heavy load):
>         - 2015-11-26 16:09:40,834 Restarted all nodes in the Cassandra cluster
>         - 2015-11-26 17:03:46,774 First client disconnected permanently
>         - 2015-11-26 22:17:02,327 Node shutdown
>     Case 3 (unknown load, different node):
>         - 2015-11-26 02:19:49,585 Node shutdown (visible only in the system log; I don't know why it is not in the debug log)
>     Case 4 (low load):
>         - 2015-11-27 13:00:24,994 Node restart
>         - 2015-11-27 22:26:56,131 Node shutdown
> Is this a software issue, or am I using Amazon instances that are too weak? If so, how can
> the required amount of memory be calculated?
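
On that last question in the description: a rough sketch of the default heap sizing that
cassandra-env.sh performs (the calculate_heap_sizes logic in the 2.x line, as I recall it) is
below; the numbers are illustrative and not taken from the attached logs.

    # Default max-heap calculation used by cassandra-env.sh in the 2.x line
    # (calculate_heap_sizes), as I recall it:
    #   max_heap = max(min(ram/2, 1024MB), min(ram/4, 8192MB))

    def default_max_heap_mb(system_ram_mb):
        half = min(system_ram_mb // 2, 1024)
        quarter = min(system_ram_mb // 4, 8192)
        return max(half, quarter)

    print(default_max_heap_mb(7680))   # m1.large, 7.5GB RAM -> 1920 MB (~1.9GB heap)
    print(default_max_heap_mb(30720))  # 30GB machine -> 7680 MB

With the defaults, an m1.large therefore ends up with roughly a 1.9GB heap, which does not
leave much headroom once mutations and compactions start backing up.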



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
