cassandra-commits mailing list archives

From "Piotr Westfalewicz (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10787) OutOfMemoryError after few hours from node restart
Date Thu, 17 Dec 2015 09:40:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061795#comment-15061795 ]

Piotr Westfalewicz edited comment on CASSANDRA-10787 at 12/17/15 9:40 AM:
--------------------------------------------------------------------------

Hey guys,

Here is the continuation of the story (a rough sketch of the commands follows the list):
0. Taking your advice, I've decided to create a more powerful cluster
1. I've created a new cluster, on 2x m1.xlarge instances (4 vCPU, 64-bit, 15GB RAM, Raid0
4x420GB HDD Disk), and changed RF to 2
2. Took a snapshot of the data (keyspace.table=logs.group) on one of the old nodes
3. scp'd the snapshot from the old node to the new node, into the cassandra/data/keyspace/tablename
folder
4. Loaded the data without restarting the server - by nodetool refresh
5. Triggered nodetool repair
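
Roughly, steps 2-5 were along these lines (a sketch only - the snapshot tag, paths and hostname
below are placeholders, not my exact setup):
{code}
# on the old node: snapshot the table (tag name is just an example)
nodetool snapshot -t migration logs

# copy the snapshot sstables to the new node's data directory
# (the table directory name includes a generated id, hence the glob)
scp /var/lib/cassandra/data/logs/group-*/snapshots/migration/* \
    newnode:/var/lib/cassandra/data/logs/group-*/

# on the new node: pick up the copied sstables without a restart
nodetool refresh logs group

# then bring the new replicas in sync
nodetool repair logs
{code}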

After a few hours the server went down. I've attached the logs (the case 5 files).
Maybe that's because of the size of the sstables? In my case one of them was around 50GB.

I've also migrated the rest of the data (not the "big" logs.group, but other tables, from 500MB
to 5GB) the same way, and the server was working fine and the data was accessible.


> OutOfMemoryError after few hours from node restart
> --------------------------------------------------
>
>                 Key: CASSANDRA-10787
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10787
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Amazon DataStax Auto-Clustering AMI 2.6.3-1404-pv
> on 2x m1.large instances (2 vCPU, 64-bit, 7.5GB RAM, Raid0 2x420GB Disk)
> [cqlsh 5.0.1 | Cassandra 2.2.3 | CQL spec 3.3.1 | Native protocol v4]
> RF=3
>            Reporter: Piotr Westfalewicz
>             Fix For: 2.2.x
>
>         Attachments: case2_debuglog_head.txt, case2_debuglog_tail.txt, case2_systemlog.txt,
case3_debuglog_tail.txt, case3_systemlog_tail.txt, case4_debuglog_tail.txt, case4_systemlog.txt,
case5_debuglog.txt, case5_systemlog.txt
>
>
> Cassandra Cluster was operating flawlessly for around 3 months. Lately I've got a critical
problem with it - after a few hours of running, clients are disconnected permanently (that may
be a DataStax C# Driver problem, though); a few more hours later (with a smaller load), both
nodes throw an exception (details in files):
> bq. java.lang.OutOfMemoryError: Java heap space
> Cases description:
>     Case 2 (heavy load):
>         - 2015-11-26 16:09:40,834 Restarted all nodes in the Cassandra cluster
>         - 2015-11-26 17:03:46,774 First client disconnected permanently
>         - 2015-11-26 22:17:02,327 Node shutdown
>     Case 3 (unknown load, different node):
>         - 2015-11-26 02:19:49,585 Node shutdown (visible only in the system log; I don't know why
it's not in the debug log)
>     Case 4 (low load):
>         - 2015-11-27 13:00:24,994 Node restart
>         - 2015-11-27 22:26:56,131 Node shutdown
> Is that a software issue, or am I using Amazon instances that are too weak? If so, how can
the required amount of memory be calculated?
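
For a rough sense of the heap these instances get: assuming the stock cassandra-env.sh sizing
(i.e. MAX_HEAP_SIZE not overridden), Cassandra picks roughly max(min(1/2 RAM, 1GB), min(1/4 RAM, 8GB)),
so an m1.large with 7.5GB RAM ends up with about a 1.9GB heap:
{code}
# rough re-creation of the default heap sizing from cassandra-env.sh
# (assumes MAX_HEAP_SIZE was left unset; all values in MB)
system_memory_in_mb=7680                                       # m1.large: 7.5GB RAM
half_system_memory_in_mb=$((system_memory_in_mb / 2))          # 3840
quarter_system_memory_in_mb=$((half_system_memory_in_mb / 2))  # 1920
[ "$half_system_memory_in_mb" -gt 1024 ] && half_system_memory_in_mb=1024
[ "$quarter_system_memory_in_mb" -gt 8192 ] && quarter_system_memory_in_mb=8192
if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]; then
    max_heap_size_in_mb=$half_system_memory_in_mb
else
    max_heap_size_in_mb=$quarter_system_memory_in_mb
fi
echo "MAX_HEAP_SIZE=${max_heap_size_in_mb}M"   # -> 1920M here; 3840M on a 15GB m1.xlarge
{code}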



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
