cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-6346) Cassandra 2.0 server node runs out of memory during writes/replications
Date Wed, 13 Nov 2013 22:49:21 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-6346.
---------------------------------------

    Resolution: Not A Problem

The main knob to turn to make load shedding more aggressive is to reduce rpc_write_timeout (see CASSANDRA-6059).
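
As a rough illustration of why a shorter write timeout sheds more load: a node can discard queued writes that have already waited longer than the timeout instead of applying them, so a smaller timeout means more of a backlog is dropped. The sketch below is not Cassandra's code. The names LoadSheddingSketch, QueuedWrite, and drain are hypothetical, and timestamps are passed in explicitly to keep the behavior deterministic. In the 2.0 cassandra.yaml the corresponding setting is, as far as I know, write_request_timeout_in_ms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LoadSheddingSketch {

    /** A queued write plus the time it entered the queue (hypothetical type). */
    static final class QueuedWrite {
        final String payload;
        final long enqueuedAtNanos;

        QueuedWrite(String payload, long enqueuedAtNanos) {
            this.payload = payload;
            this.enqueuedAtNanos = enqueuedAtNanos;
        }
    }

    /**
     * Empties the queue, applying only writes that waited no longer than the
     * timeout. Older writes are shed: the sender has already given up on them.
     */
    static List<String> drain(LinkedBlockingQueue<QueuedWrite> queue,
                              long timeoutMillis,
                              long nowNanos) {
        long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        List<String> applied = new ArrayList<>();
        QueuedWrite write;
        while ((write = queue.poll()) != null) {
            long ageNanos = nowNanos - write.enqueuedAtNanos;
            if (ageNanos > timeoutNanos) {
                continue; // shed: waited longer than the write timeout
            }
            applied.add(write.payload);
        }
        return applied;
    }

    /** Builds a two-element backlog: one write aged 1000 ms, one aged 100 ms. */
    static LinkedBlockingQueue<QueuedWrite> sampleBacklog() {
        LinkedBlockingQueue<QueuedWrite> queue = new LinkedBlockingQueue<>();
        queue.add(new QueuedWrite("old", 0L));
        queue.add(new QueuedWrite("fresh", TimeUnit.MILLISECONDS.toNanos(900)));
        return queue;
    }

    public static void main(String[] args) {
        long now = TimeUnit.MILLISECONDS.toNanos(1000);
        // A generous timeout keeps both writes; a short one sheds the old write.
        System.out.println(drain(sampleBacklog(), 2000, now));
        System.out.println(drain(sampleBacklog(), 500, now));
    }
}
```

Note that this kind of shedding only discards work once a thread dequeues it; it does not by itself cap how large the queue can grow.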

> Cassandra 2.0 server node runs out of memory during writes/replications
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-6346
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6346
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Nitin
>         Attachments: LinkedBlockingQ.png
>
>
> We are currently running an 18-node Cassandra cluster with NetworkTopologyStrategy replication (d1 = 3 and d2 = 3).
> Our servers seem to crash with OOM exceptions. Our heap size is 8 GB. During a crash I got hold of the hprof file and ran it through the Eclipse MAT analyzer.
> After analyzing the hprof (please see the attachment for the top offenders), I find that there is a linked blocking queue (from the mutation stage) that seems to hold about 7.3 GB of the total 8 GB of heap.
> After a deep dive into the Cassandra 2.0 code, I see that every update/write/replication goes through stages, including the mutation stage, and that the number of threads that drain this queue (I am assuming memtable to sstable write) is controlled by concurrent_writes. Ours is set to 32.
> However, we observe node crashes even when there are 0 writes to the node but replication requests are floating around the cluster.
> Any ideas on which knobs throttle the size of these queues, or the maximum number of write and replication requests a node can receive? What are the recommended settings for operating a Cassandra node in a mode where it rejects requests beyond a certain queue threshold?



--
This message was sent by Atlassian JIRA
(v6.1#6144)
