cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Cassandra OOM
Date Tue, 03 Jan 2012 18:58:01 GMT
The DynamicSnitch can result in fewer read operations being sent to a node, but as long as a
node is marked as UP, mutations are sent to all replicas. Nodes will shed load when they pull
messages off the queue that have expired past rpc_timeout, but they will not feed back flow
control to the other nodes, other than by going down or performing slowly enough for the dynamic
snitch to route reads around them.
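
To make the load shedding concrete, here is a minimal sketch of the idea, not Cassandra's actual code. The class and method names (SheddingQueue, offer, poll, droppedCount) are hypothetical, and timeoutMillis merely stands in for rpc_timeout. The consumer drops anything that has waited longer than the timeout, while the queue itself stays unbounded and producers get no signal to slow down.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: names and structure are assumptions, not Cassandra classes.
final class SheddingQueue {
    // A queued message stamped with the time it arrived.
    private static final class Message {
        final String payload;
        final long enqueuedAtMillis;

        Message(String payload, long enqueuedAtMillis) {
            this.payload = payload;
            this.enqueuedAtMillis = enqueuedAtMillis;
        }
    }

    private final Queue<Message> queue = new ArrayDeque<>();
    private final long timeoutMillis; // stands in for rpc_timeout
    private long dropped = 0;

    SheddingQueue(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Producers can always enqueue: there is no bound and no back-pressure.
    void offer(String payload, long nowMillis) {
        queue.add(new Message(payload, nowMillis));
    }

    // Returns the next message that is still within the timeout, or null if
    // the queue is empty. Expired messages are counted and discarded.
    String poll(long nowMillis) {
        Message m;
        while ((m = queue.poll()) != null) {
            if (nowMillis - m.enqueuedAtMillis > timeoutMillis) {
                dropped++;
                continue;
            }
            return m.payload;
        }
        return null;
    }

    long droppedCount() {
        return dropped;
    }

    int pending() {
        return queue.size();
    }
}
```

Shedding of this kind only happens when the consumer gets around to a message, so if messages arrive faster than they are polled the queue can still grow until the heap is exhausted.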

There are also safety valves in there to reduce the size of the memtables and caches in response
to low memory. Perhaps that process could also shed messages from thread pools with a high
number of pending messages. 
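
As a rough illustration of that suggestion only (nothing like this is claimed to exist in Cassandra), a low-memory hook could trim a stage's pending queue down to a cap. PendingShedder, shedIfLowMemory and all of its parameters are hypothetical names, and the thresholds are arbitrary.

```java
import java.util.Deque;

// Hypothetical sketch of shedding pending work under memory pressure.
final class PendingShedder {
    // If heap usage is at or above the threshold, drop the oldest pending
    // items until at most maxPending remain. Returns how many were dropped.
    static <T> int shedIfLowMemory(Deque<T> pending,
                                   double heapUsedFraction,
                                   double threshold,
                                   int maxPending) {
        if (heapUsedFraction < threshold) {
            return 0; // no memory pressure, leave the queue alone
        }
        int dropped = 0;
        while (pending.size() > maxPending) {
            pending.pollFirst(); // oldest first: most likely to time out anyway
            dropped++;
        }
        return dropped;
    }
}
```

Dropping the oldest entries first is a design choice in this sketch, on the assumption that they are the ones closest to expiring.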

**But** going OOM with 2M+ mutations in the thread pool sounds like the server was going down
anyway. Did you look into why all the messages were there?

Cheers
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/01/2012, at 11:18 PM, Віталій Тимчишин wrote:

> Hello.
> 
> We have been using Cassandra for some time in our project. Currently we are on 1.1 trunk (it
> was an accidental migration, but since it's hard to migrate back and it's performing well enough,
> we are staying on 1.1).
> During the New Year holidays one of the servers produced a number of OOM messages in the
> log.
> According to the heap dump taken, most of the memory is taken by the MutationStage queue (over
> 2 million items).
> So I am curious whether Cassandra has any flow control for messages. We are using Quorum
> for writes, and it seems to me that one slow server may start getting more messages than it
> can consume. The writes will still succeed, performed by the other servers in the replication set.
> If there is no flow control, it should eventually go OOM. Is that the case? Are there
> any plans to handle this?
> BTW: A lot of memory (~half) is taken by Inet4Address objects, so caching such
> objects would make this problem less likely.
> 
> -- 
> Best regards,
>  Vitalii Tymchyshyn

