cassandra-user mailing list archives

From Vitalii Tymchyshyn <tiv...@gmail.com>
Subject Re: Cassandra OOM
Date Wed, 04 Jan 2012 10:38:51 GMT
Hello.

BTW: It would be great for Cassandra to shut down on Errors like OOM, 
because right now I am not sure if the problem described in my previous 
email is the root cause, or if one of the OOM errors found in the log 
made some "writer" stop.
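
In the meantime this can be approximated with standard HotSpot options 
(a sketch of the flags themselves; I leave out how exactly to wire them 
into cassandra-env.sh, since the quoting there is fiddly):

    # Dump the heap for post-mortem analysis, then kill the process
    # instead of letting it run on in a half-broken state.
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/var/log/cassandra
    -XX:OnOutOfMemoryError="kill -9 %p"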

I am now looking at different OOMs in my cluster. Currently each node 
has up to 300G of data in ~10 column families. The previous heap size 
of 3G seems to be not enough, so I am raising it to 5G. Looking at heap 
dumps, a lot of memory is taken by memtables, much more than 1/3 of the 
heap. At the same time, the logs say there is nothing to flush since 
there are no dirty memtables. So, what are Cassandra's memory 
requirements? Is it 1% or 2% of the disk data? Or maybe I am doing 
something wrong?
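
For the record, the only global ceiling I know of in 1.x is 
memtable_total_space_in_mb in cassandra.yaml (my understanding is that 
it defaults to one third of the heap when left commented out):

    # cassandra.yaml (1.x): cap on the memory used by all memtables
    # combined; when commented out it defaults to 1/3 of the JVM heap.
    memtable_total_space_in_mb: 1024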

Best regards, Vitalii Tymchyshyn

On 03.01.12 20:58, aaron morton wrote:
> The DynamicSnitch can result in fewer read operations being sent to a 
> node, but as long as a node is marked as UP, mutations are sent to all 
> replicas. Nodes will shed load when they pull messages off the queue 
> that have expired past rpc_timeout, but they will not feed back flow 
> control to the other nodes, other than going down or performing slowly 
> enough for the dynamic snitch to route reads around them.
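
If I read that correctly, the shedding is purely local: a message whose 
age exceeds rpc_timeout is simply dropped when it is dequeued. Roughly 
like this (my illustrative Java sketch, not the actual 
org.apache.cassandra code; all names are made up):

    // Sketch of drop-on-expiry load shedding (hypothetical names).
    final class MutationTask implements Runnable {
        private final long enqueuedAtMillis = System.currentTimeMillis();
        private final long rpcTimeoutMillis;
        private final Runnable work;

        MutationTask(long rpcTimeoutMillis, Runnable work) {
            this.rpcTimeoutMillis = rpcTimeoutMillis;
            this.work = work;
        }

        @Override
        public void run() {
            // If the message sat in the queue past rpc_timeout, the
            // coordinator has already timed out; doing the work now only
            // adds pressure, so drop it without feeding anything back.
            if (System.currentTimeMillis() - enqueuedAtMillis > rpcTimeoutMillis)
                return;
            work.run();
        }
    }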
>
> There are also safety valves in there to reduce the size of the 
> memtables and caches in response to low memory. Perhaps that process 
> could also shed messages from thread pools with a high number of 
> pending messages.
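
I believe those valves correspond to these knobs in cassandra.yaml 
(1.x names and defaults, as far as I can tell):

    # cassandra.yaml (1.x) emergency thresholds: when the heap is still
    # this full after a full GC, flush the largest memtables and then
    # shrink the key/row caches.
    flush_largest_memtables_at: 0.75
    reduce_cache_sizes_at: 0.85
    reduce_cache_capacity_to: 0.6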
>
> **But** going OOM with 2M+ mutations in the thread pool sounds like 
> the server was going down anyway. Did you look into why all the 
> messages were there?
>
> Cheers
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3/01/2012, at 11:18 PM, Віталій Тимчишин wrote:
>
>> Hello.
>>
>> We have been using Cassandra for some time in our project. Currently 
>> we are on 1.1 trunk (it was an accidental migration, but since it's 
>> hard to migrate back and it's performing well enough, we are staying 
>> on 1.1).
>> During the New Year holidays one of the servers produced a number of 
>> OOM messages in the log.
>> According to the heap dump taken, most of the memory is held by the 
>> MutationStage queue (over 2 million items).
>> So, I am curious: does Cassandra have any flow control for messages? 
>> We are using QUORUM for writes, and it seems to me that one slow 
>> server may start receiving more messages than it can consume, while 
>> the writes still succeed, performed by the other servers in the 
>> replica set.
>> If there is no flow control, such a server should eventually go OOM. 
>> Is that the case? Are there any plans to handle this?
>> BTW: A lot of memory (~half) is taken by Inet4Address objects, so 
>> caching such objects would make this problem less likely.
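
For what it's worth, such a cache could be as simple as interning the 
addresses (an illustrative Java sketch with made-up names, not a patch 
against the Cassandra source):

    import java.net.InetAddress;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Hypothetical interning cache: return one canonical InetAddress
    // per distinct address so the heap is not filled with duplicates.
    final class InetAddressCache {
        private static final ConcurrentMap<InetAddress, InetAddress> CACHE =
                new ConcurrentHashMap<InetAddress, InetAddress>();

        static InetAddress intern(InetAddress addr) {
            InetAddress existing = CACHE.putIfAbsent(addr, addr);
            return existing == null ? addr : existing;
        }
    }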
>>
>> -- 
>> Best regards,
>>  Vitalii Tymchyshyn
>

