cassandra-user mailing list archives

From Evelyn Smith <u5015...@gmail.com>
Subject Re: OOM after a while during compacting
Date Thu, 05 Apr 2018 13:26:55 GMT
Oh, and second: are you attempting a major compaction while you still have all those pending compactions?

Try letting the cluster catch up on compactions. Having that many pending is bad.
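
If you want to keep an eye on the backlog, these are the standard nodetool commands I'd use (nothing here is specific to your setup):

    nodetool compactionstats -H    # pending compaction tasks plus whatever is compacting right now
    nodetool tpstats               # check the CompactionExecutor pending/blocked counts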

If you have a replication factor of 3 and you're reading/writing at quorum, you could go node by node: disable binary, raise concurrent compactors to 4, and unthrottle compactions by setting the compaction throughput to zero. That can help it catch up on those compactions. Then you can deal with trying a major compaction.
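
To spell that out, the per-node steps would look roughly like this (a sketch, not something I've run against your cluster; setconcurrentcompactors is only there in newer versions, otherwise bump concurrent_compactors in cassandra.yaml and restart):

    nodetool disablebinary                # stop taking CQL client traffic on this node
    nodetool setconcurrentcompactors 4    # more compaction threads
    nodetool setcompactionthroughput 0    # 0 = unthrottled
    # watch nodetool compactionstats until the backlog drains, then undo:
    nodetool setcompactionthroughput 16   # or whatever you normally run with (16 is the default)
    nodetool enablebinary

Do one node at a time and let it fully catch up before moving on, so quorum reads and writes keep working while each node is out.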

Regards,
Evelyn.

> On 5 Apr 2018, at 11:14 pm, Evelyn Smith <u5015159@gmail.com> wrote:
> 
> Probably a dumb question but it’s good to clarify.
> 
> Are you compacting the whole keyspace or are you compacting tables one at a time?
> 
>> On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai <zpalmai@gmail.com> wrote:
>> 
>> Hi!
>> 
>> I have a setup with 4 AWS nodes (m4.xlarge - 4 CPUs, 16GB RAM, 1TB SSD each) and when running the nodetool compact command on any of the servers I get an out-of-memory exception after a while.
>> 
>> - Before calling compact I first did a repair, and before that there was a bigger update on a lot of entries, so I guess a lot of sstables were created. The repair created around ~250 pending compaction tasks. I managed to finish 2 of the nodes by upgrading to a 2xlarge machine with twice the heap (but running the compact on them manually also killed one :/ so this isn't an ideal solution).
>> 
>> Some more info: 
>> - Version is the newest 3.11.2 with java8u116
>> - Using LeveledCompactionStrategy (we have mostly reads)
>> - Heap size is set to 8GB
>> - Using G1GC
>> - I tried moving the memtable out of the heap. It helped, but I still got an OOM last night
>> - Concurrent compactors is set to 1, but it still happens; I also tried setting throughput between 16 and 128, with no change.
>> - Storage load is 127GB/140GB/151GB/155GB
>> - 1 keyspace, 16 tables, but there are a few SASI indexes on big tables.
>> - The biggest partition I found was 90MB, but that table has only 2 sstables attached and compacts in seconds. The rest are mostly single-row partitions with a few tens of KB of data.
>> - Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0, 0, 0, 0, 0]
>> 
>> In the metrics it looks something like this before dying: https://ibb.co/kLhdXH
>> 
>> This is what the heap dump's top objects look like: https://ibb.co/ctkyXH
>> 
>> The load is usually pretty low, the nodes are almost idling (avg 500 reads/sec, 30-40 writes/sec with occasional few-second spikes of >100 writes) and the number of pending tasks is also usually around 0.
>> 
>> Any ideas? I'm starting to run out of them. Maybe the secondary indexes cause problems? I managed to finish some bigger compactions on tables with no index attached, but I'm not 100% sure this is the cause.
>> 
>> Thanks,
>> Zsolt
>> 
>> 
>> 
> 

