cassandra-user mailing list archives

From "Walsh, Stephen" <>
Subject RE: SSTables are not getting removed
Date Mon, 02 Nov 2015 14:21:24 GMT
Thanks to both Nate and Jeff, for both the bug highlighting and the configuration issues.

We've upgraded to 2.1.11
Lowered our memtable_cleanup_threshold to .11
Lowered our thrift_framed_transport_size_in_mb to 15

We kicked off another run.

The result was that Cassandra failed after 1 hour.
SSTables grew to about 8,000 before we lost the JMX connection
(so that's about 32,000 SSTables in total across all nodes).
Major GCs happened every 3-5 minutes.

We then reset for a direct comparison between 2.1.6 & 2.1.11.

There was no difference in the output between 2.1.6 and 2.1.11.

From: Nate McCall []
Sent: 30 October 2015 22:06
To: Cassandra Users <>
Subject: Re: SSTables are not getting removed

memtable_offheap_space_in_mb: 4096
memtable_cleanup_threshold: 0.99

^ What led to this setting? You are telling Cassandra not to flush the highest-traffic
memtable until the memtable space is 99% full. With that many tables and keyspaces, you are
effectively locking up everything on the flush queue, causing substantial back pressure. If
you run 'nodetool tpstats' you will probably see a massive 'All Time Blocked' count for
FlushWriter and a large 'Dropped' count for Mutations.
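To make the scale of that setting concrete, here is a small sketch of the arithmetic. Per the
comments in cassandra.yaml, when memtable_cleanup_threshold is left unset it defaults to
1 / (memtable_flush_writers + 1); the flush-writer count used below is an assumption for
illustration, not taken from this cluster.

```python
# Sketch: how memtable_cleanup_threshold relates to flush pressure.
# The default derivation (1 / (memtable_flush_writers + 1)) is from the
# cassandra.yaml comments; the writer counts below are illustrative.

def default_cleanup_threshold(flush_writers: int) -> float:
    """Default fraction of memtable space in use that triggers a flush."""
    return 1.0 / (flush_writers + 1)

# With two flush writers (a common small-node default), flushing starts
# at roughly a third of memtable space, leaving headroom for writes that
# arrive while a flush is in progress:
print(default_cleanup_threshold(2))  # ~0.33

# A hand-set threshold of 0.99 defers flushing until memtable space is
# nearly exhausted, so every flush happens under back pressure.
print(0.99 / default_cleanup_threshold(2))  # ~3x later than the default
```

The thread's later change to 0.11 is consistent with this formula for a larger writer count
(1 / (8 + 1) is about 0.11), though that mapping is my inference, not stated in the thread.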

Actually, this is probably why you are seeing a lot of small SSTables: commit log segments are
being filled and blocked from flushing due to the above, so they have to attempt to flush
repeatedly with whatever is there whenever they get the chance.

thrift_framed_transport_size_in_mb: 150

^ This is also a super bad idea. Thrift buffers grow as needed to accommodate larger results,
but they don't ever shrink. This will lead to a bunch of open connections holding onto large,
empty byte arrays. This will show up immediately in a heap dump inspection.
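The failure mode Nate describes can be sketched in a few lines. This is an illustration of a
grow-only per-connection buffer, not Thrift's actual implementation; the class and sizes below
are made up for the example.

```python
# Sketch of a per-connection frame buffer that grows to fit the largest
# frame ever received but is never shrunk back (illustrative only; this
# is not Thrift's real code).

class FramedBuffer:
    def __init__(self) -> None:
        self.buf = bytearray()

    def receive(self, frame: bytes) -> None:
        # Grow the backing buffer if the incoming frame is larger...
        if len(frame) > len(self.buf):
            self.buf = bytearray(len(frame))
        self.buf[: len(frame)] = frame
        # ...but never shrink it afterwards.

conn = FramedBuffer()
conn.receive(b"x" * 1_000_000)  # one hypothetical large response
conn.receive(b"tiny")           # all later traffic is small
print(len(conn.buf))            # still sized for the largest frame seen
```

With a 150 MB frame cap, a handful of connections that each saw one large result can pin
hundreds of megabytes of mostly-empty byte arrays on the heap, which matches the heap-dump
symptom described above.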

concurrent_compactors: 4
compaction_throughput_mb_per_sec: 0
endpoint_snitch: GossipingPropertyFileSnitch

This grinds our system to a halt and causes a major GC nearly every second.

So far the only way to get around this is to run a cron job every hour that does a "nodetool

What's the output of 'nodetool compactionstats'? CASSANDRA-9882 and CASSANDRA-9592 could be
to blame (both fixed in recent versions) or this could just be a side effect of the memory
pressure from the above settings.

Start back at the default settings (except the snitch; GPFS is always a good place to start)
and change settings serially, in small increments, based on feedback gleaned from monitoring.

Nate McCall
Austin, TX

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
