The rules for tombstone eviction are as follows (regardless of your compaction strategy):

1. gc_grace must have expired, and
2. No fragments of the row can exist in SSTables that aren't also participating in the compaction.

For LCS, there is no 'rule' that tombstones can only be evicted at the highest level. They can be evicted at whichever level the row converges on. Depending on your use case, this may mean it always happens at L4; it might also mean that it most often happens at L1 or L2.
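
To make the two rules concrete, here is a minimal Python sketch. It is only a toy model, not Cassandra's actual code: the SSTable class, the can_evict_tombstone function, and all of their fields and parameters are hypothetical names made up for illustration. Note that the level of an SSTable plays no part in the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an SSTable: a name, a level, and the row keys it holds."""
    name: str
    level: int
    row_keys: frozenset


def can_evict_tombstone(row_key, deleted_at, now, gc_grace_seconds,
                        all_sstables, compacting):
    """Return True if a tombstone for row_key may be dropped by this compaction."""
    # Rule 1: gc_grace must have expired since the delete was written.
    if now - deleted_at < gc_grace_seconds:
        return False
    # Rule 2: every SSTable holding a fragment of the row must be part of
    # the compaction. The level of each SSTable is never consulted.
    compacting_names = {s.name for s in compacting}
    for sstable in all_sstables:
        if row_key in sstable.row_keys and sstable.name not in compacting_names:
            return False
    return True


# Toy data: row1 lives only in L0 and L1, row2 also has a fragment in L4.
l0 = SSTable("a", 0, frozenset({"row1"}))
l1 = SSTable("b", 1, frozenset({"row1", "row2"}))
l4 = SSTable("c", 4, frozenset({"row2"}))
tables = [l0, l1, l4]
gc_grace = 10 * 86400  # ten days, in seconds

row1_ok = can_evict_tombstone("row1", 0, gc_grace + 1, gc_grace, tables, [l0, l1])
row2_ok = can_evict_tombstone("row2", 0, gc_grace + 1, gc_grace, tables, [l0, l1])
```

In this toy data set, a compaction of the L0 and L1 SSTables can drop the tombstone for row1, because the row has converged there, but not the one for row2, because a fragment of row2 still sits in an L4 SSTable that is not participating.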






On Fri, Nov 9, 2012 at 7:31 AM, Mina Naguib <mina.naguib@adgear.com> wrote:


On 2012-11-08, at 1:12 PM, B. Todd Burruss <btoddb@gmail.com> wrote:

> We are having a problem where we have huge SSTables containing tombstoned data that is not being compacted soon enough (because size-tiered compaction requires, by default, 4 similarly sized SSTables). This is using more disk space than we anticipated.
>
> We are very write-heavy compared to reads, and we delete the data after N days (it depends on the column family, but N is around 7).
>
> My question is: would leveled compaction help get rid of the tombstoned data faster than size-tiered, and therefore reduce disk space usage?

From my experience, leveled compaction makes space reclamation after deletes even less predictable than size-tiered.

The reason is that deletes, like all mutations, are just recorded into sstables.  They enter level0, and get slowly, over time, promoted upwards to levelN.

Depending on your *total* mutation volume vs. your data set size, this may be quite a slow process.  It is even worse if the data you're deleting is large (say, an entire row worth several hundred kilobytes) but is deleted by a small row-level tombstone.  If the row is sitting in level4, the tombstone won't reach it until enough data has pushed through everything already in level3, level2, level1, and level0.

Finally, to guard against the tombstone missing any data, the tombstone itself is not a candidate for removal (I believe even after gc_grace has passed) until it has reached the highest populated level in leveled compaction.  This means that if you have 4 levels and issue a ton of deletes (even deletes that will never affect existing data), these tombstones are dead weight that cannot be purged until they hit level4.

For a write-heavy workload, I recommend you stick with size-tiered.  You have several options at your disposal (compaction min/max thresholds, gc_grace) to move things along.  If that doesn't help, I've heard of some fairly reputable people doing some fairly blasphemous things (major compactions every night).
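
To illustrate why the min threshold matters, here is a rough Python sketch of how size-tiered compaction groups SSTables of similar size and only compacts a group once it has enough members. This is a simplified model, not Cassandra's implementation: the function names are hypothetical, and the 0.5x/1.5x similarity bounds and the 4/32 thresholds are assumed defaults used for illustration.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of similar size (simplified model)."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A size joins a bucket if it is close to the bucket's average.
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough, so start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4, max_threshold=32):
    """Return the buckets that have enough members to be compacted."""
    return [bucket[:max_threshold]
            for bucket in bucket_by_size(sizes)
            if len(bucket) >= min_threshold]


# Hypothetical SSTable sizes in MB: three small ones and one huge one.
sizes_mb = [100, 110, 95, 5000]
default_candidates = compaction_candidates(sizes_mb)
lowered_candidates = compaction_candidates(sizes_mb, min_threshold=2)
```

With a min threshold of 4, nothing in this example is eligible. Lowering it to 2 lets the three small SSTables compact, but the lone 5000 MB SSTable still has no similarly sized peer, which is the situation described in the original question.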





--
Ben Coverston
DataStax -- The Apache Cassandra Company