cassandra-user mailing list archives

From Jeff Jirsa <>
Subject Re: Time series data with only inserts
Date Tue, 31 May 2016 04:26:22 GMT

Your compaction strategy gets triggered whenever you flush memtables to disk.

Most compaction strategies, especially those designed for write-only time-series workloads,
check for fully expired sstables (getFullyExpiredSStables()) “often” (DTCS does it every
10 minutes, because it’s fairly expensive). That’s THE most efficient way to drop expired
data: the whole sstable is dropped because everything in it has expired. Given that you’re not doing reads,
it’s likely that getFullyExpiredSStables will have few (or no) blockers, and will search
for / return fully expired sstables 7 days after they’re created, assuming you manage to
use a compaction strategy that doesn’t mix old data with new data (DTCS is the only ‘official’
one that does this now, though TWCS in #9666 may be interesting to you).
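
To make the "blocker" idea concrete, here is a minimal Python sketch of a fully-expired check. It is only loosely modeled on what getFullyExpiredSStables does and is not Cassandra's code: the SSTable class, its field names, the fully_expired function and the sample values are all hypothetical, every time value is in plain seconds, and gc_before is assumed to be the cutoff before which expired data may be purged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SSTable:
    # Hypothetical stand-in for per-sstable metadata (all values in seconds).
    name: str
    min_timestamp: int            # oldest write timestamp in the file
    max_timestamp: int            # newest write timestamp in the file
    max_local_deletion_time: int  # time at which the last cell in the file expires


def fully_expired(candidates: List[SSTable],
                  overlapping: List[SSTable],
                  gc_before: int) -> List[SSTable]:
    """Return the candidates that could be dropped whole in this simplified model."""
    # Oldest write timestamp among overlapping sstables that still hold live data.
    live_min = min(
        (s.min_timestamp for s in overlapping
         if s.max_local_deletion_time >= gc_before),
        default=float("inf"),
    )
    expired = []
    for s in sorted(candidates, key=lambda t: t.max_timestamp):
        if s.max_local_deletion_time < gc_before and s.max_timestamp < live_min:
            expired.append(s)
        else:
            # A candidate that stays behind can itself block later candidates.
            live_min = min(live_min, s.min_timestamp)
    return expired


WEEK = 7 * 24 * 3600  # the 7-day TTL from the question
NOW = 1_000_000       # assumed purge cutoff for the example

old = SSTable("old", 0, 100, 100 + WEEK)
new = SSTable("new", 900_000, 950_000, 950_000 + WEEK)
# An overlapping sstable (for example, written by read repair) that holds
# old-timestamp data but has not expired yet.
blocker = SSTable("blocker", 50, 60, 950_000 + WEEK)

droppable_clean = fully_expired([old, new], [], NOW)
droppable_blocked = fully_expired([old, new], [blocker], NOW)
```

In this model, droppable_clean contains only the "old" sstable, while droppable_blocked is empty: the overlapping file with older, still-live data keeps "old" from being dropped.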

Unfortunately, life being what it is, it’s pretty easy to end up in a situation where read
repairs or other overlaps cause ‘blockers’ which prevent sstables from being fully expired.
In those situations, using the tombstone compaction subproperties can nudge things in the
right direction (for example, you can tell Cassandra to compact an sstable with itself if it’s
over 24 hours old and contains more than 80% tombstones, where 24 and 80 are both variables
you control). Check out
for the tombstone related options.
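
As a rough illustration of the 24-hour / 80% example, the Python sketch below builds the matching ALTER TABLE statement. The option names tombstone_compaction_interval (in seconds) and tombstone_threshold (a ratio) are the standard compaction subproperties; the helper function, the keyspace "metrics" and the table "events" are hypothetical, and the statement is only printed, not executed.

```python
def tombstone_compaction_cql(keyspace: str,
                             table: str,
                             strategy: str = "DateTieredCompactionStrategy",
                             min_age_hours: int = 24,
                             tombstone_ratio: float = 0.8) -> str:
    """Build an ALTER TABLE statement that sets the tombstone subproperties."""
    options = {
        "class": strategy,
        # Minimum sstable age, in seconds, before a single-sstable
        # tombstone compaction is considered.
        "tombstone_compaction_interval": str(min_age_hours * 3600),
        # Ratio of droppable tombstones above which the sstable qualifies.
        "tombstone_threshold": str(tombstone_ratio),
    }
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{body}}};"


# "metrics" and "events" are placeholder names for this example.
statement = tombstone_compaction_cql("metrics", "events")
print(statement)
```

Note that setting compaction this way replaces the table's whole compaction map, so any other compaction options already in use would need to be repeated in the same statement.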

- Jeff

On 5/30/16, 3:54 PM, "Rakesh Kumar" <> wrote:

>Let us assume that there is a table which gets only inserts and under
>normal circumstances no reads on it. If we assume TTL to be 7 days,
>what event will trigger a compaction/purge of old data if the old data
>is not in the mem cache and no session needs it.