cassandra-user mailing list archives

From: Jeff Jirsa
Subject: Re: Choosing a compaction strategy (TWCS)
Date: Sat, 17 Dec 2016 04:07:00 GMT

Tombstone compaction subproperties can handle tombstone removal for you. You set a ratio
of tombstones worth compacting away (for example, 80%) and an interval to prevent continuous
compaction (for example, 24 hours). Then, any time there’s no other work to do, if there’s
an sstable over 24 hours old that’s at least 80% tombstones, it’ll be compacted in a
single-sstable compaction.


-          Jeff
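
As a rough illustration of the rule Jeff describes, here is a minimal Python sketch of the
eligibility check, plus a helper that builds an ALTER TABLE statement. The option names
tombstone_threshold and tombstone_compaction_interval are the standard compaction
subproperties; the SSTableInfo type, the function names, and the table name ks.audit_log
are hypothetical and only there for the example. This models the decision, not Cassandra's
actual code.

```python
from dataclasses import dataclass


@dataclass
class SSTableInfo:
    """Hypothetical summary of one sstable (illustration only)."""
    age_seconds: int
    droppable_tombstone_ratio: float


def eligible_for_tombstone_compaction(sstable, threshold=0.8, interval_seconds=24 * 3600):
    """True if the sstable is older than the interval and at least `threshold` tombstones."""
    return (sstable.age_seconds > interval_seconds
            and sstable.droppable_tombstone_ratio >= threshold)


def compaction_options_cql(table, threshold=0.8, interval_seconds=86400):
    """Build an ALTER TABLE statement with TWCS and the tombstone subproperties."""
    options = {
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": "DAYS",
        "compaction_window_size": "7",
        "tombstone_threshold": str(threshold),
        "tombstone_compaction_interval": str(interval_seconds),
    }
    body = ", ".join(f"'{k}': '{v}'" for k, v in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


if __name__ == "__main__":
    print(eligible_for_tombstone_compaction(SSTableInfo(48 * 3600, 0.85)))
    print(compaction_options_cql("ks.audit_log"))  # hypothetical table name
```

The 80% and 24-hour values are just the examples from the message above; pick values that
fit your own data.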


From: Voytek Jarnot
Date: Friday, December 16, 2016 at 7:34 PM
Subject: Re: Choosing a compaction strategy (TWCS)


Thanks again, Jeff. 


Thinking about this some more, I'm wondering if I'm overthinking or if there's a potential


If my compaction_window_size is 7 (DAYS), and I've got TTLs of 7 days on some (a relatively
small percentage) of my records, am I going to be leaving tombstones around all over the
place? My noob-read on this is that TWCS will not compact sstables comprised of records older
than 7 days, but Cassandra will not evict my tombstones until 7 days + consideration for
gc_grace_seconds have passed ... resulting in no tombstone removal (?).
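
To make the timing in this question concrete, here is a small Python sketch of the arithmetic.
It assumes, as the question does, that an expired record only becomes purgeable once write
time + TTL + gc_grace_seconds has passed. The 10-day gc_grace value is Cassandra's default
(864000 seconds); the function name is made up for the example.

```python
from datetime import datetime, timedelta, timezone


def earliest_purge_time(written_at, ttl, gc_grace=timedelta(seconds=864000)):
    """Earliest moment an expired record could be dropped by compaction (assumed model)."""
    return written_at + ttl + gc_grace


if __name__ == "__main__":
    written = datetime(2016, 12, 16, tzinfo=timezone.utc)
    print(earliest_purge_time(written, ttl=timedelta(days=7)))
```

With a 7-day TTL and the default gc_grace, a record written on 2016-12-16 would not be
purgeable before 2017-01-02, well after a 7-day window containing it has closed.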




On Fri, Dec 16, 2016 at 1:17 PM, Jeff Jirsa wrote:

The issue is that your partitions will likely be in 2 sstables instead of “theoretically”
1. In practice, they’re probably going to bleed into 2 anyway (the memtable flush to an
sstable isn’t going to happen exactly when the window expires), so I bet there’s no
meaningful impact.


-          Jeff
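
A quick Python sketch of why a calendar-week partition lands in 2 windows. It assumes TWCS
aligns DAYS windows to the Unix epoch (window start = timestamp rounded down to a multiple
of the window size), which is my understanding of the implementation rather than something
stated in this thread. Since 1970-01-01 was a Thursday, 7-day windows would then run
Thursday to Wednesday, while ISO weeks run Monday to Sunday. Function names are made up.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WINDOW = timedelta(days=7)


def twcs_window_start(ts, window=WINDOW):
    """Round a timestamp down to the start of its epoch-aligned window (assumed behaviour)."""
    return EPOCH + ((ts - EPOCH) // window) * window


def windows_touched_by_iso_week(year, week):
    """Distinct window starts covered by the 7 days of one ISO calendar week."""
    monday = datetime.fromisocalendar(year, week, 1).replace(tzinfo=timezone.utc)
    days = [monday + timedelta(days=i) for i in range(7)]
    return sorted({twcs_window_start(d) for d in days})


if __name__ == "__main__":
    for start in windows_touched_by_iso_week(2016, 50):
        print(start.date())
```

Under that assumption, the (2016, 50) partition from the original question spans the windows
starting 2016-12-08 and 2016-12-15.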


From: Voytek Jarnot
Date: Friday, December 16, 2016 at 11:12 AM
Subject: Re: Choosing a compaction strategy (TWCS)


Thank you Jeff - always nice to hear straight from the source. 


Any issues you can see with 3 (my calendar-week bucket not aligning with the arbitrary 7-day
window)? Or am I confused (I'd put money on this option, but I've been wrong once or twice


On Fri, Dec 16, 2016 at 12:50 PM, Jeff Jirsa wrote:

I skipped over the more important question: loading data in. Two options:

1) Load data in order through the normal write path and use “USING TIMESTAMP” to
set the timestamp, or

2) Use CQLSSTableWriter and “USING TIMESTAMP” to create sstables, then sstableloader
them into the cluster.


Either way, try not to mix writes of old data and new data in the “normal” write path
at the same time, even if you write “USING TIMESTAMP”, because it’ll get mixed in the
memtable and flushed into the same sstable. It won’t kill you, but if you can avoid
it, avoid it.


-                      Jeff
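
For option 1, a minimal Python sketch of preparing historical rows in time order with an
explicit write timestamp. CQL's USING TIMESTAMP takes microseconds since the Unix epoch.
The table name, column names, and helper functions are hypothetical, and the sketch only
builds statement strings; actually executing them with a driver is left out.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_write_timestamp_micros(event_time):
    """Convert an aware datetime to microseconds since the Unix epoch."""
    return (event_time - EPOCH) // timedelta(microseconds=1)


def insert_with_timestamp(table, columns, event_time):
    """Build a parameterised INSERT that carries the original event time."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    micros = to_write_timestamp_micros(event_time)
    return f"INSERT INTO {table} ({cols}) VALUES ({marks}) USING TIMESTAMP {micros}"


def statements_in_time_order(table, columns, event_times):
    """Yield one statement per event, oldest first, so old data loads in order."""
    for event_time in sorted(event_times):
        yield insert_with_timestamp(table, columns, event_time)


if __name__ == "__main__":
    times = [datetime(2015, 6, 1, tzinfo=timezone.utc),
             datetime(2014, 3, 1, tzinfo=timezone.utc)]
    for stmt in statements_in_time_order("ks.audit_log", ["year", "week", "id"], times):
        print(stmt)
```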



From: Jeff Jirsa
Date: Friday, December 16, 2016 at 10:47 AM
Subject: Re: Choosing a compaction strategy (TWCS)


With a 10 year retention, just ignore the target sstable count (I should remove that guidance,
to be honest), and go for a 1 week window to match your partition size. 520 sstables on disk
isn’t going to hurt you as long as you’re not reading from all of them, and with a partition-per-week
the bloom filter is going to make things nice and easy for you.


-          Jeff
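
The 520 figure is just 10 years times roughly 52 weeks. A small Python sketch of that
arithmetic, which also shows how wide a window would have to be to stay under the
50-bucket guidance; the function names and the 52-weeks-per-year approximation are
assumptions for the example.

```python
import math

WEEKS_PER_YEAR = 52  # approximation, matching the 520 figure above


def approx_window_count(retention_years, window_weeks=1):
    """Approximate number of time windows (one sstable each) over the retention period."""
    return math.ceil(retention_years * WEEKS_PER_YEAR / window_weeks)


def min_window_weeks_for_target(retention_years, target_windows):
    """Smallest whole-week window size that keeps the window count at or under the target."""
    return math.ceil(retention_years * WEEKS_PER_YEAR / target_windows)


if __name__ == "__main__":
    print(approx_window_count(10))              # 1-week windows over 10 years
    print(min_window_weeks_for_target(10, 50))  # window size needed for <= 50 windows
```

Staying under 50 windows with 10-year retention would need windows of about 11 weeks, which
no longer matches a partition-per-week layout.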



From: Voytek Jarnot
Date: Friday, December 16, 2016 at 10:37 AM
Subject: Choosing a compaction strategy (TWCS)



Converting an Oracle table to Cassandra, one Oracle table to 4 Cassandra tables, basically
time-series - think log or auditing.  Retention is 10 years, but greater than 95% of reads
will occur on data written within the last year. 7 day TTL used on a small percentage of the
records, majority do not use TTL. Other than the aforementioned TTL, and the 10-year purge,
no updates or deletes are done.


Seems like TWCS is the right choice, but I have a few questions/concerns:


1) I'll be bulk loading a few years of existing data upon deployment - any issues with that?
 I assume using "with timestamp" when inserting this data will be mandatory if I choose TWCS?


2) I read that "You should target fewer than 50
buckets per table based on your TTL." That's going to be a tough goal with a 10 year retention
... can anyone speak to how important this target really is?


3) If I'm bucketing my data with week/year (i.e., partition on year, week - so today would
be in 2016, 50), it seems like a natural fit for compaction_window_size would be 7 days, but
I'm thinking my calendar-based weeks will never align with TWCS 7-day-period weeks anyway
- am I missing something there?


I'd appreciate any other thoughts on compaction and/or twcs.




