cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8360) In DTCS, always compact SSTables in the same time window, even if they are fewer than min_threshold
Date Wed, 26 Nov 2014 06:08:12 GMT


Jonathan Ellis commented on CASSANDRA-8360:

The max is there to make sure we don't OOM or overwhelm the heap with compaction buffers.
So that should probably be respected at all times.

I agree that "ignore min, except for the 'incoming' window" makes the most sense -- you don't
want to constantly recompact 90% of the data every time a new SSTable is flushed.  That's
a big hit to DTCS's advantage in write amplification.

It's possible that, as you say, this is fine if the window is small enough -- but if it's that
small (smaller than the flush interval) then it will be the "previously active window" soon enough.
So I don't think it's worth trying to special-case that.
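
The rule discussed in this comment can be sketched roughly as follows. This is only an illustration, not Cassandra's actual code: the function name pick_candidates, the window numbering, and the SSTable names are all hypothetical, and the defaults of 4 and 32 are assumed to match the usual min_threshold and max_threshold settings.

```python
from typing import Dict, List


def pick_candidates(
    windows: Dict[int, List[str]],
    current_window: int,
    min_threshold: int = 4,
    max_threshold: int = 32,
) -> Dict[int, List[str]]:
    """Return, per window, the SSTables one compaction would take (hypothetical helper)."""
    picked = {}
    for window, sstables in windows.items():
        # The incoming window still waits for min_threshold SSTables;
        # older windows compact as soon as they hold at least two.
        needed = min_threshold if window == current_window else 2
        if len(sstables) >= needed:
            # max_threshold caps every compaction, in every window.
            picked[window] = sstables[:max_threshold]
    return picked


windows = {
    10: ["a", "b"],       # old window, below min_threshold
    11: ["c"],            # old window, nothing to merge
    12: ["d", "e", "f"],  # incoming window, below min_threshold
}
print(pick_candidates(windows, current_window=12))
```

With these inputs only window 10 is selected: window 11 holds a single SSTable, and the incoming window 12 has not yet reached min_threshold.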

> In DTCS, always compact SSTables in the same time window, even if they are fewer than min_threshold
> ---------------------------------------------------------------------------------------------------
>                 Key: CASSANDRA-8360
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Björn Hegerfors
>            Priority: Minor
> DTCS uses min_threshold to decide how many time windows of the same size need to
accumulate before merging into a larger window. The age of an SSTable is taken to be its
min timestamp, and it always falls into exactly one of the time windows. If multiple SSTables
fall into the same window, DTCS considers compacting them, but if they are fewer than min_threshold,
it decides not to do it.
> When do more than 1 but fewer than min_threshold SSTables end up in the same time window
(except for the current window), you might ask? In the current state, DTCS can spill some
extra SSTables into bigger windows when the previous window wasn't fully compacted, which
happens all the time when the latest window stops being the current one. Also, repairs and
hints can put new SSTables in old windows.
> I think, and [~jjordan] agreed in a comment on CASSANDRA-6602, that DTCS should ignore
min_threshold and compact SSTables in the same window regardless of how few there are. I guess
max_threshold should still be respected.
> [~jjordan] suggested that this should apply to all windows but the current window, where
all the new SSTables end up. That could make sense. I'm not clear on whether compacting many
SSTables at once is more cost-efficient or not, when it comes to the very newest and smallest
SSTables. Maybe compacting as soon as 2 SSTables are seen is fine if the initial window size
is small enough? I guess the opposite could be the case too; that the very newest SSTables
should be compacted many at a time?
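
The description above says each SSTable lands in exactly one time window based on its min timestamp. A minimal sketch of that grouping is below. It is a simplification: it assumes a single fixed window size, whereas DTCS merges windows into larger ones as they age, and the name group_by_window and the sample timestamps are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_window(
    sstables: List[Tuple[str, int]], window_size: int
) -> Dict[int, List[str]]:
    """Map window index -> names of SSTables whose min timestamp falls in it."""
    windows = defaultdict(list)
    for name, min_timestamp in sstables:
        # Integer division assigns each min timestamp to exactly one window.
        windows[min_timestamp // window_size].append(name)
    return dict(windows)


# (name, min timestamp) pairs; timestamps are in arbitrary units.
sstables = [("a", 5), ("b", 42), ("c", 61), ("d", 119), ("e", 130)]
print(group_by_window(sstables, window_size=60))
# prints {0: ['a', 'b'], 1: ['c', 'd'], 2: ['e']}
```

Windows 0 and 1 each hold two SSTables, which is the "more than 1 but fewer than min_threshold" case the issue is about.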
