cassandra-commits mailing list archives

From Björn Hegerfors (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8340) Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions
Date Thu, 20 Nov 2014 13:17:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219354#comment-14219354 ]

Björn Hegerfors edited comment on CASSANDRA-8340 at 11/20/14 1:16 PM:
----------------------------------------------------------------------

No drawback, really. It doesn't make a big difference. Whatever is easiest to reason about
would be best. It's true that in your repair example, it would have some effect, but only
when the repair SSTables are not older than max_sstable_age_days while the big one is. I would
imagine that repair would be equally likely to bring in a bunch of files that are older than
max_sstable_age_days, which will stay scattered (uncompacted) anyway.

I suppose using min timestamp would align more with what the rest of the strategy uses to
determine age. In fact, something that would work even more consistently with the strategy
would be to specify a maximum window size, perhaps in terms of the initial window size. We have:
* up to min_threshold windows of size 1, followed by
* up to min_threshold windows of size min_threshold, followed by
* up to min_threshold windows of size min_threshold^2, followed by
* up to min_threshold windows of size min_threshold^3, followed by
* etc.

And then we can simply stop generating more windows after some point. The simplest, yet perhaps
least intuitive, option would be "max_window_exponent". If we set max_window_exponent=n, then
we would stop after windows of size min_threshold^n. Example: max_window_exponent=3, min_threshold=4.
The last few windows would be 64*base_time_seconds in size; no window of size 256 is ever created.
Other alternatives for the option are "max_window" or "max_window_seconds".
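
To make the example concrete, here is a rough Python sketch of the window sizes this proposal
would produce. It is not DTCS code: the function name window_sizes is hypothetical,
max_window_exponent is only the option proposed above, and the sketch assumes every tier is
full (exactly min_threshold windows per size), which is the upper bound of "up to min_threshold".

```python
def window_sizes(min_threshold, max_window_exponent):
    """Return window sizes in units of base_time_seconds, newest window first.

    Hypothetical helper: assumes each tier holds exactly min_threshold
    windows, i.e. the upper bound described in the comment.
    """
    sizes = []
    # Tiers have size min_threshold^0, min_threshold^1, ..., min_threshold^n.
    for exponent in range(max_window_exponent + 1):
        sizes.extend([min_threshold ** exponent] * min_threshold)
    return sizes


sizes = window_sizes(min_threshold=4, max_window_exponent=3)
print(sizes)
# [1, 1, 1, 1, 4, 4, 4, 4, 16, 16, 16, 16, 64, 64, 64, 64]
```

With these settings the largest window is 4^3 = 64 base units, and generation stops before a
256-unit window would appear.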

WDYT [~krummas]?


was (Author: bj0rn):
No drawback, really. It doesn't make a big difference. Whatever is easiest to reason about
would be best. It's true that in your repair example, it would have some effect, but only
when the repair SSTables are not older than max_sstable_age_days while the big one is. I would
imagine that repair would be likely to bring in a bunch of files that are older than max_sstable_age_days,
which will stay scattered anyway.

I suppose using min timestamp would align more with what the rest of the strategy uses to
determine age. In fact, something that would work even more consistently with the strategy
would be to specify a maximum window size, perhaps in terms of the initial window size. We have:
* up to min_threshold windows of size 1, followed by
* up to min_threshold windows of size min_threshold, followed by
* up to min_threshold windows of size min_threshold^2, followed by
* up to min_threshold windows of size min_threshold^3, followed by
* etc.

And then we can simply stop generating more windows after some point. The simplest, yet perhaps
least intuitive, option would be "max_window_exponent". If we set max_window_exponent=n, then
we would stop after windows of size min_threshold^n. Example: max_window_exponent=3, min_threshold=4.
The last few windows would be 64*base_time_seconds in size; no window of size 256 is ever created.
Other alternatives for the option are "max_window" or "max_window_seconds".

WDYT [~krummas]?

> Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8340
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Priority: Minor
>
> Currently we check how old the newest data (max timestamp) in an sstable is when we check
> if it should be compacted.
> If we instead switch to using min timestamp for this we have a pretty clean migration
> path from STCS/LCS to DTCS.
> My thinking is that before migrating, the user does a major compaction, which creates
> a huge sstable containing all data, with min timestamp very far back in time, then switching
> to DTCS, we will have a big sstable that we never compact (ie, min timestamp of this big sstable
> is before max_sstable_age_days), and all newer data will be after that, and that new data
> will be properly compacted
> WDYT [~Bj0rn] ?
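
The migration scenario in the description can be sketched as follows. This is an illustration
only, not Cassandra's implementation: the SSTable class, the is_too_old function and the
use_min_timestamp flag are hypothetical names, and timestamps are assumed to be microseconds.

```python
from dataclasses import dataclass

MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000  # assumes microsecond timestamps


@dataclass
class SSTable:
    """Hypothetical stand-in for an sstable's timestamp metadata."""
    name: str
    min_timestamp: int
    max_timestamp: int


def is_too_old(sstable, now, max_sstable_age_days, use_min_timestamp):
    """True if the sstable falls before the max_sstable_age_days cutoff."""
    cutoff = now - max_sstable_age_days * MICROS_PER_DAY
    ts = sstable.min_timestamp if use_min_timestamp else sstable.max_timestamp
    return ts < cutoff


now = 1000 * MICROS_PER_DAY
# Result of a major compaction before the switch: spans all data.
big = SSTable("big", min_timestamp=0, max_timestamp=999 * MICROS_PER_DAY)
# Data written after the switch to DTCS.
fresh = SSTable("fresh", min_timestamp=999 * MICROS_PER_DAY, max_timestamp=now)

for table in (big, fresh):
    by_max = is_too_old(table, now, 365, use_min_timestamp=False)
    by_min = is_too_old(table, now, 365, use_min_timestamp=True)
    print(table.name, "too old by max:", by_max, "too old by min:", by_min)
```

Under the max-timestamp check the big sstable still counts as recent and stays a compaction
candidate; under the min-timestamp check it is left alone, while the fresh sstable is eligible
either way.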



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
