cassandra-commits mailing list archives

From Björn Hegerfors (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-7019) Improve tombstone compactions
Date Thu, 05 Feb 2015 20:58:40 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307963#comment-14307963 ]

Björn Hegerfors edited comment on CASSANDRA-7019 at 2/5/15 8:58 PM:
--------------------------------------------------------------------

I posted a related ticket some time ago, CASSANDRA-8359. In particular, the side note at the
end is essentially this ticket exactly, for DTCS. A solution to this ticket may or may not
solve the main issue in that ticket, but that's a matter for that ticket.

Since DTCS SSTables are (supposed to be) separated into time windows, we have the concept
of an _oldest_ SSTable in a way that we don't with STCS. To me it seems pretty clear that
a multi-SSTable tombstone compaction on _n_ SSTables should always target the _n_ oldest ones.
The oldest one alone is practically guaranteed to overlap with any other SSTable, in terms
of tokens. So picking the right SSTables for multi-tombstone compaction should be as easy
as sorting by age (min timestamp), taking the oldest one, and including the newer ones in succession,
checking at which point the tombstone ratio is the highest. Or something close to that, anyway.
Then we might as well write them back as a single SSTable; I don't see why not.
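
A minimal sketch of the selection described above, written in Python rather than against Cassandra's code. The `SSTableStats` type, its fields, and `pick_oldest_prefix` are hypothetical names for illustration only. It also assumes that the tombstone ratio of a group is simply the summed droppable-tombstone estimate divided by the summed cell count; a real implementation would probably need to re-estimate droppability for each candidate group, since tombstones tend to become droppable only once the overlapping SSTables are included.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SSTableStats:
    """Hypothetical per-SSTable summary; not Cassandra's metadata class."""
    name: str
    min_timestamp: int         # oldest write timestamp in the SSTable
    total_cells: int           # assumed: number of cells in the SSTable
    droppable_tombstones: int  # assumed: estimated droppable tombstones


def pick_oldest_prefix(
    sstables: Sequence[SSTableStats],
) -> Tuple[List[SSTableStats], float]:
    """Pick the n oldest SSTables whose combined tombstone ratio is highest.

    SSTables are sorted by min timestamp (oldest first). Starting from the
    oldest one, newer ones are added in succession, and the prefix with the
    highest combined ratio is kept. Ties keep the shorter prefix.
    """
    ordered = sorted(sstables, key=lambda s: s.min_timestamp)
    best_len, best_ratio = 0, 0.0
    cells = tombstones = 0
    for count, sstable in enumerate(ordered, start=1):
        cells += sstable.total_cells
        tombstones += sstable.droppable_tombstones
        ratio = tombstones / cells if cells else 0.0
        if ratio > best_ratio:
            best_len, best_ratio = count, ratio
    return ordered[:best_len], best_ratio


if __name__ == "__main__":
    candidates = [
        SSTableStats("c", min_timestamp=300, total_cells=100, droppable_tombstones=5),
        SSTableStats("a", min_timestamp=100, total_cells=100, droppable_tombstones=10),
        SSTableStats("b", min_timestamp=200, total_cells=100, droppable_tombstones=50),
    ]
    chosen, ratio = pick_oldest_prefix(candidates)
    print([s.name for s in chosen], ratio)
```

With these made-up numbers the two oldest SSTables are chosen: adding the third one lowers the combined ratio, so the scan keeps the shorter group.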

EDIT: moved all of the below to CASSANDRA-7272, where it belongs.

-As for the STCS case, I don't understand why major compaction for STCS isn't already optimal.
I do see why one might want to compact some but not all SSTables in a multi-tombstone compaction
(though DTCS should be a better fit for anyone wanting this). But if every single SSTable
is being rewritten to disk, why not write them into one file? As far as I understand, the
ultimate goal of STCS is to be one SSTable. STCS only gets there, the natural way, once in
a blue moon. But that's the most optimal state that it can be in. Am I wrong?-

-The only explanation I can see for splitting the result of compacting all SSTables into fragments,
is if those fragments are:-
-1. Partitioned smartly. For example into separate token ranges (à la LCS), timestamp ranges
(à la DTCS) or clustering column ranges (which would be interesting). Or a combination of
these.-
-2. The structure upheld by the resulting fragments is not subsequently demolished by the
running compaction strategy going on with its usual business.-


was (Author: bj0rn):
I posted a related ticket some time ago, CASSANDRA-8359. In particular, the side note at the
end is essentially this ticket exactly, for DTCS. A solution to this ticket may or may not
solve the main issue in that ticket, but that's a matter for that ticket.

Since DTCS SSTables are (supposed to be) separated into time windows, we have the concept
of an _oldest_ SSTable in a way that we don't with STCS. To me it seems pretty clear that
a multi-SSTable tombstone compaction on _n_ SSTables should always target the _n_ oldest ones.
The oldest one alone is practically guaranteed to overlap with any other SSTable, in terms
of tokens. So picking the right SSTables for multi-tombstone compaction should be as easy
as sorting by age (min timestamp), taking the oldest one, and including the newer ones in succession,
checking at which point the tombstone ratio is the highest. Or something close to that, anyway.
Then we might as well write them back as a single SSTable; I don't see why not.

As for the STCS case, I don't understand why major compaction for STCS isn't already optimal.
I do see why one might want to compact some but not all SSTables in a multi-tombstone compaction
(though DTCS should be a better fit for anyone wanting this). But if every single SSTable
is being rewritten to disk, why not write them into one file? As far as I understand, the
ultimate goal of STCS is to be one SSTable. STCS only gets there, the natural way, once in
a blue moon. But that's the most optimal state that it can be in. Am I wrong?

The only explanation I can see for splitting the result of compacting all SSTables into fragments,
is if those fragments are:
1. Partitioned smartly. For example into separate token ranges (à la LCS), timestamp ranges
(à la DTCS) or clustering column ranges (which would be interesting). Or a combination of
these.
2. The structure upheld by the resulting fragments is not subsequently demolished by the running
compaction strategy going on with its usual business.

> Improve tombstone compactions
> -----------------------------
>
>                 Key: CASSANDRA-7019
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7019
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Assignee: Branimir Lambov
>              Labels: compaction
>             Fix For: 3.0
>
>
> When there are no other compactions to do, we trigger a single-sstable compaction if
> there is more than X% droppable tombstones in the sstable.
> In this ticket we should try to include overlapping sstables in those compactions to
> be able to actually drop the tombstones. Might only be doable with LCS (with STCS we would
> probably end up including all sstables)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
