cassandra-commits mailing list archives

From "Marcus Eriksson (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7019) Major tombstone compaction
Date Thu, 25 Sep 2014 18:11:36 GMT


Marcus Eriksson commented on CASSANDRA-7019:

The problem with starting in high levels is that it will take a long time before that data
gets included in a (minor) compaction. This is basically a major compaction (like in current

The alternative to putting the low tokens in the lower levels is to write all levels at the
same time and randomly distribute the tokens over the levels (putting 1% in L1, 10% in L2,
89% in L3), but I can't really see any difference compared to having the low tokens in one
sstable: the number of overlapping tokens between a newly flushed file in L0 and L1 should
be the same (if tokens are evenly distributed over the flushed sstable).
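
The comparison above can be explored with a rough simulation. This is only a sketch under stated assumptions, not Cassandra code: tokens are plain integers, a "level" is just a list of tokens, and the helper names (random_levels, contiguous_levels, overlapping_tokens) are hypothetical. It builds both layouts with the 1% / 10% / 89% split and counts how many L1 tokens fall inside the token range of a newly flushed sstable.

```python
import random

# Split quoted in the comment: 1% in L1, 10% in L2, 89% in L3.
LEVEL_WEIGHTS = {1: 0.01, 2: 0.10, 3: 0.89}


def random_levels(tokens, rng):
    """Assign every token to a level at random, weighted 1% / 10% / 89%."""
    levels = list(LEVEL_WEIGHTS)
    weights = list(LEVEL_WEIGHTS.values())
    assigned = rng.choices(levels, weights=weights, k=len(tokens))
    layout = {level: [] for level in levels}
    for token, level in zip(tokens, assigned):
        layout[level].append(token)
    return layout


def contiguous_levels(tokens):
    """Put the lowest 1% of tokens in L1, the next 10% in L2, the rest in L3."""
    ordered = sorted(tokens)
    cut1 = len(ordered) // 100
    cut2 = cut1 + len(ordered) // 10
    return {1: ordered[:cut1], 2: ordered[cut1:cut2], 3: ordered[cut2:]}


def overlapping_tokens(level_tokens, flushed_tokens):
    """Count tokens of a level that lie inside the flushed sstable's [min, max] range."""
    lo, hi = min(flushed_tokens), max(flushed_tokens)
    return sum(1 for t in level_tokens if lo <= t <= hi)


rng = random.Random(7019)  # fixed seed so runs are repeatable
TOKEN_SPACE = 10_000_000   # assumption: a small integer token space for illustration
tokens = rng.sample(range(TOKEN_SPACE), 100_000)
flushed = rng.sample(range(TOKEN_SPACE), 1_000)  # evenly distributed flushed sstable

rand = random_levels(tokens, rng)
cont = contiguous_levels(tokens)

print("L1 size (random layout):     ", len(rand[1]))
print("L1 size (contiguous layout): ", len(cont[1]))
print("L1 overlap with flushed file (random):    ", overlapping_tokens(rand[1], flushed))
print("L1 overlap with flushed file (contiguous):", overlapping_tokens(cont[1], flushed))
```

The sketch compares token counts only; it does not model sstable boundaries or how leveled compaction picks candidates, so it illustrates the argument rather than proving it.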

> Major tombstone compaction
> --------------------------
>                 Key: CASSANDRA-7019
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>              Labels: compaction
>             Fix For: 3.0
> It should be possible to do a "major" tombstone compaction by including all sstables,
> but writing them out 1:1, meaning that if you have 10 sstables before, you will have 10 sstables
> after the compaction with the same data, minus all the expired tombstones.
> We could do this in two ways:
> # a nodetool command that includes _all_ sstables
> # once we detect that an sstable has more than x% (20%?) expired tombstones, we start
> one of these compactions, and include all overlapping sstables that contain older data.
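
The idea in the description can be sketched in a few lines. This is a toy model under loud assumptions, not Cassandra's implementation: an sstable is a list of Cell objects, timestamps are in seconds, TOMBSTONE_THRESHOLD stands in for the "x% (20%?)" figure, and all names are hypothetical. Because every sstable takes part, the sketch also drops data shadowed by a tombstone, so purging the tombstone cannot bring deleted data back.

```python
from dataclasses import dataclass
from typing import Dict, List

GC_GRACE_SECONDS = 864_000   # Cassandra's default gc_grace_seconds (10 days)
TOMBSTONE_THRESHOLD = 0.20   # hypothetical trigger, from the "x% (20%?)" in the description


@dataclass(frozen=True)
class Cell:
    key: str
    timestamp: int            # write time in seconds (simplified)
    is_tombstone: bool = False


def is_expired(cell: Cell, now: int) -> bool:
    """A tombstone is expired once gc_grace_seconds have passed since it was written."""
    return cell.is_tombstone and cell.timestamp + GC_GRACE_SECONDS <= now


def expired_ratio(sstable: List[Cell], now: int) -> float:
    """Fraction of cells in one sstable that are expired tombstones."""
    if not sstable:
        return 0.0
    return sum(is_expired(c, now) for c in sstable) / len(sstable)


def major_tombstone_compaction(sstables: List[List[Cell]], now: int) -> List[List[Cell]]:
    """Rewrite every input sstable 1:1, minus expired tombstones and shadowed data."""
    # Newest deletion time per key, gathered across all sstables.
    deleted_at: Dict[str, int] = {}
    for sstable in sstables:
        for cell in sstable:
            if cell.is_tombstone:
                deleted_at[cell.key] = max(deleted_at.get(cell.key, cell.timestamp),
                                           cell.timestamp)
    compacted = []
    for sstable in sstables:
        kept = []
        for cell in sstable:
            if is_expired(cell, now):
                continue  # expired tombstone: purge it
            if (not cell.is_tombstone and cell.key in deleted_at
                    and cell.timestamp <= deleted_at[cell.key]):
                continue  # live data shadowed by a tombstone: purge it too
            kept.append(cell)
        compacted.append(kept)  # one output sstable per input sstable
    return compacted


now = 2_000_000
sstables = [
    [Cell("a", 100), Cell("d", 200)],
    [Cell("a", 500, is_tombstone=True), Cell("c", 300)],
    [Cell("b", 1_900_000, is_tombstone=True)],
]

# Option 2 from the description: trigger when one sstable crosses the threshold.
if any(expired_ratio(s, now) > TOMBSTONE_THRESHOLD for s in sstables):
    compacted = major_tombstone_compaction(sstables, now)
else:
    compacted = sstables
print(len(sstables), "sstables in,", len(compacted), "sstables out")
```

For simplicity the triggered path here compacts every sstable; the description's second option would include only the overlapping sstables that contain older data.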

