cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7331) Improve Droppable Tombstone compaction
Date Tue, 28 Oct 2014 15:24:34 GMT


Jonathan Ellis commented on CASSANDRA-7331:

Eventually we will want to perform tombstone-compaction on all the candidates (that have large
amounts of tombstones) even if they have a low value, because they just haven't been compacted
with their peers yet.

If I understand correctly, your goal here is, given a bunch of candidates for tombstone-compaction,
to order them by which is likely to clean up the most.  Right?

If that's the case, I don't think it's worth the complexity, since it's only really beneficial
if you're super behind on compaction with no hope of ever catching up.  And the right fix
there is to add more capacity or make compaction faster in some form or another.

> Improve Droppable Tombstone compaction
> --------------------------------------
>                 Key: CASSANDRA-7331
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: sankalp kohli
>            Priority: Minor
>              Labels: compaction
> I was thinking about this idea, so I'm creating a JIRA to discuss it.
> Currently we do compaction for sstables which have more than a configurable number of
> droppable tombstones. There is also another JIRA, CASSANDRA-7019, to do compactions
> involving multiple sstables from different levels, which will be triggered based on the
> same threshold.
> One of the areas of improvement here, to pick better candidates, will be to find out
> whether a tombstone can actually get rid of data in other sstables.
> We can add a byte to each tombstone to keep track of whether or not it has knocked off
> the actual data for which it exists.
> All tombstones will start out with 0 as the value. When a tombstone compacts with other
> sstables and causes data to be deleted, the value will be incremented.
> For cases where there are multiple updates and then a delete, this value can be more
> than 1, depending on how many updates came in before the delete.
> If we have this, then by looking at these numbers in tombstones we can find the sstable
> whose compaction will get rid of the most data. We can also add a global number per
> sstable which sums up these numbers.
> I am not sure how this will work with range tombstones, or whether this will be useful.
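The candidate-selection idea in the description could be sketched roughly as follows. This is a hypothetical illustration in Java (Cassandra's implementation language), not actual Cassandra code: `SSTableCandidate`, `knockoffSum`, and `bestCandidate` are invented names standing in for an sstable, its proposed per-sstable sum of tombstone counters, and the selection step.

```java
import java.util.Comparator;
import java.util.List;

// Minimal sketch of the proposal: each tombstone carries a counter of how much
// data it has knocked off, each sstable carries the sum of those counters, and
// the sstable with the largest sum is picked for tombstone-compaction first.
// All names here are hypothetical, not Cassandra internals.
public class TombstoneCandidateSketch {

    // Stand-in for an sstable; 'knockoffSum' is the proposed global number per
    // sstable that sums up its tombstones' counters.
    record SSTableCandidate(String name, long knockoffSum) {}

    // Pick the candidate whose tombstones account for the most eliminated data.
    static SSTableCandidate bestCandidate(List<SSTableCandidate> candidates) {
        return candidates.stream()
                .max(Comparator.comparingLong(SSTableCandidate::knockoffSum))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<SSTableCandidate> sstables = List.of(
                new SSTableCandidate("sstable-1", 12),
                new SSTableCandidate("sstable-2", 340),
                new SSTableCandidate("sstable-3", 7));
        // sstable-2 has the largest aggregate counter, so it is chosen first.
        System.out.println(bestCandidate(sstables).name()); // prints sstable-2
    }
}
```

Ellis's counterpoint above then amounts to: the ordering only changes the outcome when the compaction backlog is so deep that many candidates never get processed at all.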

This message was sent by Atlassian JIRA
