cassandra-commits mailing list archives

From "Stefan Podkowinski (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-11684) Cleanup key ranges during compaction
Date Fri, 29 Apr 2016 11:58:12 GMT
Stefan Podkowinski created CASSANDRA-11684:
----------------------------------------------

             Summary: Cleanup key ranges during compaction
                 Key: CASSANDRA-11684
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11684
             Project: Cassandra
          Issue Type: Improvement
          Components: Compaction
            Reporter: Stefan Podkowinski
            Assignee: Stefan Podkowinski


Currently cleanup is considered an optional, manual operation that users are told to run to
free disk space after a node was affected by topology changes. However, unmanaged key ranges
could also end up on a node in other ways, e.g. through sstable files manually added by an
admin or through streaming during repairs.

I'm also not sure that unmanaged data is really that harmless, or that cleanup should really
be optional if you don't need to reclaim the disk space. When it comes to repairs, users are
expected to purge a node after downtime in case it was not fully covered by a repair within
gc_grace afterwards, in order to avoid re-introducing deleted data. But the same could happen
with unmanaged data, e.g. after topology changes make unmanaged ranges active again or after
restoring backups.

I'd therefore suggest that, during compactions, we avoid rewriting data in key ranges that no
longer belong to a node once that data is older than gc_grace.
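
As a rough illustration of this suggestion (not Cassandra code), the check could look like
the sketch below. The names Partition, in_range, is_owned and compact are hypothetical, and
it assumes integer tokens, token ranges of the form (left, right] that may wrap around the
ring, and a per-partition newest write timestamp in seconds.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

# Assumption: a token range is (left, right], and wraps around the ring
# when left >= right (left == right then covers the whole ring).
Range = Tuple[int, int]


@dataclass(frozen=True)
class Partition:
    token: int          # hypothetical: position of the partition key on the ring
    max_timestamp: int  # hypothetical: newest write in the partition, in seconds


def in_range(token: int, rng: Range) -> bool:
    """Return True if token lies in the (left, right] range, honoring wraparound."""
    left, right = rng
    if left < right:
        return left < token <= right
    return token > left or token <= right


def is_owned(token: int, owned: List[Range]) -> bool:
    """Return True if any of the node's ranges contains the token."""
    return any(in_range(token, r) for r in owned)


def compact(partitions: Iterable[Partition], owned: List[Range],
            now: int, gc_grace_seconds: int) -> Iterator[Partition]:
    """Yield the partitions a compaction would rewrite.

    A partition is skipped only if it is both outside the owned ranges
    and older than gc_grace; everything else is written as usual.
    """
    for p in partitions:
        older_than_gc_grace = now - p.max_timestamp > gc_grace_seconds
        if not is_owned(p.token, owned) and older_than_gc_grace:
            continue
        yield p
```

With this shape, unmanaged data that is still within gc_grace is kept, so only data that
could otherwise resurface after the grace period is dropped.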

Maybe we could also introduce another CLEANUP_COMPACTION operation to find candidates based
on SSTable.first/last in case we don't have pending regular or tombstone compactions.
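
A minimal sketch of how such candidates might be found from the first/last bounds alone is
shown below. SSTableMeta, fully_owned and cleanup_candidates are hypothetical names, and it
assumes integer tokens, first <= last per sstable, and owned ranges of the form (left, right]
that have already been unwrapped and merged where adjacent.

```python
from typing import List, NamedTuple, Tuple

# Assumption: owned ranges are (left, right], unwrapped (left < right)
# and already merged where adjacent.
Range = Tuple[int, int]


class SSTableMeta(NamedTuple):
    name: str
    first: int  # hypothetical: token of the first key in the sstable
    last: int   # hypothetical: token of the last key in the sstable


def fully_owned(sstable: SSTableMeta, owned: List[Range]) -> bool:
    """Return True if the sstable's [first, last] span fits inside one owned range."""
    return any(left < sstable.first and sstable.last <= right
               for left, right in owned)


def cleanup_candidates(sstables: List[SSTableMeta],
                       owned: List[Range]) -> List[SSTableMeta]:
    """Return sstables whose token span is not covered by a single owned range."""
    return [s for s in sstables if not fully_owned(s, owned)]
```

The check is conservative: an sstable whose span crosses a gap between two owned ranges is
reported as a candidate even if none of its keys actually fall into that gap, so confirming
it would still require looking at the keys themselves.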



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
