cassandra-commits mailing list archives

From "Stefania (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
Date Wed, 29 Jul 2015 05:54:05 GMT


Stefania commented on CASSANDRA-7066:

[~benedict] the first basic single log file version is available on [this branch|].
I'll wait to hear from you regarding adding CRCs and update times.

Here is the write-up that I've added to NEWS.txt:
     New transaction log files have been introduced to replace the compactions_in_progress
     system table. They control the sstable files involved in compactions and other operations
     such as flushing and streaming. Use the sstablelister tool to list any sstable files
     currently involved in operations not yet completed, which we define as temporary files.
     A transaction log file contains one sstable per line, with the prefix "add:" or "remove:".
     They also contain a final special line "commit", only inserted when the transaction is
     committed. On startup we use these files to clean up any partial transactions that were
     in progress when the process exited. If the commit line is found, we keep the new "add"
     prefix sstables and delete the old "remove" prefix sstables, and vice versa if the commit
     line is missing.
     Should you lose or delete these log files, both old and new sstable files will be kept
     as live files, which will result in duplicated sstables. Should you manually edit these
     files and remove or add the commit line for example, then this would change which sstable
     files are retained on startup. See CASSANDRA-7066 for full details.
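
The startup rule described in the write-up can be sketched roughly as follows. This is only an illustration, not the code on the branch: the function name resolve_transaction_log and the sstable names are hypothetical, and the exact line syntax ("add:<name>", "remove:<name>", "commit") is assumed from the description above.

```python
from typing import Iterable, Set, Tuple


def resolve_transaction_log(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (keep, delete) sstable names for one transaction log.

    Assumed format: one "add:<sstable>" or "remove:<sstable>" per line,
    plus a final "commit" line if the transaction completed.
    """
    added: Set[str] = set()
    removed: Set[str] = set()
    committed = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line == "commit":
            committed = True
        elif line.startswith("add:"):
            added.add(line[len("add:"):])
        elif line.startswith("remove:"):
            removed.add(line[len("remove:"):])
        else:
            raise ValueError(f"unrecognised log line: {line!r}")
    if committed:
        # Transaction finished: new sstables are live, old ones are leftovers.
        return added, removed
    # Transaction did not finish: roll back to the old sstables.
    return removed, added


# Hypothetical sstable names, for illustration only.
log = ["remove:ma-1-big", "remove:ma-2-big", "add:ma-3-big", "commit"]
keep, delete = resolve_transaction_log(log)
```

With the "commit" line present, keep holds the added sstable and delete holds the two removed ones. Dropping the "commit" line swaps the two sets, which is why manually editing that line changes what survives a restart.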

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>                 Key: CASSANDRA-7066
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.0 alpha 1
>         Attachments: 7066.txt
> Currently we manage a list of in-progress compactions in a system table, which we use
to clean up incomplete compactions when we're done. The problem with this is that 1) it's a
bit clunky (and leaves us in positions where we can unnecessarily clean up completed files,
or conversely not clean up files that have been superseded); and 2) it's only used for a regular
compaction - no other compaction types are guarded in the same way, so can result in duplication
if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and on startup
we simply delete any sstables that occur in the union of all ancestor sets. This way as soon
as we finish writing we're capable of cleaning up any leftovers, so we never get duplication.
It's also much easier to reason about.
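
The ancestor-based proposal in the description above can be sketched as below. This is a minimal illustration under stated assumptions, not Cassandra code: the function name, the use of integer generations as sstable identifiers, and the premise that only fully written sstables appear in the input are all hypothetical.

```python
from typing import Dict, Set


def leftovers_from_ancestors(ancestors: Dict[int, Set[int]]) -> Set[int]:
    """Return the sstables on disk that are superseded by another sstable.

    `ancestors` maps each fully written sstable on disk (by generation)
    to the set of its direct ancestors, as read from its metadata.
    """
    # Union of every ancestor set found on disk.
    superseded: Set[int] = set().union(*ancestors.values())
    # Only sstables still present on disk can (and need to) be deleted.
    return superseded & set(ancestors)


# Hypothetical state: 3 was compacted from 1 and 2, which were not yet deleted.
on_disk = {1: set(), 2: set(), 3: {1, 2}}
to_delete = leftovers_from_ancestors(on_disk)
```

Here to_delete contains generations 1 and 2, because both appear in the ancestor set of sstable 3.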

This message was sent by Atlassian JIRA
