cassandra-commits mailing list archives

From "Stefania (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
Date Fri, 31 Jul 2015 08:54:05 GMT


Stefania commented on CASSANDRA-7066:

[~benedict], ready for a first round of review.

* Incremental CRC checks and single log file were already available.

* I've added logging of the latest update time and a checksum on update times and sizes for
all files of an old descriptor. These are calculated when an sstable is obsoleted. If they
do not match when we are about to delete the files, then we skip this record's files. The checksum
is somewhat redundant since it is difficult to change file content without changing the update
time, so it can be removed if you prefer.

* I've renamed {{sstablelister}} to {{sstableutil}} and added an option to clean up any outstanding
transactions ({{sstableutil -c ks table}} will perform the same tasks as we do on startup).
If you would rather have a tool that only does this, i.e. something like {{sstablecleanup}}, then
again, let me know now and it can be changed easily.

* I've removed the ancestors from the compression metadata.

* I've also updated the dtests for sstableutil in [this commit|];
I will create a pull request once we have finalized the tool semantics.
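
To make the update-time and size check from the second bullet concrete, here is a minimal sketch in Python rather than the patch's actual Java code. The function names {{snapshot}} and {{delete_if_unchanged}}, the use of CRC32, and the byte layout fed to the checksum are all assumptions made for illustration; the patch may record these values differently.

```python
import os
import struct
import zlib


def snapshot(files):
    """Return (latest_mtime_ns, crc32) over the update times and sizes of files.

    Hypothetical helper: stands in for what is recorded when an sstable is obsoleted.
    """
    latest = 0
    crc = 0
    for path in sorted(files):  # fixed order so the checksum is repeatable
        st = os.stat(path)
        latest = max(latest, st.st_mtime_ns)
        # Fold this file's update time and size into the running checksum.
        crc = zlib.crc32(struct.pack(">qq", st.st_mtime_ns, st.st_size), crc)
    return latest, crc


def delete_if_unchanged(files, recorded):
    """Delete files only if their current snapshot equals the recorded one."""
    if snapshot(files) != recorded:
        return False  # something changed since obsoletion: skip this record's files
    for path in files:
        os.remove(path)
    return True
```

Under these assumptions, a record whose files were touched or resized after obsoletion is left alone, which is the conservative behaviour described above.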

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>                 Key: CASSANDRA-7066
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.0 alpha 1
>         Attachments: 7066.txt
> Currently we manage a list of in-progress compactions in a system table, which we use
to clean up incomplete compactions when we're done. The problem with this is that 1) it's a
bit clunky (and leaves us in positions where we can unnecessarily clean up completed files,
or conversely not clean up files that have been superseded); and 2) it's only used for a regular
compaction - no other compaction types are guarded in the same way, so can result in duplication
if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and on startup
we simply delete any sstables that occur in the union of all ancestor sets. This way as soon
as we finish writing we're capable of cleaning up any leftovers, so we never get duplication.
It's also much easier to reason about.
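
The startup rule proposed in the description can be sketched as follows. This is only an illustration of the original proposal, not of the code under review (the comment above notes that the ancestors were removed from the compression metadata). The function name {{leftovers}} and the mapping from sstable generation to ancestor generations are hypothetical.

```python
def leftovers(ancestors_by_sstable):
    """Return the on-disk generations that appear in any sstable's ancestor set.

    ancestors_by_sstable maps each sstable generation found on disk to the
    generations it was directly compacted from (a hypothetical representation).
    """
    superseded = set()
    for ancestors in ancestors_by_sstable.values():
        superseded |= set(ancestors)  # union of all ancestor sets
    # Only generations that still exist on disk can be deleted.
    return sorted(g for g in ancestors_by_sstable if g in superseded)


# Generation 3 was written from 1 and 2, so 1 and 2 are leftovers.
print(leftovers({1: set(), 2: set(), 3: {1, 2}}))  # prints [1, 2]
```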

This message was sent by Atlassian JIRA
