cassandra-commits mailing list archives

From "Nick Bailey (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
Date Wed, 29 Jul 2015 19:13:05 GMT


Nick Bailey commented on CASSANDRA-7066:

Thanks for the ping Jonathan. There is a lot to follow and digest here so let me just try
to bring up my concerns as someone working on OpsCenter. Those concerns should fairly well
represent any other tools trying to do backup/restore or even a user trying to do it manually.

From what I have tried to read through, it sounds like most of the concerns here are around
cases where files/directories are manipulated manually rather than through the provided tools.
So hopefully I can safely be ignored :).

* The snapshot command should create a full backup of a keyspace/table on the node. The directories
created from the snapshot should be all that is required to restore that keyspace/table on
that node to the point in time that the snapshot was taken.
* A snapshot should be restorable either via the sstableloader tool or by manually copying
the files from the snapshot into place (given the same schema/topology). If copying the files
into place manually, restarting the node or making an additional call to load the sstables
may be required.
* When using the sstableloader tool, I should be able to restore data taken from a snapshot
regardless of what data exists on the node or is currently being written.

If we are all good on those points then I don't see any issues from my standpoint. [~jbellis]
was there anything else you wanted me to look at specifically?

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>                 Key: CASSANDRA-7066
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.0 alpha 1
>         Attachments: 7066.txt
> Currently we manage a list of in-progress compactions in a system table, which we use
to clean up incomplete compactions when we're done. The problem with this is that 1) it's a
bit clunky (and leaves us in positions where we can unnecessarily clean up completed files,
or conversely not clean up files that have been superseded); and 2) it's only used for a regular
compaction - no other compaction types are guarded in the same way, so they can result in duplication
if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and on startup
we simply delete any sstables that occur in the union of all ancestor sets. This way as soon
as we finish writing we're capable of cleaning up any leftovers, so we never get duplication.
It's also much easier to reason about.
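
To make the proposal in the description concrete, here is a minimal sketch of the startup rule it describes: take the union of all ancestor sets and delete whatever in that union is still on disk. It illustrates the idea only and is not the code in the attached patch. The class name AncestorCleanup, the method leftovers, and the use of integer generation numbers to identify sstables are assumptions made for the example, and it assumes the map contains only fully written sstables.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the rule in the issue description;
// none of these names come from the Cassandra code base.
public class AncestorCleanup {

    /**
     * @param ancestorsByGeneration maps each sstable generation found on disk
     *        to the generations of its direct ancestors, as read from that
     *        sstable's metadata (assumed to list fully written sstables only)
     * @return the generations that are on disk and also appear in some
     *         ancestor set, i.e. the leftovers to delete on startup
     */
    static Set<Integer> leftovers(Map<Integer, Set<Integer>> ancestorsByGeneration) {
        // Union of all ancestor sets.
        Set<Integer> union = new TreeSet<>();
        for (Set<Integer> ancestors : ancestorsByGeneration.values()) {
            union.addAll(ancestors);
        }
        // Only sstables that are still present can be deleted.
        union.retainAll(ancestorsByGeneration.keySet());
        return union;
    }

    public static void main(String[] args) {
        // Generation 3 was compacted from 1 and 2, but the node stopped
        // before 1 and 2 were removed. Generation 4 is unrelated.
        Map<Integer, Set<Integer>> onDisk = Map.of(
            1, Set.<Integer>of(),
            2, Set.<Integer>of(),
            3, Set.of(1, 2),
            4, Set.<Integer>of());
        System.out.println(leftovers(onDisk)); // prints [1, 2]
    }
}
```

Because the replacement sstable carries the record of what it superseded, no separate system table of in-progress compactions is needed in this sketch, which is the simplification the description is after.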

This message was sent by Atlassian JIRA
