Date: Wed, 29 Jul 2015 19:13:05 +0000 (UTC)
From: "Nick Bailey (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers

    [ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646618#comment-14646618 ]

Nick Bailey commented on CASSANDRA-7066:
----------------------------------------

Thanks for the ping Jonathan. There is a lot to follow and digest here so let me just try to bring up my concerns as someone working on OpsCenter. Those concerns should fairly well represent any other tools trying to do backup/restore or even a user trying to do it manually.
From what I have tried to read through, it sounds like most of the concerns here are around cases where files/directories are manipulated manually rather than through the provided tools. So hopefully I can safely be ignored :).

* The snapshot command should create a full backup of a keyspace/table on the node. The directories created from the snapshot should be all that is required to restore that keyspace/table on that node to the point in time that the snapshot was taken.
* A snapshot should be restorable either via the sstableloader tool or by manually copying the files from the snapshot into place (given the same schema/topology). If copying the files into place manually, restarting the node or making an additional call to load the sstables may be required.
* When using the sstableloader tool I should be able to restore data taken from a snapshot regardless of what data exists on the node or is currently being written.

If we are all good on those points then I don't see any issues from my standpoint. [~jbellis] was there anything else you wanted me to look at specifically?

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.0 alpha 1
>
>         Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table, which we use to clean up incomplete compactions when we're done.
> The problem with this is that 1) it's a bit clunky (and leaves us in positions where we can unnecessarily clean up completed files, or conversely not clean up files that have been superseded); and 2) it's only used for regular compaction - no other compaction types are guarded in the same way, so they can result in duplication if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and on startup we simply delete any sstables that occur in the union of all ancestor sets. This way as soon as we finish writing we're capable of cleaning up any leftovers, so we never get duplication. It's also much easier to reason about.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
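
For readers following the issue description, the proposed startup cleanup (delete any sstable that occurs in the union of all ancestor sets) can be sketched roughly as below. This is only an illustration, not Cassandra code: the `SSTableMeta` class, the `find_leftovers` function, and the use of integer generation numbers as sstable identifiers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTableMeta:
    """Hypothetical stand-in for an sstable's metadata (not a Cassandra class)."""
    generation: int                      # assumed unique identifier of the sstable
    ancestors: frozenset = frozenset()   # generations this sstable was compacted from


def find_leftovers(sstables):
    """Return the generations present on disk that some other sstable lists as an ancestor."""
    superseded = set()
    for sstable in sstables:
        # Build the union of all ancestor sets.
        superseded |= sstable.ancestors
    on_disk = {sstable.generation for sstable in sstables}
    # Only sstables that still exist on disk can be deleted.
    return on_disk & superseded


# Sstable 3 was written by compacting 1 and 2, but the process stopped
# before 1 and 2 were deleted; 4 is unrelated.
tables = [
    SSTableMeta(1),
    SSTableMeta(2),
    SSTableMeta(3, frozenset({1, 2})),
    SSTableMeta(4),
]
leftovers = find_leftovers(tables)
```

In this sketch `leftovers` contains generations 1 and 2, which a startup routine could then delete. No separate in-progress-compactions table is consulted, since the ancestor metadata alone decides what is a leftover.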