Date: Fri, 31 Jul 2015 08:54:05 +0000 (UTC)
From: "Stefania (JIRA)"
Reply-To: dev@cassandra.apache.org
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers


    [ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648950#comment-14648950 ]

Stefania commented on CASSANDRA-7066:
-------------------------------------

[~benedict], ready for a first round of review.

* Incremental CRC checks and a single log file were already available.
* I've added logging of the latest update time and a checksum on update times and sizes for all files of an old descriptor. These are calculated when an sstable is obsoleted. If they do not match when we are about to delete the files, then we skip that record's files.
The checksum is somewhat redundant since it is difficult to change file content without changing the update time, so it can be removed if you prefer.
* I've renamed {{sstablelister}} to {{sstableutil}} and added an option to clean up any outstanding transactions ({{sstableutil -c ks table}} will perform the same tasks as we do on startup). If you would rather have a tool that only does this, i.e. something like {{sstablecleanup}}, let me know now and it can be changed easily.
* I've removed the ancestors from the compression metadata.
* I've also updated the dtests for sstableutil in [this commit|https://github.com/stef1927/cassandra-dtest/commit/6076cfd9c32d463ac245eed6d34e9b7921a0a7cf]; I will create a pull request once we have finalized the tool semantics.

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.0 alpha 1
>
>         Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table, which we use to clean up incomplete compactions when we're done. The problem with this is that 1) it's a bit clunky (and leaves us in positions where we can unnecessarily clean up completed files, or conversely not clean up files that have been superseded); and 2) it's only used for a regular compaction - no other compaction types are guarded in the same way, so can result in duplication if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and on startup we simply delete any sstables that occur in the union of all ancestor sets. This way as soon as we finish writing we're capable of cleaning up any leftovers, so we never get duplication.
It's also much easier to reason about.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
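
The deletion safeguard described in the comment (a latest update time plus a checksum over the update times and sizes of an obsoleted sstable's files, re-checked just before deletion) can be sketched roughly as follows. This is a minimal illustration in Python, not the patch's actual Java code: the record layout, the names ObsoleteRecord, record_obsolete and delete_if_unchanged, and the choice of CRC32 are all assumptions made for the example.

```python
import os
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ObsoleteRecord:
    """Hypothetical log record written when an sstable is obsoleted."""
    files: tuple          # sorted paths of all files of the old descriptor
    last_update_ns: int   # latest modification time across those files
    checksum: int         # CRC32 over each file's update time and size


def _fingerprint(files):
    """Return (latest mtime, CRC32 over mtimes and sizes) for the files."""
    latest = 0
    crc = 0
    for path in files:
        st = os.stat(path)
        latest = max(latest, st.st_mtime_ns)
        # Fold this file's update time and size into the running checksum.
        crc = zlib.crc32(f"{st.st_mtime_ns}:{st.st_size};".encode(), crc)
    return latest, crc


def record_obsolete(files):
    """Capture the fingerprint at the moment the sstable is obsoleted."""
    files = tuple(sorted(str(f) for f in files))
    latest, crc = _fingerprint(files)
    return ObsoleteRecord(files, latest, crc)


def delete_if_unchanged(record):
    """Delete the record's files only if they still match the record.

    Returns True if the files were deleted, False if the record was skipped.
    """
    try:
        current = _fingerprint(record.files)
    except FileNotFoundError:
        return False  # a file has gone missing: skip this record
    if current != (record.last_update_ns, record.checksum):
        return False  # files changed since they were obsoleted: skip
    for path in record.files:
        os.remove(path)
    return True
```

In this sketch a mismatch simply leaves the files in place, mirroring the "skip this record's files" behaviour described above; what the real tool logs or reports in that case is not covered here.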