cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
Date Tue, 21 Jul 2015 06:14:06 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634572#comment-14634572
] 

Stefania commented on CASSANDRA-7066:
-------------------------------------

[~benedict] this is ready for another round of review, see [latest commit|https://github.com/stef1927/cassandra/commit/8be68fb0707215cc2818acfd02e5f6a6452148bf].

I've split the unit tests into two classes, the tests depending on SchemaLoader have been
moved to RealTransactionsTest as they mimic real transactions such as compactions and flushing.
These tests did not change, they were just moved. TransactionLogsTest now extends AbstractTransactionalTest.
I'm not sure what you meanr by _perhaps multiple times, internally - or refactor to permit
lists of states to be returned_. Which states exactly?

I've left four comments in the code, just search for your name. They are mostly trivial except
perhaps for the untrack comment in SSTableRewriter. We need to delete aborted writers files
somewhere and I opted to keep untrack doing that except it no longer changes the log file
contents, which concerned you. If you have a different preference, like letting the writer
abort delete files as it did before just let me know. 

You will notice a couple of differences in SSTableReader related to the readMeter, these were
most likely merge errors during a rebase and the code should now be the same as on trunk.
I noticed them because of a failing unit test, IndexSummaryManagerTest.

The TransactionLogs class is now pretty well encapsulated by LifecycleTransaction, I hope
I didn't abuse this latter API too much. Only the methods dealing with deletions are still
public (e.g. waitForDeletions), they are mostly used by test code and standalone tools. If
we moved or encapsulated these methods too, the class would be pretty much package private
except for SSTableReader.InstanceTidier. Let me know if you want to go down this route. It
is just not very clear where to put InstanceTidier and the static deletion management methods,
we would be back to an SSTableDeletingTask approach.

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.x
>
>         Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table, which we use
to cleanup incomplete compactions when we're done. The problem with this is that 1) it's a
bit clunky (and leaves us in positions where we can unnecessarily cleanup completed files,
or conversely not cleanup files that have been superceded); and 2) it's only used for a regular
compaction - no other compaction types are guarded in the same way, so can result in duplication
if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and on startup
we simply delete any sstables that occur in the union of all ancestor sets. This way as soon
as we finish writing we're capable of cleaning up any leftovers, so we never get duplication.
It's also much easier to reason about.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message