cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
Date Mon, 04 May 2015 10:38:07 GMT


Benedict commented on CASSANDRA-7066:

As far as level of integration goes, this looks fine for a base on 8984. CASSANDRA-8568 will
most likely merge this work with its new "db.lifecycle" package, either storing the OperationLog
inside of its Transaction object, or merging the functionality directly, since the behaviours
are quite interwoven. Since 8568 has not been review yet, I will merge the functionality after
this patch gets committed, in case that has to be revised significantly, although perhaps
you can chime in on review of that ticket as well.

A couple of things I've noticed (not doing a formal review just yet):
* {{OperationLog}} should probably implement Transactional
* {{OperationLog.cleanup()}} (or {{OperationLog.LogFile.delete()}}) should force a sync of
the directory's file descriptor, to make sure there is a happens-before relationship between
the two log-file deletions, else if the process crashes we may be left with the wrong one
deleted and corrupt the sstable state
* It looks to me like the SSTableDeleteNotification approach is currently broken: at least
a set of sstables we care about needs to be maintained to be checked against, however I would
probably consider adding a Runnable to each replaced {{SSTableReader}} using {{runOnClose}},
containing a {{Ref}} to be released which, once all are released, removes the log file. This
way it's tracked, debuggable, and safe against miscounting.
* SSTableReader.DescriptorTypeTidy and SSTableReader.GlobalTidy should be merged
* getTemporaryFiles should probably return a Set, for efficient testing in the list methods

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>                 Key: CASSANDRA-7066
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: compaction
>             Fix For: 3.x
> Currently we manage a list of in-progress compactions in a system table, which we use
to cleanup incomplete compactions when we're done. The problem with this is that 1) it's a
bit clunky (and leaves us in positions where we can unnecessarily cleanup completed files,
or conversely not cleanup files that have been superceded); and 2) it's only used for a regular
compaction - no other compaction types are guarded in the same way, so can result in duplication
if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and on startup
we simply delete any sstables that occur in the union of all ancestor sets. This way as soon
as we finish writing we're capable of cleaning up any leftovers, so we never get duplication.
It's also much easier to reason about.

This message was sent by Atlassian JIRA

View raw message