cassandra-commits mailing list archives

From "Stefania (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
Date Mon, 03 Aug 2015 00:24:05 GMT


Stefania commented on CASSANDRA-7066:

Thanks Benedict. Code updated as follows:

bq. The last modified sort order was necessary for the same reason a full checksum of all
of the old files is too heavy handed: restarting in the middle of deleting the files. We want
to finish deleting them, and if we take the max timestamp we can say that will safely stay
the same. We could keep the full checksum and include a count; if the count on disk is lower,
and our max timestamp matches, we can proceed; if the count is the same we require a checksum
match. This may or may not be overkill, but there's no harm in paranoia here.

I totally missed the possibility of crashing after having deleted the files only partially; now
it makes sense. I've removed the checksum, as it was a bit redundant, but I added the count
since it helps with the last point below.
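The rule above can be sketched roughly as follows (hypothetical names, not the patch itself): deleting files can only lower the on-disk count, and deletion never modifies the surviving files, so the recorded max timestamp remains an upper bound for whatever is left.

```java
import java.io.File;
import java.util.stream.Stream;

public final class LeftoverCheck
{
    /**
     * May we resume deleting the old sstable files after a restart?
     * A record written before deletion started holds the file count and the
     * maximum last-modified timestamp of those files at that moment.
     */
    static boolean safeToResumeDeletion(File[] remaining, int recordedCount, long recordedMaxModified)
    {
        if (remaining.length > recordedCount)
            return false; // more files than we recorded: these are not our leftovers

        long maxNow = Stream.of(remaining)
                            .mapToLong(File::lastModified)
                            .max()
                            .orElse(0L);
        return maxNow <= recordedMaxModified; // survivors untouched since the record was written
    }
}
```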

bq. It looks like serialization of the sstables now assumes the ancestors won't be present,
but this will break upgrade. We need to check the sstable version, and at least skip the ancestors
if they're present

I added a method to {{Version}} called {{hasCompactionAncestors}} returning true for older
versions, and modified deserialize accordingly. Will this cause problems if the alpha goes
out before this ticket is committed?
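A minimal sketch of that version gate; the {{"la"}} cutoff and the on-disk record layout here are assumptions for illustration, not the actual sstable format:

```java
import java.io.DataInput;
import java.io.IOException;

public final class Version
{
    private final String version; // two-letter sstable format version, e.g. "ka"

    Version(String version) { this.version = version; }

    /** Older formats serialized the ancestor set; newer ones omit it. */
    boolean hasCompactionAncestors()
    {
        return version.compareTo("la") <= 0; // assumed cutoff, illustrative only
    }

    /** During deserialization, skip the ancestor list when the format still carries it. */
    static void skipAncestorsIfPresent(DataInput in, Version v) throws IOException
    {
        if (v.hasCompactionAncestors())
        {
            int count = in.readInt();
            for (int i = 0; i < count; i++)
                in.readInt(); // discard each ancestor generation
        }
        // ... deserialize the remaining compaction metadata as usual
    }
}
```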

bq. The compaction metadata equality is probably not sensibly employed anywhere; we should
just confirm this and remove it IMO since it makes very little sense, especially now that
the ancestors have been removed

It's true it doesn't seem to be used except by a unit test. However, all classes implementing
MetadataComponent implement equality, so I left the implementations in place with a comment.
Also, a scrub test was failing without it (though I'm unsure whether that's related).

bq. I think you took my comments about panic warnings a little too literally. We should just
use strong language to make it clear to the operator we've detected disk corruption of some
kind, and have taken the most pessimistic course of action

I did my best to update the messages and make them clear, feel free to replace them with better
ones during the review.

bq. This warning should be less severe if only the last line is incomplete (and all other
lines are parsed correctly), as this would be expected if we crashed part way through serialization
bq. We should perhaps not worry at all, and continue if this last situation occurs, in the
event that all files logged are still present on the file system, and our other metadata all
matches for them. In this case we can be very confident we just shutdown in the middle of
writing a record, and can just clean up all of the new files. This should make us robust both
to the scary situation, but also minimise the pessimism, hopefully giving us the best of both

If it's only the last line that is corrupt, we now check all previous OLD records, and if the
count and max update time still match we log a less severe error and carry on.
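The relaxed check amounts to something like the following sketch. The record shape ({{OLD:<name>,<count>,<maxUpdateTime>}}) is a made-up stand-in; the point is only that a parse failure is tolerable iff it is confined to the final line:

```java
import java.util.List;

public final class LogCheck
{
    /**
     * Returns true when every record parses, or when the only unparseable
     * record is the final one (i.e. we likely crashed mid-write).
     */
    static boolean onlyLastLineTorn(List<String> lines)
    {
        for (int i = 0; i < lines.size(); i++)
        {
            boolean parses = lines.get(i).matches("(OLD|NEW):[^,]+,\\d+,\\d+");
            if (!parses)
                return i == lines.size() - 1; // torn final record is tolerable
        }
        return true; // everything parsed
    }
}
```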

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>                 Key: CASSANDRA-7066
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.0 alpha 1
>         Attachments: 7066.txt
> Currently we manage a list of in-progress compactions in a system table, which we use
to cleanup incomplete compactions when we're done. The problem with this is that 1) it's a
bit clunky (and leaves us in positions where we can unnecessarily clean up completed files,
or conversely not clean up files that have been superseded); and 2) it's only used for a regular
compaction - no other compaction types are guarded in the same way, so can result in duplication
if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and on startup
we simply delete any sstables that occur in the union of all ancestor sets. This way as soon
as we finish writing we're capable of cleaning up any leftovers, so we never get duplication.
It's also much easier to reason about.
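The startup cleanup proposed in the description could be sketched as below, with generation numbers standing in for sstables (a simplification, not the eventual implementation):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public final class AncestorCleanup
{
    /**
     * Given each live sstable's set of direct ancestors, return the
     * generations to delete: any live sstable that appears in the union
     * of all ancestor sets has already been compacted away.
     */
    static Set<Integer> generationsToDelete(Map<Integer, Set<Integer>> ancestorsByGeneration)
    {
        Set<Integer> union = new HashSet<>();
        for (Set<Integer> ancestors : ancestorsByGeneration.values())
            union.addAll(ancestors);
        union.retainAll(ancestorsByGeneration.keySet()); // only files actually on disk
        return union;
    }
}
```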

This message was sent by Atlassian JIRA
