cassandra-commits mailing list archives

From "Joshua McKenzie (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10222) Periodically attempt to delete failed snapshot deletions on Windows
Date Wed, 02 Sep 2015 17:56:46 GMT


Joshua McKenzie commented on CASSANDRA-10222:

The redundant run call was an artifact of my changing that interface and then not actually
reading back through it again from that perspective - bad form on my part. I've refactored
the constructor on SnapshotDeletingTask a bit and broken the task creation out into an
{{addFailedSnapshot}} method. I think that cleans the interface up quite a bit; let me know
what you think.

I'm pretty sure all compactions go through the CompactionExecutor. This change actually gives
us slightly *more* deletion attempts than strictly needed, as {{CompactionManager.ValidationExecutor}}
and {{CompactionManager.CacheCleanupExecutor}} both rely on {{CompactionExecutor.afterExecute}},
which runs {{SnapshotDeletingTask.rescheduleFailedTasks}}. I don't think refactoring those
classes is worth the cost just to eliminate a rare, potentially no-op task removal/re-add
for a snapshot deletion that's not ready yet.
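
To make the mechanism above concrete, here is a minimal sketch of retrying failed deletions from an executor's {{afterExecute}} hook. It is not the actual patch: the method names {{addFailedSnapshot}} and {{rescheduleFailedTasks}} are borrowed from the comment, but their bodies, the {{SnapshotRetrySketch}} and {{RetryingExecutor}} classes, and the queue of {{File}} paths are all assumptions made for illustration.

```java
import java.io.File;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; names mirror the comment above but the bodies are assumptions.
class SnapshotRetrySketch {
    // Snapshot paths whose deletion failed and should be retried later.
    static final Queue<File> failedDeletions = new ConcurrentLinkedQueue<>();

    static void addFailedSnapshot(File path) {
        failedDeletions.add(path);
    }

    // Retry each pending deletion once; re-add anything that still cannot be deleted.
    static void rescheduleFailedTasks() {
        int pending = failedDeletions.size();
        for (int i = 0; i < pending; i++) {
            File path = failedDeletions.poll();
            if (path == null)
                break;
            if (path.exists() && !path.delete())
                failedDeletions.add(path); // not ready yet: harmless remove/re-add
        }
    }

    // Hypothetical executor: every finished task triggers one retry pass.
    static class RetryingExecutor extends ThreadPoolExecutor {
        RetryingExecutor() {
            super(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        }

        @Override
        protected void afterExecute(Runnable r, Throwable t) {
            super.afterExecute(r, t);
            rescheduleFailedTasks();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        File snapshot = new File(System.getProperty("java.io.tmpdir"), "snapshot-" + System.nanoTime());
        snapshot.mkdir(); // an empty directory stands in for a snapshot that is now deletable
        addFailedSnapshot(snapshot);

        RetryingExecutor executor = new RetryingExecutor();
        executor.execute(() -> { /* stand-in for a compaction */ });
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);

        System.out.println("still pending: " + failedDeletions.size());
    }
}
```

Any executor subclass that inherits this hook would trigger the same retry pass, which is where the extra attempts mentioned above would come from in a design like this. A path that still cannot be deleted is simply put back on the queue.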

I've gone ahead and manually set up some CI jobs to run on Windows:
[2.2 utest|]
[2.2 dtest|]
[3.0 utest|]

As dtest runs on the platform currently take 10+ hours, I've limited dtests to 2.2 only at
this time. I can create and run a 3.0 job if you're concerned about it; however, with
Windows-specific changes like this (and 3.0 being in beta) I tend to be a *little* less
stringent on running the full CI gamut than I would otherwise be.

> Periodically attempt to delete failed snapshot deletions on Windows
> -------------------------------------------------------------------
>                 Key: CASSANDRA-10222
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Joshua McKenzie
>            Assignee: Joshua McKenzie
>              Labels: Windows
>             Fix For: 2.2.2
> The changes in CASSANDRA-9658 leave us in a position where a node on Windows will have
> to be restarted to clear out snapshots that cannot be deleted at request time due to sstables
> still being mapped, thus preventing deletions of hard links. A simple periodic task to categorize
> failed snapshot deletions and retry them would help prevent node disk utilization from growing
> unbounded by snapshots as compaction will eventually make these snapshot files deletable.
> Given that hard links to files in NTFS don't take up any extra space on disk so long
> as the original file still exists, the only limitation for users from this approach will be
> the inability to 'move' a snapshot file to another drive share. They will be copyable, however,
> so it's a minor platform difference.
> This goes directly against the goals of CASSANDRA-8271 and will likely be built on top
> of that code. Until such time as we get buffered performance in-line with memory-mapped, this
> is an interim necessity for production roll-outs.

This message was sent by Atlassian JIRA