Date: Fri, 4 Sep 2015 18:50:46 +0000 (UTC)
From: "Joshua McKenzie (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Reopened] (CASSANDRA-10222) Periodically attempt to delete failed snapshot deletions on Windows

     [ https://issues.apache.org/jira/browse/CASSANDRA-10222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joshua McKenzie reopened CASSANDRA-10222:
-----------------------------------------

The patch for this introduced a new error in the Coverity scan on 3.0:
{noformat}
*** CID 1322860:  API usage errors  (INVALIDATE_ITERATOR)
/src/java/org/apache/cassandra/io/sstable/SnapshotDeletingTask.java: 71 in org.apache.cassandra.io.sstable.SnapshotDeletingTask.rescheduleFailedTasks()()
65
66         /**
67          * Retry all failed deletions.
68          */
69         public static void rescheduleFailedTasks()
70         {
>>>     CID 1322860:  API usage errors  (INVALIDATE_ITERATOR)
>>>     Attempting to obtain another element from "org.apache.cassandra.io.sstable.SnapshotDeletingTask.failedTasks" after it's been modified.
71             for (SnapshotDeletingTask task : failedTasks)
72             {
73                 failedTasks.remove(task);
74                 ScheduledExecutors.nonPeriodicTasks.submit(task);
75             }
76         }
{noformat}

This is the same logic as in SSTableDeletingTask (SnapshotDeletingTask is based upon that). We've had that code around for a while (since July of 2011) without issue that I know of, and the [javadoc|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CopyOnWriteArraySet.html] for CopyOnWriteArraySet reads as though iterator invalidation isn't a concern:
bq. Traversal via iterators is fast and cannot encounter interference from other threads. Iterators rely on unchanging snapshots of the array at the time the iterators were constructed.

Either way, it's trivial to fix, if only to reduce Coverity errors.

Attaching a small patch that converts the internal structure of both SSTableDeletingTask and SnapshotDeletingTask to match the new structure on 3.0 in TransactionTidier/SSTableTidier, with a ConcurrentLinkedQueue and polling on re-submit.

> Periodically attempt to delete failed snapshot deletions on Windows
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-10222
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10222
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Joshua McKenzie
>            Assignee: Joshua McKenzie
>              Labels: Windows
>             Fix For: 2.2.2
>
>         Attachments: 10222_coverity_fix.txt
>
>
> The changes in CASSANDRA-9658 leave us in a position where a node on Windows will have to be restarted to clear out snapshots that cannot be deleted at request time due to sstables still being mapped, thus preventing deletions of hard links.
> A simple periodic task to categorize failed snapshot deletions and retry them would help prevent node disk utilization from growing without bound due to snapshots, since compaction will eventually make these snapshot files deletable.
> Given that hard links to files in NTFS don't take up any extra space on disk so long as the original file still exists, the only limitation for users from this approach will be the inability to 'move' a snapshot file to another drive share. They will be copyable, however, so it's a minor platform difference.
> This goes directly against the goals of CASSANDRA-8271 and will likely be built on top of that code. Until such time as we get buffered performance in line with memory-mapped, this is an interim necessity for production roll-outs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
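
The two approaches discussed in the reopening comment can be sketched roughly as follows. This is only an illustration, not the attached patch: the class name FailedTaskRetrySketch and both method names are hypothetical, String stands in for SnapshotDeletingTask, and appending to a list stands in for ScheduledExecutors.nonPeriodicTasks.submit(task). The first method shows why removing from a CopyOnWriteArraySet during a for-each loop does not fail at runtime (the iterator walks a snapshot of the array). The second shows the queue-and-poll shape described for the fix, under the assumption that it drains a ConcurrentLinkedQueue until poll() returns null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical illustration class; not part of Cassandra.
class FailedTaskRetrySketch
{
    // Pattern flagged by Coverity: remove elements while iterating.
    // With CopyOnWriteArraySet the iterator reads an unchanging snapshot
    // of the backing array, so remove() does not invalidate it.
    static List<String> drainBySnapshotIteration(CopyOnWriteArraySet<String> failedTasks)
    {
        List<String> resubmitted = new ArrayList<>();
        for (String task : failedTasks)
        {
            failedTasks.remove(task);
            resubmitted.add(task); // stand-in for submitting the task to an executor
        }
        return resubmitted;
    }

    // Assumed shape of the alternative: poll a queue until it is empty,
    // so no iterator over the shared collection is ever created.
    static List<String> drainByPolling(Queue<String> failedTasks)
    {
        List<String> resubmitted = new ArrayList<>();
        String task;
        while ((task = failedTasks.poll()) != null)
            resubmitted.add(task); // stand-in for submitting the task to an executor
        return resubmitted;
    }

    public static void main(String[] args)
    {
        CopyOnWriteArraySet<String> set = new CopyOnWriteArraySet<>();
        set.add("snapshot-a");
        set.add("snapshot-b");
        System.out.println(drainBySnapshotIteration(set) + " remaining=" + set.size());

        Queue<String> queue = new ConcurrentLinkedQueue<>();
        queue.add("snapshot-a");
        queue.add("snapshot-b");
        System.out.println(drainByPolling(queue) + " remaining=" + queue.size());
    }
}
```

Both methods leave the collection empty in this single-threaded run. The polling form has the added property that each element is handed out by poll() at most once, even when several threads drain the queue concurrently.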