Date: Thu, 9 Oct 2014 18:27:34 +0000 (UTC)
From: "Joshua McKenzie (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-8019) Windows Unit tests and Dtests erroring due to sstable deleting task error

[ https://issues.apache.org/jira/browse/CASSANDRA-8019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165484#comment-14165484 ]

Joshua McKenzie commented on CASSANDRA-8019:
--------------------------------------------

{quote}Compaction + drop assumes that if refcount is zero it's safe to delete.{quote}

It does; however, unless we can guarantee that every SSTableScanner holding handles to the underlying files has been closed, this assumption is incorrect (on Windows, pre-3.0).

{quote}How are we getting into a situation where SSTableScanner (used by compaction) still has it open when it's deleted?{quote}

Previously (before CASSANDRA-7932), we used a CloseableIterator and closed both it and the CompactionController before calling DataTracker.markCompactedSSTablesReplaced. Currently, we manage the controller and scanners via scoped resource management within CompactionTask and call markCompactedSSTablesReplaced before either of them is closed. This marks the sstables obsolete, decrements their reference count, and attempts to delete them while the scanners still hold the index and data files explicitly open.

Fixing the ordering in CompactionTask fixes the error this ticket was opened for, but it doesn't address all instances of this type of error in the unit tests on the 2.1 branch on Windows. I can play whac-a-mole tracking them all down, but nothing stops us from reintroducing further errors of this type, since there's no contract between the readers and scanners regarding references to the underlying files. On 2.1 with Linux, or on trunk with either platform, you'll never see anything indicating that this ordering problem has occurred.
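To make the ordering problem concrete, here is a small, self-contained simulation. It is a hypothetical sketch, not Cassandra code: `WindowsLikeFs`, `SSTableStub`, `ScannerStub` and `releaseReference()` are invented stand-ins, and the filesystem stub simply refuses to delete any file that still has an open handle, mimicking the Windows behaviour described in the comment. The scanner is owned by a try-with-resources block, standing in for the scoped resource management mentioned above; the only difference between the two methods is whether the sstable's last reference is released inside that block or after it.

```java
import java.util.HashMap;
import java.util.Map;

// Entry point is declared first so `java CompactionOrderingSketch.java` (single-file mode) finds main.
// All classes here are hypothetical stand-ins, not Cassandra's real classes.
class CompactionOrderingSketch {
    /** Buggy order: the last reference is released while the scanner still holds the files open. */
    static boolean releaseInsideScope(SSTableStub sstable) {
        try (ScannerStub scanner = new ScannerStub(sstable)) {
            // ... compaction would read from `scanner` and write the new sstable here ...
            return sstable.releaseReference(); // delete attempted while handles are still open
        }
    }

    /** Fixed order: close the scanner first, then release the sstable's last reference. */
    static boolean releaseAfterScope(SSTableStub sstable) {
        try (ScannerStub scanner = new ScannerStub(sstable)) {
            // ... compaction would read from `scanner` and write the new sstable here ...
        }
        return sstable.releaseReference(); // handles were released when the block exited
    }

    public static void main(String[] args) {
        WindowsLikeFs fs = new WindowsLikeFs();
        System.out.println("release inside scope -> deleted: "
                + releaseInsideScope(new SSTableStub(fs, "ks-cf-ka-4")));
        System.out.println("release after scope  -> deleted: "
                + releaseAfterScope(new SSTableStub(fs, "ks-cf-ka-5")));
    }
}

/** Hypothetical filesystem stub: a file with any open handle cannot be deleted. */
class WindowsLikeFs {
    private final Map<String, Integer> openHandles = new HashMap<>();

    void open(String path)  { openHandles.merge(path, 1, Integer::sum); }
    void close(String path) { openHandles.merge(path, -1, Integer::sum); }

    /** Simulated delete: succeeds only when no handle to the file is open. */
    boolean tryDelete(String path) {
        return openHandles.getOrDefault(path, 0) == 0;
    }
}

/** Hypothetical sstable: tries to delete its files once its reference count reaches zero. */
class SSTableStub {
    final WindowsLikeFs fs;
    final String dataFile;
    final String indexFile;
    private int refCount = 1;

    SSTableStub(WindowsLikeFs fs, String name) {
        this.fs = fs;
        this.dataFile = name + "-Data.db";
        this.indexFile = name + "-Index.db";
    }

    /** Decrements the count; at zero, attempts to delete both files and reports success. */
    boolean releaseReference() {
        if (--refCount > 0) {
            return false; // other holders remain, so nothing is deleted yet
        }
        boolean dataDeleted = fs.tryDelete(dataFile);
        boolean indexDeleted = fs.tryDelete(indexFile);
        return dataDeleted && indexDeleted;
    }
}

/** Hypothetical scanner: keeps the data and index files open until it is closed. */
class ScannerStub implements AutoCloseable {
    private final SSTableStub sstable;

    ScannerStub(SSTableStub sstable) {
        this.sstable = sstable;
        sstable.fs.open(sstable.dataFile);
        sstable.fs.open(sstable.indexFile);
    }

    @Override
    public void close() { // declares no checked exception, so callers need no catch
        sstable.fs.close(sstable.dataFile);
        sstable.fs.close(sstable.indexFile);
    }
}
```

Run as a single-file program, this prints `deleted: false` for the first case and `deleted: true` for the second. Note that the stub only enforces ordering within one call site; it does not provide the reader/scanner contract the comment says is missing, so a scanner opened elsewhere could still hold a file open when the count reaches zero.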
> Windows Unit tests and Dtests erroring due to sstable deleting task error
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8019
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8019
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Windows 7
>            Reporter: Philip Thompson
>            Assignee: Joshua McKenzie
>              Labels: windows
>             Fix For: 2.1.1
>
>         Attachments: 8019_aggressive_v1.txt, 8019_conservative_v1.txt, 8019_v2.txt
>
>
> Currently a large number of dtests and unit tests are erroring on windows with the following error in the node log:
> {code}
> ERROR [NonPeriodicTasks:1] 2014-09-29 11:05:04,383 SSTableDeletingTask.java:89 - Unable to delete c:\\users\\username\\appdata\\local\\temp\\dtest-vr6qgw\\test\\node1\\data\\system\\local-7ad54392bcdd35a684174e047860b377\\system-local-ka-4-Data.db (it will be removed on server restart; we'll also retry after GC)
> {code}
> git bisect points to the following commit:
> {code}
> 0e831007760bffced8687f51b99525b650d7e193 is the first bad commit
> commit 0e831007760bffced8687f51b99525b650d7e193
> Author: Benedict Elliott Smith
> Date:   Fri Sep 19 18:17:19 2014 +0100
>
>     Fix resource leak in event of corrupt sstable
>
>     patch by benedict; review by yukim for CASSANDRA-7932
>
> :100644 100644 d3ee7d99179dce03307503a8093eb47bd0161681 f55e5d27c1c53db3485154cd16201fc5419f32df M	CHANGES.txt
> :040000 040000 194f4c0569b6be9cc9e129c441433c5c14de7249 3c62b53b2b2bd4b212ab6005eab38f8a8e228923 M	src
> :040000 040000 64f49266e328b9fdacc516c52ef1921fe42e994f de2ca38232bee6d2a6a5e068ed9ee0fbbc5aaebe M	test
> {code}
> You can reproduce this by running simple_bootstrap_test.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)