cassandra-commits mailing list archives

From "Joshua McKenzie (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8019) Windows Unit tests and Dtests erroring due to sstable deleting task error
Date Thu, 09 Oct 2014 18:27:34 GMT


Joshua McKenzie commented on CASSANDRA-8019:

{quote}Compaction + drop assumes that if refcount is zero it's safe to delete.{quote}
It does. However, unless we can guarantee that every SSTableScanner holding handles
to the underlying files has been closed, this assumption is incorrect (on Windows, pre-3.0).
{quote}How are we getting into a situation where SSTableScanner (used by compaction) still
has it open when it's deleted?{quote}
Previously (before CASSANDRA-7932) we used a CloseableIterator and closed both it and the
CompactionController prior to calling DataTracker.markCompactedSSTablesReplaced. Currently we manage
the controller and scanners via scoped-resource management within CompactionTask and call
markCompactedSSTablesReplaced before either is closed. This marks the sstables obsolete,
decrements the ref count, and attempts to delete them while the scanners still have the index and
data files explicitly open.
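
To make the ordering problem concrete, here is a minimal, self-contained sketch. It does not use Cassandra's classes: FakeSSTableFile and FakeScanner are hypothetical stand-ins, and the "delete fails while a handle is open" rule is simulated with a counter to mimic the Windows behaviour described above. It only illustrates why a delete attempted inside the resource scope fails, while the same delete after the scope succeeds.

```java
/** Hypothetical stand-in for an sstable component file (not a Cassandra class).
 *  Simulates Windows semantics: delete fails while any handle is open. */
class FakeSSTableFile {
    private int openHandles = 0;
    private boolean deleted = false;

    void open()  { openHandles++; }
    void close() { openHandles--; }

    /** Returns true if the delete succeeded, false if a handle is still open. */
    boolean tryDelete() {
        if (openHandles > 0) return false;
        deleted = true;
        return true;
    }

    boolean isDeleted() { return deleted; }
}

/** Hypothetical scanner that holds a handle on the file until it is closed. */
class FakeScanner implements AutoCloseable {
    private final FakeSSTableFile file;

    FakeScanner(FakeSSTableFile file) {
        this.file = file;
        file.open();
    }

    @Override
    public void close() { file.close(); }
}

class CompactionOrderingSketch {
    /** Problematic ordering: the delete is attempted while the scanner is still in scope. */
    static boolean replaceInsideScope(FakeSSTableFile file) {
        try (FakeScanner scanner = new FakeScanner(file)) {
            // ... compaction work would happen here ...
            return file.tryDelete(); // evaluated before the scanner is closed
        }
    }

    /** Corrected ordering: the scanner is closed before the delete is attempted. */
    static boolean replaceAfterScope(FakeSSTableFile file) {
        try (FakeScanner scanner = new FakeScanner(file)) {
            // ... compaction work would happen here ...
        }
        return file.tryDelete(); // the handle has been released by now
    }

    public static void main(String[] args) {
        System.out.println(replaceInsideScope(new FakeSSTableFile())); // false
        System.out.println(replaceAfterScope(new FakeSSTableFile()));  // true
    }
}
```

On a platform that allows deleting open files, tryDelete would succeed in both cases, which is why the problem stays invisible there.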

Fixing the ordering in CompactionTask fixes the error this ticket was opened for, but it doesn't
address every instance of this type of error in unit tests on the 2.1 branch on Windows.
I can play whac-a-mole tracking all of these down, but nothing stops us from re-introducing
further errors of this type, since there's no contract between the readers and scanners as
far as references to the underlying files are concerned. On 2.1+linux or trunk+either, you'll
never see anything indicating that this ordering problem has occurred.
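
As an illustration of what a reader/scanner contract could look like, here is a small sketch in which each scanner takes a reference on its sstable and the file is only deleted once the sstable is obsolete and the last reference is gone. This is an assumption for discussion, not the approach taken in any of the attached patches; RefCountedSSTable and ScannerHandle are hypothetical names, and the sketch is single-threaded for brevity.

```java
/** Hypothetical ref-counted sstable (not a Cassandra class). Single-threaded sketch. */
class RefCountedSSTable {
    private int refs = 1;            // the initial reference held by the live set
    private boolean obsolete = false;
    private boolean deleted = false;

    /** A scanner takes a reference for as long as it holds file handles. */
    ScannerHandle openScanner() {
        refs++;
        return new ScannerHandle(this);
    }

    /** Called when compaction replaces this sstable: drop the live-set reference. */
    void markObsolete() {
        obsolete = true;
        release();
    }

    /** Deletion happens only when the sstable is obsolete and nothing references it. */
    void release() {
        refs--;
        if (refs == 0 && obsolete) deleted = true;
    }

    boolean isDeleted() { return deleted; }
}

/** Hypothetical scanner handle that releases its reference exactly once on close. */
class ScannerHandle implements AutoCloseable {
    private final RefCountedSSTable sstable;
    private boolean closed = false;

    ScannerHandle(RefCountedSSTable sstable) { this.sstable = sstable; }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            sstable.release();
        }
    }
}
```

With this shape, calling markObsolete() while a scanner is still open only defers the delete; it runs when the scanner's close() drops the last reference, regardless of the order in which the caller does things.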

> Windows Unit tests and Dtests erroring due to sstable deleting task error
> -------------------------------------------------------------------------
>                 Key: CASSANDRA-8019
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Windows 7
>            Reporter: Philip Thompson
>            Assignee: Joshua McKenzie
>              Labels: windows
>             Fix For: 2.1.1
>         Attachments: 8019_aggressive_v1.txt, 8019_conservative_v1.txt, 8019_v2.txt
> Currently a large number of dtests and unit tests are erroring on Windows with the following error in the node log:
> {code}
> ERROR [NonPeriodicTasks:1] 2014-09-29 11:05:04,383 - Unable to delete c:\\users\\username\\appdata\\local\\temp\\dtest-vr6qgw\\test\\node1\\data\\system\\local-7ad54392bcdd35a684174e047860b377\\system-local-ka-4-Data.db (it will be removed on server restart; we'll also retry after GC)\n
> {code}
> git bisect points to the following commit:
> {code}
> 0e831007760bffced8687f51b99525b650d7e193 is the first bad commit
> commit 0e831007760bffced8687f51b99525b650d7e193
> Author: Benedict Elliott Smith <>
> Date:  Fri Sep 19 18:17:19 2014 +0100
>     Fix resource leak in event of corrupt sstable
>     patch by benedict; review by yukim for CASSANDRA-7932
> :100644 100644 d3ee7d99179dce03307503a8093eb47bd0161681 f55e5d27c1c53db3485154cd16201fc5419f32df M      CHANGES.txt
> :040000 040000 194f4c0569b6be9cc9e129c441433c5c14de7249 3c62b53b2b2bd4b212ab6005eab38f8a8e228923 M  src
> :040000 040000 64f49266e328b9fdacc516c52ef1921fe42e994f de2ca38232bee6d2a6a5e068ed9ee0fbbc5aaebe M  test
> {code}
> You can reproduce this by running simple_bootstrap_test.
