cassandra-commits mailing list archives

From "Joshua McKenzie (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-8019) Windows Unit tests and Dtests erroring due to sstable deleting task error
Date Tue, 11 Nov 2014 19:16:34 GMT


Joshua McKenzie updated CASSANDRA-8019:
    Attachment: 8019_v3.txt

v3 attached.  It adds refcounting on the SSTR from within SSTableScanner and updates
SSTableRewriterTest to use try-with-resources for CompactionControllers and Scanners.  It
passes all unit tests on linux, the dtest failures match the CI environment, and "Unable to
delete" errors in the windows unit tests on the 2.1 branch are greatly reduced.  I still see
some "Unable to delete" messages at runtime while attempting to force compaction on a loaded
system, but those are also reduced and I'll track them down in a separate effort.

I chose to go with refcounting rather than simply changing the ordering in CompactionTask
as we need some codification of the ordering relationship between scanners and sstables in
order to prevent this type of "error" in the future.

The SSTableScanner relies on internal data structures within the SSTR.  The previous code
does hold the reference open and prevent GC, both through the pointer the scanner has
internally and through the ifile and dfile references, but our previous logical structure,
in which open SSTableScanners had no relationship to SSTR deletion, was misleading.  We
replicate some of the references in the scanner, so the SSTR can technically be deleted out
of order, and we rely on the filesystem to keep the file open while we have a handle to it;
even so, a clearer relationship between the components is preferable IMO.
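
To illustrate the relationship described above, here is a minimal sketch of a scanner that
takes a reference on its table and releases it on close.  It is not the code in 8019_v3.txt:
RefCountedTable, TableScanner, and every method name below are hypothetical stand-ins for
the SSTR and SSTableScanner, and "deletion" is only a flag rather than a real file delete.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an SSTR: it is deleted only after it has been
// marked obsolete AND the last reference to it has been released.
final class RefCountedTable {
    // Starts at 1: the reference held by the owner (e.g. the live sstable set).
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile boolean obsolete = false;
    private volatile boolean deleted = false;

    // Take a reference; returns false if the count has already dropped to zero.
    boolean acquire() {
        while (true) {
            int n = refs.get();
            if (n <= 0) return false;
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    // Drop a reference; the last release of an obsolete table "deletes" it.
    void release() {
        if (refs.decrementAndGet() == 0 && obsolete) {
            deleted = true; // a real implementation would schedule the file deletion here
        }
    }

    // Must be called before the owner releases its own reference.
    void markObsolete() { obsolete = true; }

    boolean isDeleted() { return deleted; }
}

// Hypothetical stand-in for an SSTableScanner: holding a reference for its
// whole lifetime makes the "no deletion while a scanner is open" rule explicit.
final class TableScanner implements AutoCloseable {
    private final RefCountedTable table;
    private boolean closed = false;

    TableScanner(RefCountedTable table) {
        if (!table.acquire()) throw new IllegalStateException("table already released");
        this.table = table;
    }

    @Override
    public void close() {
        if (closed) return; // idempotent, so a double close cannot over-release
        closed = true;
        table.release();
    }
}
```

Used in a try-with-resources block, the scanner's reference is released even if the body
throws, so the table cannot be deleted while a scanner is open and cannot leak a reference
when one fails.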

[~jbellis]: I put you on this as reviewer when I was leaning towards the log suppression
route, as that was a trivial effort; [~krummas]: would you be willing to review this, as
you've been in the compaction and SSTableRewriter space recently?

> Windows Unit tests and Dtests erroring due to sstable deleting task error
> -------------------------------------------------------------------------
>                 Key: CASSANDRA-8019
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Windows 7
>            Reporter: Philip Thompson
>            Assignee: Joshua McKenzie
>              Labels: windows
>             Fix For: 2.1.3
>         Attachments: 8019_aggressive_v1.txt, 8019_conservative_v1.txt, 8019_v2.txt, 8019_v3.txt
> Currently a large number of dtests and unit tests are erroring on windows with the following error in the node log:
> {code}
> ERROR [NonPeriodicTasks:1] 2014-09-29 11:05:04,383 - Unable to delete c:\\users\\username\\appdata\\local\\temp\\dtest-vr6qgw\\test\\node1\\data\\system\\local-7ad54392bcdd35a684174e047860b377\\system-local-ka-4-Data.db (it will be removed on server restart; we'll also retry after GC)\n
> {code}
> git bisect points to the following commit:
> {code}
> 0e831007760bffced8687f51b99525b650d7e193 is the first bad commit
> commit 0e831007760bffced8687f51b99525b650d7e193
> Author: Benedict Elliott Smith <>
> Date:  Fri Sep 19 18:17:19 2014 +0100
>     Fix resource leak in event of corrupt sstable
>     patch by benedict; review by yukim for CASSANDRA-7932
> :100644 100644 d3ee7d99179dce03307503a8093eb47bd0161681 f55e5d27c1c53db3485154cd16201fc5419f32df M      CHANGES.txt
> :040000 040000 194f4c0569b6be9cc9e129c441433c5c14de7249 3c62b53b2b2bd4b212ab6005eab38f8a8e228923 M      src
> :040000 040000 64f49266e328b9fdacc516c52ef1921fe42e994f de2ca38232bee6d2a6a5e068ed9ee0fbbc5aaebe M      test
> {code}
> You can reproduce this by running simple_bootstrap_test.

This message was sent by Atlassian JIRA
