cassandra-commits mailing list archives

From "Joshua McKenzie (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-8019) Windows Unit tests and Dtests erroring due to sstable deleting task error
Date Mon, 06 Oct 2014 20:45:35 GMT


Joshua McKenzie updated CASSANDRA-8019:
    Attachment: 8019_v2.txt

After chewing on this a bit, I've come to the conclusion that the problem here isn't really
the order of deletion or even the pre-3.0 behavior, as those files are *eventually* successfully
deleted on a subsequent GC.  Our problem is that we log this as an error immediately on the
first failure on Windows, even though we expect some contention on ordering pre-CASSANDRA-4050
and it's not really an error condition.

Having said that, we still want to log legitimate error conditions, so suppressing the message
or dropping it to WARN wouldn't be appropriate in those cases.

I've attached a v2 patch against 2.0 that adds a retryCount to our SSTableDeletingTask.  On
Windows only, it prints the error message after 3 failed deletion attempts and then resets
the counter.  Behavior on Linux remains at 1 failed deletion == logged.  v2 quiets all deletion
errors in unit tests on 2.0 and 2.1 but should leave room for genuinely locked / undeletable
files to log after a few failures.  I should note: 3 is a completely arbitrary number, and
relying on GC for eventual file deletion is of course not ideal.
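
For anyone skimming without opening the attachment, here is a minimal sketch of the retry-counted
logging decision described above.  It is not the contents of 8019_v2.txt: the class name
DeletionRetryLogger, the method recordFailure, and the isWindows constructor flag are hypothetical
names chosen for illustration, and the real SSTableDeletingTask may be structured quite differently.

```java
// Hypothetical sketch only; names and structure are assumptions, not the actual patch.
final class DeletionRetryLogger {
    // The threshold of 3 is the arbitrary number mentioned in the comment above.
    static final int FAILURES_BEFORE_LOGGING = 3;

    private final boolean isWindows; // assumed to be supplied by the caller
    private int retryCount = 0;

    DeletionRetryLogger(boolean isWindows) {
        this.isWindows = isWindows;
    }

    /**
     * Records one failed deletion attempt.
     * Returns true if this failure should be logged as an error.
     */
    synchronized boolean recordFailure() {
        if (!isWindows) {
            return true; // non-Windows: 1 failed deletion == logged
        }
        retryCount++;
        if (retryCount >= FAILURES_BEFORE_LOGGING) {
            retryCount = 0; // reset so a file that stays locked logs again later
            return true;
        }
        return false; // stay quiet and rely on the post-GC retry
    }
}
```

A caller would invoke recordFailure() each time a delete attempt fails and emit the existing
'Unable to delete' error only when it returns true.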

Thoughts [~jbellis]?  I'd prefer we nip this in the bud as this 'Unable to delete' error is
getting more prevalent on the 2.1 branch as we make further changes and optimizations, and
I'm more comfortable loosening up the logging criteria for this error than retrofitting more
reference counting or making changes to scanner close ordering throughout the code-base.

> Windows Unit tests and Dtests erroring due to sstable deleting task error
> -------------------------------------------------------------------------
>                 Key: CASSANDRA-8019
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Windows 7
>            Reporter: Philip Thompson
>            Assignee: Joshua McKenzie
>              Labels: windows
>             Fix For: 2.1.1
>         Attachments: 8019_aggressive_v1.txt, 8019_conservative_v1.txt, 8019_v2.txt
> Currently a large number of dtests and unit tests are erroring on Windows with the following error in the node log:
> {code}
> ERROR [NonPeriodicTasks:1] 2014-09-29 11:05:04,383 - Unable to delete c:\\users\\username\\appdata\\local\\temp\\dtest-vr6qgw\\test\\node1\\data\\system\\local-7ad54392bcdd35a684174e047860b377\\system-local-ka-4-Data.db (it will be removed on server restart; we'll also retry after GC)\n
> {code}
> git bisect points to the following commit:
> {code}
> 0e831007760bffced8687f51b99525b650d7e193 is the first bad commit
> commit 0e831007760bffced8687f51b99525b650d7e193
> Author: Benedict Elliott Smith <>
> Date:  Fri Sep 19 18:17:19 2014 +0100
>     Fix resource leak in event of corrupt sstable
>     patch by benedict; review by yukim for CASSANDRA-7932
> :100644 100644 d3ee7d99179dce03307503a8093eb47bd0161681 f55e5d27c1c53db3485154cd16201fc5419f32df M      CHANGES.txt
> :040000 040000 194f4c0569b6be9cc9e129c441433c5c14de7249 3c62b53b2b2bd4b212ab6005eab38f8a8e228923 M      src
> :040000 040000 64f49266e328b9fdacc516c52ef1921fe42e994f de2ca38232bee6d2a6a5e068ed9ee0fbbc5aaebe M      test
> {code}
> You can reproduce this by running simple_bootstrap_test.

This message was sent by Atlassian JIRA
