cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10109) Windows dtest 3.0: ttl_test.py failures
Date Tue, 18 Aug 2015 00:02:45 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700480#comment-14700480 ]

Stefania commented on CASSANDRA-10109:
--------------------------------------

This is bad news. As far as I understand the documentation, this means that on Windows we
cannot list the files in a directory atomically; see the third paragraph [here|http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#newDirectoryStream(java.nio.file.Path)].

So we could list some temporary sstable files but miss their txn log file. If a racing thread
then deletes those files along with their txn log file, and we never list the txn log file,
we incorrectly classify the temporary sstable files as final files. However, because the txn
log is deleted last, those files should no longer exist by then, so this would result in
NoSuchFileExceptions when we try to read them.

I think we should check that all final files exist before returning them and repeat the process
in case some files no longer exist. This should only be done when we don't have atomic listing.
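
A minimal sketch of the check-and-retry idea above, in Java. The class name FinalFileLister,
the file-name conventions (a "tmp-" prefix for temporary files, a "_txn.log" suffix for the
txn log) and the retry limit are hypothetical and only stand in for the real sstable and txn
log handling; the SecureDirectoryStream test is assumed to be the same signal the error
message refers to.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SecureDirectoryStream;
import java.util.ArrayList;
import java.util.List;

final class FinalFileLister {
    // Hypothetical naming conventions, used only for this illustration.
    static final String TMP_PREFIX = "tmp-";
    static final String TXN_LOG_SUFFIX = "_txn.log";
    static final int MAX_ATTEMPTS = 5;

    /** One listing pass: temporary files are excluded only if a txn log was seen. */
    static List<Path> classifyOnce(Path dir) throws IOException {
        List<Path> candidates = new ArrayList<>();
        boolean txnLogSeen = false;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (p.getFileName().toString().endsWith(TXN_LOG_SUFFIX))
                    txnLogSeen = true;
                else
                    candidates.add(p);
            }
        }
        List<Path> finals = new ArrayList<>();
        for (Path p : candidates) {
            boolean temporary = txnLogSeen && p.getFileName().toString().startsWith(TMP_PREFIX);
            if (!temporary)
                finals.add(p);
        }
        return finals;
    }

    /** Lists final files, re-listing when the listing is not atomic and a file has vanished. */
    static List<Path> listFinalFiles(Path dir) throws IOException {
        boolean atomic;
        try (DirectoryStream<Path> probe = Files.newDirectoryStream(dir)) {
            atomic = probe instanceof SecureDirectoryStream;
        }
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            List<Path> finals = classifyOnce(dir);
            // A "final" file that no longer exists suggests we raced with a txn cleanup.
            if (atomic || finals.stream().allMatch(p -> Files.exists(p)))
                return finals;
        }
        throw new IOException("Could not obtain a stable listing of " + dir);
    }
}
```

With atomic listing the loop returns after a single pass; otherwise it lists again until every
returned file still exists or the attempt limit is reached.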

[~benedict] do you think this would be enough or do you see other potential races?


> Windows dtest 3.0: ttl_test.py failures
> ---------------------------------------
>
>                 Key: CASSANDRA-10109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10109
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Joshua McKenzie
>              Labels: Windows
>             Fix For: 3.0.x
>
>
> ttl_test.py:TestTTL.update_column_ttl_with_default_ttl_test2
> ttl_test.py:TestTTL.update_multiple_columns_ttl_test
> ttl_test.py:TestTTL.update_single_column_ttl_test
> The errors I see locally differ from yesterday's CI run. Yesterday on CI we had timeouts
and general node hangs. Today, on all 3 tests when run locally, I see:
> {noformat}
> Traceback (most recent call last):
>   File "c:\src\cassandra-dtest\dtest.py", line 532, in tearDown
>     raise AssertionError('Unexpected error in %s node log: %s' % (node.name, errors))
> AssertionError: Unexpected error in node1 node log: ['ERROR [main] 2015-08-17 16:53:43,120
NoSpamLogger.java:97 - This platform does not support atomic directory streams (SecureDirectoryStream);
race conditions when loading sstable files could occurr']
> {noformat}
> This traces back to the commit for CASSANDRA-7066 today by [~Stefania] and [~benedict].
 Stefania - care to take this ticket and also look further into whether or not we're going
to have issues with 7066 on Windows? That error message certainly *sounds* like it's not a
good thing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
