cassandra-commits mailing list archives

From "Stefania (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10682) Fix timeouts in BeforeFirstTest
Date Wed, 11 Nov 2015 08:10:11 GMT


Stefania commented on CASSANDRA-10682:

The timeout is caused by {{waitOnFuture}} in this method of {{SchemaKeyspace}}, which flushes the
schema tables:

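    static void flush()
    {
        // Skipped when the cassandra.unsafesystem system property is true; otherwise waits for each schema table flush
        if (!Boolean.getBoolean("cassandra.unsafesystem"))
            ALL.forEach(table -> FBUtilities.waitOnFuture(getSchemaCFS(table).forceFlush()));
    }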
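A minimal standalone sketch of why this shows up as a timeout rather than a failure, assuming {{waitOnFuture}} blocks without a timeout (which matches the hang described here): the caller waits on a future that only the flush task completes, so once that task dies the wait can never return. {{BoundedFlushWait}} and {{waitWithTimeout}} are hypothetical names, not Cassandra code; a bounded {{get}} turns the silent hang into something the caller can detect.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the schema flush path: the future is only completed
// by the flush task, so if that task dies first nothing ever completes it.
public class BoundedFlushWait
{
    /** Returns true if the future completed within the timeout, false if the wait timed out. */
    public static boolean waitWithTimeout(CompletableFuture<?> flushFuture, long timeoutMillis)
    {
        try
        {
            flushFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        }
        catch (TimeoutException e)
        {
            // An unbounded get() would stay blocked here forever instead
            return false;
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        catch (ExecutionException e)
        {
            // The flush failed and reported it through the future
            throw new RuntimeException(e.getCause());
        }
    }

    public static void main(String[] args)
    {
        CompletableFuture<Void> orphaned = new CompletableFuture<>(); // never completed
        CompletableFuture<Void> done = CompletableFuture.completedFuture(null);
        System.out.println(waitWithTimeout(orphaned, 100)); // false
        System.out.println(waitWithTimeout(done, 100));     // true
    }
}
```

In a test, a bounded wait like this would fail fast with a clear message instead of tripping the overall test timeout.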

The flush writer thread terminates due to an uncaught exception. It appears that the table folder
for _keyspaces_ does not exist (errno 2, i.e. ENOENT):

    WARN  [MemtableFlushWriter:1] 2015-11-06 22:57:41,299 open(build/test/cassandra/data:110/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6, O_RDONLY) failed, errno (2).
    ERROR [MemtableFlushWriter:1] 2015-11-06 22:57:41,374 Fatal exception in thread Thread[MemtableFlushWriter:1,5,main]
    java.lang.RuntimeException: java.nio.file.NoSuchFileException: build/test/cassandra/data:110/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/ma_txn_flush_c88c3670-84d9
            at ~[main/:na]
            at ~[main/:na]
            at org.apache.cassandra.db.lifecycle.LogReplica.append( ~[main/:na]
            at org.apache.cassandra.db.lifecycle.LogReplicaSet.lambda$null$151(
            at org.apache.cassandra.db.lifecycle.LogReplicaSet$$Lambda$77/463946743.perform(Unknown Source) ~[na:na]
            at org.apache.cassandra.utils.Throwables.perform( ~[main/:na]
            at org.apache.cassandra.utils.Throwables.perform( ~[main/:na]
            at org.apache.cassandra.db.lifecycle.LogReplicaSet.append(
            at org.apache.cassandra.db.lifecycle.LogFile.addRecord( ~[main/:na]
            at org.apache.cassandra.db.lifecycle.LogFile.abort( ~[main/:na]
            at org.apache.cassandra.db.lifecycle.LogTransaction.doAbort(
            at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(
            at org.apache.cassandra.db.lifecycle.LifecycleTransaction.doAbort(
            at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(
            at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(
            at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.close(
            at org.apache.cassandra.db.Memtable$FlushRunnable.createFlushWriter(
            at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(
            at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow( ~[main/:na]
            at ~[main/:na]
            at org.apache.cassandra.db.ColumnFamilyStore$
            at java.util.concurrent.ThreadPoolExecutor.runWorker(
            at java.util.concurrent.ThreadPoolExecutor$
            at ~[na:1.8.0_45]
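The {{NoSuchFileException}} at the top of the trace is what Java NIO reports when a file is opened or created inside a directory that does not exist (errno 2, ENOENT). A small standalone reproduction follows; the class name and the {{txn_flush.log}} file name are hypothetical and unrelated to the real format of Cassandra's transaction logs.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical reproduction: append to a log file inside a table directory that may not exist.
public class MissingTableDirDemo
{
    /** Returns "ok" if the append succeeded, otherwise the simple name of the IOException thrown. */
    public static String appendRecord(Path tableDir, String record)
    {
        Path txnLog = tableDir.resolve("txn_flush.log"); // hypothetical file name
        try (Writer writer = Files.newBufferedWriter(txnLog, StandardOpenOption.CREATE, StandardOpenOption.APPEND))
        {
            writer.write(record);
            writer.write(System.lineSeparator());
            return "ok";
        }
        catch (IOException e)
        {
            // A missing parent directory surfaces as java.nio.file.NoSuchFileException (ENOENT)
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws IOException
    {
        Path root = Files.createTempDirectory("data");
        Path present = Files.createDirectory(root.resolve("keyspaces-present"));
        Path missing = root.resolve("keyspaces-missing"); // deliberately never created
        System.out.println(appendRecord(present, "record")); // ok
        System.out.println(appendRecord(missing, "record")); // NoSuchFileException
    }
}
```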

CASSANDRA-10538 will fix the exception in {{LogTransaction.doAbort()}}. However, we then hit
another uncaught exception in {{Transactional.close()}}, which I propose to address. Beyond that,
we still rethrow the original exception that caused the transaction to be aborted. {{FlushRunnable}}
does not seem to attempt to handle exceptions, so the post-flush executor latch is never counted
down and the main thread hangs waiting on the futures. I am not sure whether this patch should
also handle exceptions in {{FlushRunnable}} so that the post-flush executor still runs; I guess
the memtable should not be reclaimed in this case either. cc [~aweisberg] and

Another interesting observation is that the test log shows a commit log being replayed; perhaps
we are picking up commit logs from unrelated tests?

As for the missing table folder: since it is created when the CFS is opened, I am guessing we
need to sync the parent folder. I'm going to run the unit tests on Jenkins a few more times to
see whether syncing the parent folder whenever we create a table fixes the root cause.
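The idea of syncing the parent folder can be sketched with plain Java NIO, assuming Linux, where a directory can be opened read-only as a {{FileChannel}} and {{force(true)}} then fsyncs it. The class and method names are hypothetical and this is not how Cassandra itself implements the sync; whether it cures the missing folder is exactly what the Jenkins runs are meant to show.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: create a table directory, then fsync its parent directory.
public class DirSync
{
    /**
     * Creates dir (and any missing ancestors), then fsyncs dir's parent so the new
     * directory entry is persisted. Returns false if the platform refused to sync a directory.
     * Only the immediate parent is synced; newly created ancestors would need the same treatment.
     */
    public static boolean createAndSyncParent(Path dir)
    {
        try
        {
            Files.createDirectories(dir);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        Path parent = dir.toAbsolutePath().getParent();
        // On Linux a directory can be opened read-only; force(true) then issues fsync on it
        try (FileChannel channel = FileChannel.open(parent, StandardOpenOption.READ))
        {
            channel.force(true);
            return true;
        }
        catch (IOException e)
        {
            // e.g. Windows, where a directory cannot be opened as a FileChannel
            return false;
        }
    }

    public static void main(String[] args)
    {
        Path table = Paths.get(System.getProperty("java.io.tmpdir"),
                               "dirsync-demo-" + System.nanoTime(), "keyspaces-demo");
        System.out.println(createAndSyncParent(table));
    }
}
```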

> Fix timeouts in BeforeFirstTest
> -------------------------------
>                 Key: CASSANDRA-10682
>                 URL:
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.1
>         Attachments: TEST-org.apache.cassandra.db.SinglePartitionSliceCommandTest.log
> Some unit tests fail with a timeout in {{BeforeFirstTest}}, see for example [here|].

> In the corresponding log file, attached, there is a {{NoSuchFileException}} which might be the cause.

This message was sent by Atlassian JIRA
