cassandra-commits mailing list archives

From "Sylvain Lebresne (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-3520) Unit test are hanging on 0.8 branch
Date Mon, 28 Nov 2011 11:14:39 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3520:
----------------------------------------

    Attachment: 3520.patch

So, the whole problem is due to how we handle non-durable writes in the shutdown hook. For
those, we flush the CFS as part of shutdown. However, flush tries to grab a commit log context,
which blocks because the commit log has already been shut down *before* all this (and, for some
reason, executor.submit() doesn't throw any exception if the executor is shut down).
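A minimal, self-contained Java sketch of the hang mechanism, under stated assumptions: QueueBackedExecutor and contextRequestHangs are hypothetical names, not Cassandra classes, and the executor only models a queue whose worker thread stops on shutdown while submit() keeps accepting tasks. A task submitted after shutdown is queued but never run, so waiting on its Future blocks forever; the sketch uses a timed get() to detect that instead of actually hanging. By contrast, a stock java.util.concurrent.ThreadPoolExecutor with its default AbortPolicy throws RejectedExecutionException from submit() after shutdown().

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ShutdownHangDemo {
    // Hypothetical queue-backed executor: one worker thread drains the queue
    // until shutdown() is called. submit() never checks the shutdown flag.
    static class QueueBackedExecutor {
        private final BlockingQueue<FutureTask<?>> queue = new LinkedBlockingQueue<>();
        private volatile boolean shutdown = false;
        private final Thread worker;

        QueueBackedExecutor() {
            worker = new Thread(() -> {
                while (!shutdown) {
                    try {
                        FutureTask<?> task = queue.poll(10, TimeUnit.MILLISECONDS);
                        if (task != null) {
                            task.run();
                        }
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        // Accepted even after shutdown: the task is queued, no exception is
        // thrown, and the returned Future never completes.
        <T> Future<T> submit(Callable<T> callable) {
            FutureTask<T> task = new FutureTask<>(callable);
            queue.add(task);
            return task;
        }

        void shutdown() throws InterruptedException {
            shutdown = true;
            worker.join(); // wait until the worker loop has exited
        }
    }

    // Models a flush asking the commit log for a context. Returns true if the
    // request did not complete within the timeout (a plain get() would block forever).
    static boolean contextRequestHangs(boolean shutdownFirst) throws Exception {
        QueueBackedExecutor commitLog = new QueueBackedExecutor();
        if (shutdownFirst) {
            commitLog.shutdown();
        }
        Future<Long> context = commitLog.submit(() -> 42L);
        try {
            context.get(500, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        } finally {
            if (!shutdownFirst) {
                commitLog.shutdown();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("request before shutdown hangs: " + contextRequestHangs(false));
        System.out.println("request after shutdown hangs: " + contextRequestHangs(true));
    }
}
```

Running main should report false for the request made while the executor is running and true for the one made after shutdown. In the real case the waiting thread is not a daemon and uses an untimed wait, which is consistent with JUnit waiting indefinitely for the VM to exit.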

The reason r1185960 triggered this is that it actually fixed a bug: before that commit,
adding a new column family to a keyspace would reset the durableWrites option to true,
which hid the bug as far as CliTest is concerned.

One simple solution is to move the commit log shutdown after the flushes of the non-durable
CFs (which is what 1.0 does, and why it isn't affected). That said, it doesn't feel like the
right fix, since non-durable CFs shouldn't query the commit log at all, even during flushes.
However, changing that opens the possibility of some CL segments being retained forever
when a keyspace is switched from non-durable to durable, if we're not careful. So overall, just
moving the CL shutdown later in the shutdown hook to match 1.0 seems good enough, at least
for 0.8. I'm attaching a patch that does just that. We can then look at cleaning up how
non-durable CFs are flushed in 1.0/trunk if we wish.
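The reordering can be sketched as follows. This is not the attached patch or Cassandra's actual shutdown hook; ColumnFamily, CommitLog and shutdownHook are hypothetical stand-ins that only record what happens, so the ordering is visible: non-durable CFs are flushed while the commit log is still up, and only then is the commit log shut down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShutdownOrderSketch {
    // Hypothetical stand-in for a column family store; records flushes in a shared log.
    static class ColumnFamily {
        final String name;
        final boolean durableWrites;
        private final List<String> log;

        ColumnFamily(String name, boolean durableWrites, List<String> log) {
            this.name = name;
            this.durableWrites = durableWrites;
            this.log = log;
        }

        void flush() {
            // In the real system a flush asks the commit log for a context.
            log.add("flush " + name);
        }
    }

    // Hypothetical stand-in for the commit log; records its shutdown.
    static class CommitLog {
        private final List<String> log;

        CommitLog(List<String> log) {
            this.log = log;
        }

        void shutdown() {
            log.add("commitlog shutdown");
        }
    }

    // Shutdown hook body with the 1.0-style ordering: flush the non-durable
    // CFs first, then shut the commit log down.
    static Runnable shutdownHook(List<ColumnFamily> cfs, CommitLog commitLog) {
        return () -> {
            for (ColumnFamily cf : cfs) {
                if (!cf.durableWrites) {
                    cf.flush(); // the commit log is still running at this point
                }
            }
            commitLog.shutdown(); // done last, after every flush that might need it
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<ColumnFamily> cfs = Arrays.asList(
                new ColumnFamily("durable_cf", true, log),
                new ColumnFamily("non_durable_cf", false, log));
        // Run directly for demonstration; a server would instead register it with
        // Runtime.getRuntime().addShutdownHook(new Thread(hook)).
        Runnable hook = shutdownHook(cfs, new CommitLog(log));
        hook.run();
        System.out.println(log);
    }
}
```

Running main prints [flush non_durable_cf, commitlog shutdown]: the durable CF is skipped, and the commit log goes down only after the last flush that could ask it for a context.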

Note that while having a non-durable system keyspace was not directly the problem, I think
it was a fairly bad idea, and we should leave it durable for 0.8 and switch it back to durable
for 1.0 and trunk.

                
> Unit test are hanging on 0.8 branch
> -----------------------------------
>
>                 Key: CASSANDRA-3520
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3520
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tests
>         Environment: Linux
>            Reporter: Sylvain Lebresne
>             Fix For: 0.8.8
>
>         Attachments: 0001-Use-durable-writes-for-system-ks.patch, 3520.patch
>
>
> As the summary says, the unit tests on current 0.8 just hang after CliTest (apparently
> not on Windows, but they do on Linux and MacOSX).
> Not sure what's going on, but what I can tell is that running CliTest alone is enough to
> make it hang after the test passes (i.e., JUnit just waits indefinitely for the VM to
> exit). Even weirder, it seems that the counter increments in CliTest are what make it
> hang: if you comment those statements out, it stops hanging. However, nothing seems to go
> wrong with the increment itself (the test passes), and it doesn't even trigger anything (in
> particular, sendToHintedEndpoint is not called because there is only one node).
> Looking at the stack when the VM is hanging (attached), there is nothing specific to
> counters in there, and nothing that struck me as odd (but I could be missing something).
> There are a few thrift threads running (CASSANDRA-3335), but why that would only be a
> problem for the tests in this situation is a mystery to me.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
