cassandra-commits mailing list archives

From "Sam Tunnicliffe (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10181) Deadlock flushing tables with CUSTOM indexes
Date Wed, 26 Aug 2015 07:55:45 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712014#comment-14712014 ]

Sam Tunnicliffe edited comment on CASSANDRA-10181 at 8/26/15 7:54 AM:
----------------------------------------------------------------------

I think this has always been an issue. It probably hasn't been a problem until now because we
enforced a check in {{SIM#addIndexedColumn}} that a custom index didn't extend {{AbstractSimplePerColumnSecondaryIndex}}.
Aside from that, though, if someone had been sufficiently motivated to write one, I think
a CFS-backed custom index would have deadlocked earlier versions too.

In the linked branch, I've modified the post-flush task to force flush only non-CFS-backed indexes,
rather than all custom indexes. I was expecting to have to modify {{CFS#concatWithIndexes}}
so that it would include CFS-backed custom indexes in the actual flush task, but it was
already doing so, which is another latent bug (although now it's actually the right thing
to do). The branch is built on your patch from CASSANDRA-10180.
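
To illustrate the split described above, here is a minimal sketch of how indexes might be partitioned between the two flush paths. It is not the code from the linked branch: {{IndexStub}}, its {{backingTable}} field, and both method names are hypothetical stand-ins invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class IndexFlushSelectionSketch {
    /** Hypothetical stand-in for an index; not Cassandra's real index API. */
    static final class IndexStub {
        final String name;
        // Name of the backing table; empty when the index keeps its data elsewhere.
        final Optional<String> backingTable;

        IndexStub(String name, Optional<String> backingTable) {
            this.name = name;
            this.backingTable = backingTable;
        }
    }

    /** Indexes the post-flush task must flush itself: those with no backing table. */
    static List<String> flushedByPostFlushTask(List<IndexStub> indexes) {
        return indexes.stream()
                      .filter(i -> !i.backingTable.isPresent())
                      .map(i -> i.name)
                      .collect(Collectors.toList());
    }

    /** Indexes whose backing tables are flushed together with the base table. */
    static List<String> flushedWithBaseTable(List<IndexStub> indexes) {
        return indexes.stream()
                      .filter(i -> i.backingTable.isPresent())
                      .map(i -> i.name)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<IndexStub> indexes = Arrays.asList(
            new IndexStub("cfs_backed_custom", Optional.of("base_tbl.cfs_backed_custom")),
            new IndexStub("external_custom", Optional.empty()));
        System.out.println("post-flush task: " + flushedByPostFlushTask(indexes));
        System.out.println("with base table: " + flushedWithBaseTable(indexes));
    }
}
```

The point of the split is that a CFS-backed index is never flushed from inside the post-flush task, so that task no longer waits on work queued behind it on the same executor.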

Patches:
* [3.0 branch|https://github.com/beobal/cassandra/tree/10181-3.0]
* [trunk branch|https://github.com/beobal/cassandra/tree/10181-trunk]

CI Tests:
* [3.0 testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10181-3.0-testall/]
* [3.0 dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10181-3.0-dtest/]
* [trunk testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10181-trunk-testall/]
* [trunk dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10181-dtest/]



was (Author: beobal):
I think this has always been an issue. It probably hasn't been a problem until now because we
enforced a check in {{SIM#addIndexedColumn}} that a custom index didn't extend {{AbstractSimplePerColumnSecondaryIndex}}.
Aside from that, though, if someone had been sufficiently motivated to write one, I think
a CFS-backed custom index would have deadlocked earlier versions too.

In the linked branch, I've modified the post-flush task to force flush only non-CFS-backed indexes,
rather than all custom indexes. I was expecting to have to modify {{CFS#concatWithIndexes}}
so that it would include CFS-backed custom indexes in the actual flush task, but it was
already doing so, which is another latent bug (although now it's actually the right thing
to do). The branch is built on your patch from CASSANDRA-10180.

Patches:
* [3.0 branch|https://github.com/beobal/cassandra/tree/10181-3.0]
* [trunk branch|https://github.com/beobal/cassandra/tree/10181-trunk]

CI Tests:
* [3.0 testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10181-testall/]
* [3.0 dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10181-dtest/]
* [trunk testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10181-trunk-testall/]
* [trunk dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10181-dtest/]


> Deadlock flushing tables with CUSTOM indexes
> --------------------------------------------
>
>                 Key: CASSANDRA-10181
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10181
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.0 beta 2
>
>         Attachments: flush-deadlock-repro.txt
>
>
> In 3.0, if a table with a CUSTOM secondary index is force flushed, Cassandra will deadlock
> while attempting to perform a blocking flush on the tables backing the secondary indexes.
> The basic problem is that the base table's post-flush task ends up waiting on the post-flush
> task for the secondary index to complete.  However, since the post-flush executor is single-threaded,
> this results in a deadlock.
> Here's the partial stacktrace for the base table part of this (line numbers may not be
> 100% accurate):
> {noformat}
> org.apache.cassandra.db.ColumnFamilyStore.forceBlockingFlush(ColumnFamilyStore.java:927)
> 	at org.apache.cassandra.index.internal.CustomIndex.lambda$getBlockingFlushTask$0(VertexCentricIndex.java:114)
> 	at org.apache.cassandra.index.internal.CustomIndex$$Lambda$95/057902870.call(Unknown Source)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
> 	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
> 	at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:58)
> 	at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:37)
> 	at org.apache.cassandra.index.SecondaryIndexManager.lambda$executeAllBlocking$39(SecondaryIndexManager.java:896)
> 	at org.apache.cassandra.index.SecondaryIndexManager$$Lambda$94/25774682.accept(Unknown Source)
> 	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
> 	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> 	at org.apache.cassandra.index.SecondaryIndexManager.executeAllBlocking(SecondaryIndexManager.java:893)
> 	at org.apache.cassandra.index.SecondaryIndexManager.flushIndexesBlocking(SecondaryIndexManager.java:346)
> 	at org.apache.cassandra.index.SecondaryIndexManager.flushAllCustomIndexesBlocking(SecondaryIndexManager.java:358)
> 	at org.apache.cassandra.db.ColumnFamilyStore$PostFlush.run(ColumnFamilyStore.java:960)
> {noformat}
> First, note that the base of this stacktrace is in {{CFS$PostFlush.run()}}, which means it's
> running on the post-flush executor.  When {{CFS.forceBlockingFlush()}} is called on the secondary
> index table, we end up blocking on another task that's submitted to the post-flush executor.
> Since that executor is single-threaded and is already running the base table task, this results
> in deadlock.
> The attached patch includes a unit test and custom secondary index class (basically just
> KeysIndex) to reproduce the issue.
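
The mechanism in the quoted description can be shown outside Cassandra with a plain single-threaded executor. The sketch below is an illustration only, not the reproduction from the attached patch: the class and method names are made up, and the inner wait uses a timeout so that the demo terminates, whereas a real blocking flush would wait indefinitely.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PostFlushDeadlockSketch {
    /**
     * Returns true if the outer task timed out waiting for the inner task,
     * i.e. the inner task could not start while the outer one held the only thread.
     */
    static boolean innerTaskStarves(long timeoutMillis) {
        // Stand-in for the single-threaded post-flush executor.
        ExecutorService postFlush = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for the base table's post-flush task.
            Future<Boolean> outer = postFlush.submit(() -> {
                // Stand-in for the index table's post-flush task, queued on the same executor.
                Future<?> inner = postFlush.submit(() -> { });
                try {
                    // Without the timeout this call would never return.
                    inner.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true;
                }
            });
            return outer.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            postFlush.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("inner task starved: " + innerTaskStarves(200));
    }
}
```

The inner task sits in the executor's queue behind the outer task, which is itself waiting for the inner task's result, so neither can make progress until the timeout fires.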



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
