cassandra-commits mailing list archives

From "Joshua McKenzie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12198) Deadlock in CDC during segment flush
Date Wed, 13 Jul 2016 22:42:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375934#comment-15375934 ]

Joshua McKenzie commented on CASSANDRA-12198:
---------------------------------------------

Changed CDCSizeTracker and {{CommitLogSegment.setCDCState}} to synchronize on {{CommitLogSegment.cdcState}} rather than on the segment itself.
This should preserve the intended atomicity of changes to this state without exposing us to deadlock
when other, unrelated methods synchronize on the segment.

The other two uses of cdcState (the allocation check on the write path and discard handling in
the segment manager) should be unaffected. The transition rules limit what a reader can observe:
FORBIDDEN is only set on segment creation, and a segment only transitions from PERMITTED to CONTAINS.
In addition, the discard check should be guarded by the OpOrder barrier and the flushing mechanisms.
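The transition rules can be sketched as a small validator. This is a hypothetical illustration (the class and method names are not from the patch), and it assumes the write-path allocation check only rejects CDC writes on FORBIDDEN segments. Under that assumption, every allowed transition leaves the allocation decision unchanged, so a reader that checks the state without taking the lock gets the same answer as one that does.

```java
public class CdcTransitionRules {
    // Hypothetical sketch: the state names come from the comment; the class and
    // methods do not correspond to actual Cassandra code.
    enum CDCState { PERMITTED, FORBIDDEN, CONTAINS }

    // Rules as described: the initial state (PERMITTED or FORBIDDEN) is chosen at
    // segment creation; afterwards the only change is PERMITTED -> CONTAINS.
    // Re-setting the current state is treated as a no-op.
    static boolean isAllowedTransition(CDCState from, CDCState to) {
        return from == to || (from == CDCState.PERMITTED && to == CDCState.CONTAINS);
    }

    // Validating-setter shape: reject anything the rules forbid.
    static CDCState transition(CDCState from, CDCState to) {
        if (!isAllowedTransition(from, to)) {
            throw new IllegalArgumentException("Illegal CDC state transition " + from + " -> " + to);
        }
        return to;
    }

    // Assumption: the allocation check only rejects CDC writes on FORBIDDEN segments.
    // Since nothing becomes FORBIDDEN after creation, every allowed transition keeps
    // this answer the same, so a stale unlocked read is still correct.
    static boolean allocationAllowsCdcWrite(CDCState observed) {
        return observed != CDCState.FORBIDDEN;
    }

    public static void main(String[] args) {
        // Print every non-trivial transition and whether the rules allow it.
        for (CDCState from : CDCState.values()) {
            for (CDCState to : CDCState.values()) {
                if (from != to) {
                    System.out.println(from + " -> " + to + ": " + isAllowedTransition(from, to));
                }
            }
        }
    }
}
```

Only PERMITTED -> CONTAINS is reported as allowed; FORBIDDEN and CONTAINS are both terminal once a segment exists.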

Since I've only seen this once in the wild (while working on CASSANDRA-12148), and it is very infrequent
because it depends on an interaction with segment sync, I'd prefer we get this reviewed and into 3.8
rather than block the release while trying to write a reproduction test.

I ran targeted unit tests locally with test-cdc (CommitLogSegmentManagerCDCTest, CommitLogTest,
CommitLogStressTests) and they look fine. CI is running now.

||branch||testall||dtest||
|[12198|https://github.com/apache/cassandra/compare/cassandra-3.8...josh-mckenzie:12198?expand=1]|[testall|http://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-12198-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-12198-dtest]|

> Deadlock in CDC during segment flush
> ------------------------------------
>
>                 Key: CASSANDRA-12198
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12198
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joshua McKenzie
>            Assignee: Joshua McKenzie
>            Priority: Blocker
>             Fix For: 3.8
>
>
> In the patch for CASSANDRA-8844, we added a {{synchronized(this)}} block inside {{CommitLogSegment.setCDCState}}. This introduces the possibility of deadlock in the following scenario:
> # {{CommitLogSegment.sync()}} is called; it is a synchronized method, so it holds the segment's monitor.
> # A {{CommitLogSegment.allocate}} call from a CDC-enabled write is in flight and has acquired a reference to the Group on {{appendOrder}} (the segment's OpOrder).
> # {{CommitLogSegment.sync}} reaches {{waitForModifications}}, which calls {{appendOrder.awaitNewBarrier}} and waits for the in-flight write's Group to complete.
> # The in-flight write, if it is changing the segment's state from CDCState.PERMITTED to CDCState.CONTAINS, enters {{setCDCState}} and blocks on {{synchronized(this)}}.
> Neither thread ever makes progress: {{sync()}} holds the monitor while waiting for the write's Group, and the write holds its Group open while waiting for the monitor. This came up while doing further work on CASSANDRA-12148.
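The four-step scenario above can be modeled in miniature to show why the cycle never resolves. This is a sketch with substitutions, not Cassandra code: a {{ReentrantLock}} stands in for the segment monitor so the blocked step can time out instead of hanging forever, a {{CountDownLatch}} stands in for the write's OpOrder Group, and {{cdcStateLock}} is a hypothetical separate lock representing a {{setCDCState}} that no longer synchronizes on the segment.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class CdcDeadlockModel {
    // Hypothetical model of the scenario. Returns true if the write could not get the
    // lock setCDCState needs while sync() was waiting on that same write.
    static boolean deadlocks(boolean setCDCStateUsesSegmentMonitor, long timeoutMs) {
        ReentrantLock segmentMonitor = new ReentrantLock();  // stands in for synchronized(this)
        ReentrantLock cdcStateLock = new ReentrantLock();    // hypothetical separate lock
        ReentrantLock lockForSetCDCState = setCDCStateUsesSegmentMonitor ? segmentMonitor : cdcStateLock;
        CountDownLatch writeGroupClosed = new CountDownLatch(1); // released when the write's Group closes
        CountDownLatch monitorHeld = new CountDownLatch(1);
        AtomicBoolean writeBlocked = new AtomicBoolean(false);

        Thread sync = new Thread(() -> {
            segmentMonitor.lock();                // 1. sync() holds the segment monitor
            try {
                monitorHeld.countDown();
                writeGroupClosed.await();         // 3. the barrier waits for the in-flight Group
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                segmentMonitor.unlock();
            }
        });

        Thread write = new Thread(() -> {
            try {                                 // 2. allocate() holds an open Group until finally
                monitorHeld.await();
                // 4. setCDCState(CONTAINS): a real monitor would wait forever; here we time out
                if (lockForSetCDCState.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    lockForSetCDCState.unlock();
                } else {
                    writeBlocked.set(true);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                writeGroupClosed.countDown();     // the write finishes and closes its Group
            }
        });

        sync.start();
        write.start();
        try {
            sync.join();
            write.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return writeBlocked.get();
    }

    public static void main(String[] args) {
        System.out.println(deadlocks(true, 200));  // true: step 4 waits on the monitor sync() holds
        System.out.println(deadlocks(false, 200)); // false: a separate lock breaks the cycle
    }
}
```

With the shared monitor, the write times out because {{sync()}} keeps the lock until the write's Group closes; with a separate lock, the write proceeds, closes its Group, and {{sync()}} completes.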



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
