cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcus Eriksson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-13422) CompactionStrategyManager should take write not read lock when handling add/remove notifications
Date Fri, 07 Apr 2017 12:58:41 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Marcus Eriksson updated CASSANDRA-13422:
----------------------------------------
    Reviewer: Marcus Eriksson  (was: Blake Eggleston)

> CompactionStrategyManager should take write not read lock when handling add/remove notifications
> ------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13422
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13422
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 4.0, 3.11.x
>
>
> {{getNextBackgroundTask}} in various compaction strategies (definitely {{LCS}}) rely
on checking the result of {{DataTracker.getCompacting()}} to avoid accessing data and metadata
related to tables that have already head their resources released.
> There is a race where this check is unreliable and will claim a table that has its resources
already released is not compacting resulting in use after free.
> [{{LeveledCompactionStrategy.findDroppableSSTable}}|https://github.com/apache/cassandra/blob/c794d2bed7ca1d10e13c4da08a3d45f5c755c1d8/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java#L504]
for instance has this three part logical && condition where the first check is against
the compacting set before calling {{worthDroppingTombstones}} which fails if the table has
been released.
> The order of events is basically that CompactionStrategyManager acquires the read lock
in getNextBackgroundTask(), then proceeds eventually to findDroppableSSTable and acquires
a set of SSTables from the manifest. While the manifest is thread safe it's not accessed atomically
WRT to other operations. Once it has acquire the set of tables it acquires the (not atomically)
the set of compacting SSTables and iterates checking the former against the latter.
> Meanwhile other compaction threads are marking tables obsolete or compacted and releasing
their references. Doing this removes them from {{DataTracker}} and publishes a notification
to the strategies, but this notification only requires the read lock. After the compaction
thread has published the notifications it eventually marks the table as not compacting in
{{DataTracker}} or removes it entirely.
> The race is then that the compaction thread generating a new background task acquires
the sstables from the manifest on the stack. Any table in that set that was compacting at
that time must remain compacting so that it can be skipped. Another compaction thread finishes
a compaction and is able to remove the table from the manifest and then remove it from the
compacting set. The thread generating the background task then acquires the list of compacting
tables which doesn't include the table it is supposed to skip.
> The simple fix appears to be to require threads to acquire the write lock in order to
publish notifications of tables being added/removed from compaction strategies. While holding
the write lock it won't be possible for someone to see a view of tables in the manifest where
tables that are compacting aren't compacting in the view



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message