cassandra-commits mailing list archives

From "Fuud (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13652) Deadlock in AbstractCommitLogSegmentManager
Date Tue, 11 Jul 2017 08:37:00 GMT


Fuud commented on CASSANDRA-13652:

Just to keep everything in one place, here is a copy of my message from the mailing list:



I found a possible deadlock in AbstractCommitLogSegmentManager. The root cause is incorrect
use of the LockSupport.park/unpark pair. unpark should be invoked only if the caller is sure
that the thread is parked in the intended place. Otherwise the permit granted by unpark can
be consumed by other structures (for example, inside a ReadWriteLock).
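
To illustrate the point above: a common way to make park/unpark safe is to pair the permit with explicit state and re-check that state in a loop, so that a permit consumed elsewhere cannot lose the wake-up. The following is a minimal sketch of that idiom, not code from Cassandra or from the patch; GuardedParkSketch, wake, awaitWake and demo are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical illustration; not part of Cassandra.
final class GuardedParkSketch {
    private final AtomicBoolean wakeRequested = new AtomicBoolean(false);
    private volatile Thread waiter;

    // Records the request first, then unparks the waiter if one is registered.
    void wake() {
        wakeRequested.set(true);
        Thread t = waiter;
        if (t != null) {
            LockSupport.unpark(t);
        }
    }

    // Parks only while no wake-up has been requested; the flag, not the permit, is the source of truth.
    void awaitWake() {
        waiter = Thread.currentThread();
        while (!wakeRequested.compareAndSet(true, false)) {
            LockSupport.park(this);
        }
    }

    // Single-threaded demonstration: the permit is consumed by an unrelated park(),
    // yet awaitWake() still returns because the flag is checked before parking.
    static boolean demo() {
        GuardedParkSketch s = new GuardedParkSketch();
        s.waiter = Thread.currentThread();
        s.wake();             // sets the flag and grants the park permit
        LockSupport.park();   // stands in for unrelated code; returns at once and consumes the permit
        s.awaitWake();        // does not block: the flag is still set
        return !s.wakeRequested.get(); // the request has been consumed exactly once
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In demo() the bare park() plays the role of unrelated code, such as a lock, that happens to park on the same thread. If awaitWake() relied on the permit alone, it would block at this point.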


I suggest the simplest solution: replace LockSupport with a Semaphore.
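
A rough sketch of what a Semaphore-based wake-up could look like, assuming a single manager thread that waits and any number of threads that wake it. This is not the actual patch; SemaphoreWakeSketch and its methods are hypothetical names. The point is that Semaphore permits belong to the Semaphore object, so unrelated code that parks the thread cannot consume them.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.LockSupport;

// Hypothetical illustration; not the code attached to the ticket.
final class SemaphoreWakeSketch {
    private final Semaphore wakeups = new Semaphore(0);

    // Called by any thread that wants the manager thread to run.
    void wake() {
        wakeups.release();
    }

    // Called by the manager thread; blocks until at least one wake-up was requested.
    void awaitWake() {
        wakeups.acquireUninterruptibly();
        wakeups.drainPermits(); // coalesce wake-ups that piled up in the meantime
    }

    // Single-threaded demonstration that an unrelated park() does not steal the wake-up.
    static int demo() {
        SemaphoreWakeSketch s = new SemaphoreWakeSketch();
        s.wake();
        // Unrelated code on the same thread uses the thread's park permit.
        LockSupport.unpark(Thread.currentThread());
        LockSupport.park();
        s.awaitWake(); // returns immediately: the Semaphore permit is untouched
        return s.wakeups.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Draining after the acquire is a design choice made for this sketch: several wake() calls before the manager runs then cause one pass of work instead of several.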

I also suggest another solution that uses a SynchronousQueue-like structure to hand available
segments from the manager thread to consumers. With these changes the code becomes clearer and 


We cannot use j.u.c.SynchronousQueue because we need to support shutdown, and the only
way to terminate SynchronousQueue.put is to call Thread.interrupt(). But C* uses NIO and
does not expect ClosedByInterruptException during I/O operations, so we cannot interrupt
the manager thread.
I implemented o.a.c.u.c.Transferer, which supports shutdown and restart (needed for tests).
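
To make the idea concrete, here is a minimal sketch of a hand-off structure whose blocked put can be released by a shutdown call instead of Thread.interrupt(). It is not the Transferer from the patch; HandoffSketch, its methods, and the single-producer restriction are assumptions made for this illustration.

```java
import java.util.Objects;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-producer hand-off; not o.a.c.u.c.Transferer.
final class HandoffSketch<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private T item;            // null means the slot is empty
    private boolean shutdown;

    // Blocks until a consumer takes the value (true) or shutdown() is called (false).
    boolean put(T value) {
        Objects.requireNonNull(value);
        lock.lock();
        try {
            if (shutdown) return false;
            item = value;
            changed.signalAll();
            while (item == value && !shutdown) changed.awaitUninterruptibly();
            if (item == value) { item = null; return false; } // shut down before anyone took it
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Blocks until a value is available; returns null once shut down with an empty slot.
    T take() {
        lock.lock();
        try {
            while (item == null && !shutdown) changed.awaitUninterruptibly();
            T value = item;
            item = null;
            changed.signalAll();
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Releases all blocked callers without interrupting any thread.
    void shutdown() {
        lock.lock();
        try { shutdown = true; changed.signalAll(); } finally { lock.unlock(); }
    }

    // Re-enables the hand-off after shutdown (the kind of reset a test might need).
    void restart() {
        lock.lock();
        try { shutdown = false; } finally { lock.unlock(); }
    }

    private static void join(Thread t) {
        try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
    }

    // One successful hand-off, then a put that is released by shutdown().
    static String demo() {
        HandoffSketch<String> h = new HandoffSketch<>();
        boolean[] accepted = new boolean[2];
        Thread p1 = new Thread(() -> accepted[0] = h.put("segment-1"));
        p1.start();
        String got = h.take();
        join(p1);
        Thread p2 = new Thread(() -> accepted[1] = h.put("segment-2"));
        p2.start();
        h.shutdown();
        join(p2);
        return got + " " + accepted[0] + " " + accepted[1] + " " + h.take();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the waits use awaitUninterruptibly and are released by a flag plus signalAll, no thread is ever interrupted, which is the property needed when the producer also performs NIO operations.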

I also modified o.a.c.d.c.SimpleCachedBufferPool to support waiting for free space.

Please feel free to ask any questions.

Thank you.

Feodor Bobin

> Deadlock in AbstractCommitLogSegmentManager
> -------------------------------------------
>                 Key: CASSANDRA-13652
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Fuud
> AbstractCommitLogSegmentManager uses LockSupport.(un)park incorrectly. It invokes unpark without
checking whether the manager thread is parked in the appropriate place.
> For example, logging frameworks use queues, queues use ReadWriteLocks, and those use
LockSupport. Therefore AbstractCommitLogSegmentManager.wakeManager can wake the thread while it is
inside a lock, and the manager thread will then sleep forever in park() (because the unpark permit
was already consumed inside the lock).
> For example, stack traces:
> {code}
> "MigrationStage:1" id=412 state=WAITING
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(
>     at org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUninterruptibly(
>     at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.awaitAvailableSegment(
>     at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.advanceAllocatingFrom(
>     at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(
>     at org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(
>     at org.apache.cassandra.config.Schema.dropView(
>     at org.apache.cassandra.schema.SchemaKeyspace.lambda$updateKeyspace$23(
>     at org.apache.cassandra.schema.SchemaKeyspace$$Lambda$382/1123232162.accept(Unknown
>     at java.util.LinkedHashMap$LinkedValues.forEach(
>     at java.util.Collections$UnmodifiableCollection.forEach(
>     at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(
>     at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(
>     at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(
>       - locked java.lang.Class@cc38904
>     at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(
>     at
>     at java.util.concurrent.Executors$
>     at
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$
>     at
>     at
>     at com.ringcentral.concurrent.executors.MonitoredThreadPoolExecutor$
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(
>     at java.util.concurrent.ThreadPoolExecutor$
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(
>     at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$61/
>     at
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(
>     at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager$1.runMayThrow(
>     at
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(
>     at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$61/
>     at
> {code}
> The solution is to use a Semaphore instead of the low-level LockSupport.
