geode-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bruce Schuchardt (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (GEODE-2024) Deadlock creating a new distributed lock service Grantor while transactions are in progress
Date Wed, 26 Oct 2016 23:08:58 GMT

     [ https://issues.apache.org/jira/browse/GEODE-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Bruce Schuchardt resolved GEODE-2024.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0-incubating

> Deadlock creating a new distributed lock service Grantor while transactions are in progress
> -------------------------------------------------------------------------------------------
>
>                 Key: GEODE-2024
>                 URL: https://issues.apache.org/jira/browse/GEODE-2024
>             Project: Geode
>          Issue Type: Bug
>          Components: distributed lock service
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>             Fix For: 1.1.0-incubating
>
>
> An integration test running transactions in client caches while servers are killed and
restarted one by one encountered a hang when the server currrently serving as the distributed-lock
"grantor" was killed and another server attempted to take on the role.
> The stack traces below are from a build prior to package renaming.
> TX commit thread:
> {noformat}
> vm_0_bridge1_w1-gst-dev23_17330:ServerConnection on port 21566 Thread 352 ID=861 state=TIMED_WAITING
> 	waiting to lock <java.util.concurrent.CountDownLatch$Sync@41c71f7d>
> 	at sun.misc.Unsafe.park(Native Method)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
> 	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
> 	at com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:55)
> 	at com.gemstone.gemfire.internal.util.concurrent.FutureResult.get(FutureResult.java:54)
> 	at com.gemstone.gemfire.distributed.internal.locks.DLockService.waitForLockGrantorFutureResult(DLockService.java:774)
> 	at com.gemstone.gemfire.distributed.internal.locks.DLockService.notLockGrantorId(DLockService.java:837)
> 	at com.gemstone.gemfire.distributed.internal.locks.DLockService.releaseTryLocks(DLockService.java:2216)
> 	at com.gemstone.gemfire.internal.cache.locks.TXLockServiceImpl.release(TXLockServiceImpl.java:222)
> 	locked <java.util.ArrayList@34bbf1cf>
> 	at com.gemstone.gemfire.internal.cache.TXLockRequest.releaseDistributed(TXLockRequest.java:91)
> 	at com.gemstone.gemfire.internal.cache.TXLockRequest.cleanup(TXLockRequest.java:120)
> 	at com.gemstone.gemfire.internal.cache.TXState.cleanup(TXState.java:730)
> 	at com.gemstone.gemfire.internal.cache.TXState.commit(TXState.java:447)
> 	at com.gemstone.gemfire.internal.cache.TXStateProxyImpl.commit(TXStateProxyImpl.java:234)
> 	at com.gemstone.gemfire.internal.cache.TXManagerImpl.commit(TXManagerImpl.java:325)
> 	at com.gemstone.gemfire.internal.cache.tier.sockets.command.CommitCommand.cmdExecute(CommitCommand.java:80)
> 	at com.gemstone.gemfire.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:177)
> 	at com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:799)
> 	at com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.doOneMessage(ServerConnection.java:930)
> 	at com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl$1$1.run(AcceptorImpl.java:555)
> 	at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.ThreadPoolExecutor$Worker@549109017
> java.util.concurrent.locks.ReentrantLock$NonfairSync@317942158
> {noformat}
> Grantor trying to recover:
> {noformat}
> "ServerConnection on port 21566 Thread 356" daemon prio=10 tid=0x00007f9cf8020800 nid=0x5c9e
waiting on condition [0x00007f9c698d6000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000000f4f2e4d0> (a java.util.concurrent.CountDownLatch$Sync)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
> 	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
> 	at com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:55)
> 	at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:743)
> 	at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:819)
> 	at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:796)
> 	at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:886)
> 	at com.gemstone.gemfire.distributed.internal.locks.DLockRecoverGrantorProcessor.recoverLockGrantor(DLockRecoverGrantorProcessor.java:96)
> 	at com.gemstone.gemfire.distributed.internal.locks.DLockService.makeLocalGrantor(DLockService.java:457)
> 	at com.gemstone.gemfire.distributed.internal.locks.DLockService.createLocalGrantor(DLockService.java:408)
> 	at com.gemstone.gemfire.distributed.internal.locks.DLockService.getLockGrantorId(DLockService.java:347)
> 	at com.gemstone.gemfire.distributed.internal.locks.DLockService.acquireTryLocks(DLockService.java:2278)
> 	at com.gemstone.gemfire.internal.cache.locks.TXLockServiceImpl.txLock(TXLockServiceImpl.java:132)
> 	at com.gemstone.gemfire.internal.cache.TXLockRequest.obtain(TXLockRequest.java:72)
> 	at com.gemstone.gemfire.internal.cache.TXState.reserveAndCheck(TXState.java:287)
> 	at com.gemstone.gemfire.internal.cache.TXState.commit(TXState.java:346)
> 	at com.gemstone.gemfire.internal.cache.TXStateProxyImpl.commit(TXStateProxyImpl.java:234)
> 	at com.gemstone.gemfire.internal.cache.TXManagerImpl.commit(TXManagerImpl.java:325)
> 	at com.gemstone.gemfire.internal.cache.tier.sockets.command.CommitCommand.cmdExecute(CommitCommand.java:80)
> 	at com.gemstone.gemfire.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:177)
> 	at com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:799)
> 	at com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.doOneMessage(ServerConnection.java:930)
> 	at com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl$1$1.run(AcceptorImpl.java:555)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> And the message-processing thread that is gathering lock service state for the new grantor
but is blocked:
> {noformat}
> "Pooled Waiting Message Processor 16" daemon prio=10 tid=0x00007f9c9c01d800 nid=0x618c
waiting on condition [0x00007f9c6a1e0000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000000f0d906c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:929)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1245)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
> 	at com.gemstone.gemfire.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lockInterruptibly(StoppableReentrantReadWriteLock.java:198)
> 	at com.gemstone.gemfire.internal.cache.locks.TXLockServiceImpl.acquireRecoveryWriteLock(TXLockServiceImpl.java:240)
> 	at com.gemstone.gemfire.internal.cache.locks.TXRecoverGrantorMessageProcessor.processDLockRecoverGrantorMessage(TXRecoverGrantorMessageProcessor.java:79)
> 	at com.gemstone.gemfire.internal.cache.locks.TXRecoverGrantorMessageProcessor$1.run(TXRecoverGrantorMessageProcessor.java:44)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:692)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager$6$1.run(DistributionManager.java:1029)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> There were no other threads in the distributed system blocked in TXLockServiceImpl.acquireRecoveryWriteLock
so in this case it was a simple deadlock involving three threads in a single server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message