geode-issues mailing list archives

From "Barry Oglesby (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (GEODE-1677) Persistent AsyncEventQueue with non-persistent data PR hangs during recovery
Date Tue, 19 Jul 2016 17:19:20 GMT

     [ https://issues.apache.org/jira/browse/GEODE-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Barry Oglesby reassigned GEODE-1677:
------------------------------------

    Assignee: Barry Oglesby

> Persistent AsyncEventQueue with non-persistent data PR hangs during recovery
> ----------------------------------------------------------------------------
>
>                 Key: GEODE-1677
>                 URL: https://issues.apache.org/jira/browse/GEODE-1677
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: Barry Oglesby
>            Assignee: Barry Oglesby
>
> This is the same bug as GEM-801.
> During recovery of a persistent {{AsyncEventQueue}} on a non-persistent data {{PartitionedRegion}},
a deadlock occurs.
> Here is analysis duplicated from GEM-801:
> *Member dataStoregemfire1_31558*
> This member has created its PR and is recovering its shadow PR (async event queue). The
{{ParallelGatewaySenderQueue addShadowPartitionedRegionForUserPR}} method has taken the {{AbstractGatewaySender's
lifeCycleLock's writeLock}}.
> The bgexec19832_31558.log thread dumps show:
> {noformat}
> "vm_0_thr_0_dataStore1_client-13_31558" #162 daemon prio=5 os_prio=0 tid=0x00007f406c01f800
nid=0x7fca waiting on condition [0x00007f40bd7c4000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000000f1a6db90> (a java.util.concurrent.CountDownLatch$Sync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> 	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> 	at com.gemstone.gemfire.internal.cache.BucketPersistenceAdvisor.waitForPrimaryPersistentRecovery(BucketPersistenceAdvisor.java:362)
> 	at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.waitForPrimaryPersistentRecovery(ProxyBucketRegion.java:632)
> 	at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider.recoverPersistentBuckets(PRHARedundancyProvider.java:1782)
> 	at com.gemstone.gemfire.internal.cache.PartitionedRegion.initPRInternals(PartitionedRegion.java:887)
> 	- locked <0x00000000f1cd7070> (a com.gemstone.gemfire.internal.cache.wan.parallel.ParallelGatewaySenderQueue$ParallelGatewaySenderQueueMetaRegion)
> 	at com.gemstone.gemfire.internal.cache.PartitionedRegion.initialize(PartitionedRegion.java:1007)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3065)
> 	at com.gemstone.gemfire.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:559)
> 	at com.gemstone.gemfire.internal.cache.wan.parallel.ParallelGatewaySenderEventProcessor.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderEventProcessor.java:203)
> 	at com.gemstone.gemfire.internal.cache.wan.parallel.ConcurrentParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ConcurrentParallelGatewaySenderQueue.java:172)
> 	at com.gemstone.gemfire.internal.cache.PartitionedRegion.postCreateRegion(PartitionedRegion.java:986)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3109)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2959)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2948)
> 	at hydra.RegionHelper.createRegion(RegionHelper.java:117)
> 	- locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
> 	at hydra.RegionHelper.createRegion(RegionHelper.java:85)
> 	- locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
> 	at hydra.RegionHelper.createRegion(RegionHelper.java:72)
> 	- locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
> 	at hydra.RegionHelper.createRegion(RegionHelper.java:52)
> 	- locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
> 	at parReg.wbcl.ParRegWBCLTest.HA_reinitializeRegion(ParRegWBCLTest.java:250)
> 	at parReg.ParRegTest.HAController(ParRegTest.java:2063)
> 	at parReg.wbcl.ParRegWBCLTest.HAController(ParRegWBCLTest.java:274)
> 	at parReg.ParRegTest.HydraTask_HAController(ParRegTest.java:985)
> {noformat}
> As part of recovery, 5 buckets are waiting for their initial images:
> {noformat}
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_102"
#705 daemon prio=5 os_prio=0 tid=0x00007f406c16d000 nid=0x954 waiting on condition [0x00007f3fcdfdd000]
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_99"
#702 daemon prio=5 os_prio=0 tid=0x00007f406c169800 nid=0x951 waiting on condition [0x00007f3fce2e0000]
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93"
#696 daemon prio=5 os_prio=0 tid=0x00007f406c161800 nid=0x94b waiting on condition [0x00007f3fce8e6000]
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_64"
#665 daemon prio=5 os_prio=0 tid=0x00007f406c13b000 nid=0x92e waiting on condition [0x00007f3fd55d5000]
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_54"
#655 daemon prio=5 os_prio=0 tid=0x00007f406c12e800 nid=0x924 waiting on condition [0x00007f3fd5fdf000]
> {noformat}
> Here is bucket 93's recovery thread:
> {noformat}
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93"
#696 daemon prio=5 os_prio=0 tid=0x00007f406c161800 nid=0x94b waiting on condition [0x00007f3fce8e6000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000000f17baf68> (a java.util.concurrent.CountDownLatch$Sync)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> 	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> 	at com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
> 	at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:743)
> 	at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:819)
> 	at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:796)
> 	at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:886)
> 	at com.gemstone.gemfire.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:458)
> 	at com.gemstone.gemfire.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1352)
> 	at com.gemstone.gemfire.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1159)
> 	at com.gemstone.gemfire.internal.cache.BucketRegion.initialize(BucketRegion.java:263)
> 	at com.gemstone.gemfire.internal.cache.LocalRegion.createSubregion(LocalRegion.java:892)
> 	at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:765)
> 	at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:444)
> 	- locked <0x00000000f17a9150> (a com.gemstone.gemfire.internal.cache.ProxyBucketRegion)
> 	at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2982)
> 	at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.recoverFromDisk(ProxyBucketRegion.java:446)
> 	at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.recoverFromDiskRecursively(ProxyBucketRegion.java:403)
> 	at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider$4.run2(PRHARedundancyProvider.java:1765)
> 	at com.gemstone.gemfire.internal.cache.partitioned.RecoveryRunnable.run(RecoveryRunnable.java:64)
> 	at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider$4.run(PRHARedundancyProvider.java:1757)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The dataStoregemfire1_31558/system.log contains this warning showing the above thread
is waiting for member dataStoregemfire1_client-13_31576:
> {noformat}
> [warning 2016/07/03 04:10:04.000 UTC dataStoregemfire1_client-13_31558 <Recovery thread
for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93> tid=0x2b8]
15 seconds have elapsed while waiting for replies: <com.gemstone.gemfire.internal.cache.InitialImageOperation$ImageProcessor
4138 waiting for 1 replies from [client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027];
waiting for 0 messages in-flight; region=/__PR/_B__dataStoreRegion_93; abort=false> on
client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025 whose current
membership list is: [[client-13(31491:locator)<ec><v0>:1024, client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025,
client-13(dataStoregemfire2_client-13_482:482)<ec><v3>:1026, client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027,
client-13(dataStoregemfire1_client-13_31563:31563)<ec><v1>:1028, client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029,
client-13(dataStoregemfire2_client-13_460:460)<ec><v4>:1030]]
> {noformat}
> *Member dataStoregemfire1_client-13_31576*
> The bgexec16591_31576.log thread dumps show several Pooled High Priority Message Processor
threads blocked waiting for region entry locks while processing {{InitialImageOperations}}:
> {noformat}
> "Pooled High Priority Message Processor 11" #372 daemon prio=10 os_prio=0 tid=0x00007f609c047000
nid=0x581 waiting for monitor entry [0x00007f6090e4f000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at com.gemstone.gemfire.internal.cache.InitialImageOperation$RequestImageMessage.chunkEntries(InitialImageOperation.java:1857)
> 	- waiting to lock <0x00000000f16ce110> (a com.gemstone.gemfire.internal.cache.VersionedThinRegionEntryHeapStringKey2)
> 	at com.gemstone.gemfire.internal.cache.InitialImageOperation$RequestImageMessage.process(InitialImageOperation.java:1657)
> 	at com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
> 	at com.gemstone.gemfire.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:450)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:611)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager$5$1.run(DistributionManager.java:922)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The P2P message reader that holds the entry lock is waiting for replies from dataStoregemfire1_client-13_31558:31558,
as shown by the log warning and thread dump below:
> {noformat}
> [warning 2016/07/03 04:10:03.671 UTC dataStoregemfire1_client-13_31576 <P2P message
reader for client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029 unshared
ordered uid=416 dom #1 port=35558> tid=0x150] 15 seconds have elapsed while waiting for
replies: <DistributedCacheOperation$CacheOperationReplyProcessor 4707 waiting for 2 replies
from [client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025, client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025]>
on client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027 whose current
membership list is: [[client-13(31491:locator)<ec><v0>:1024, client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025,
client-13(dataStoregemfire2_client-13_482:482)<ec><v3>:1026, client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027,
client-13(dataStoregemfire1_client-13_31563:31563)<ec><v1>:1028, client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029,
client-13(dataStoregemfire2_client-13_460:460)<ec><v4>:1030]]
> {noformat}
> {noformat}
> "P2P message reader for client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029
unshared ordered uid=416 dom #1 port=35558" #336 daemon prio=10 os_prio=0 tid=0x00007f6169731800
nid=0x4d5 waiting on condition [0x00007f6093373000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000000f16cdfc8> (a java.util.concurrent.CountDownLatch$Sync)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> 	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> 	at com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
> 	at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:743)
> 	at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:819)
> 	at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:796)
> 	at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:886)
> 	at com.gemstone.gemfire.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:743)
> 	at com.gemstone.gemfire.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:622)
> 	at com.gemstone.gemfire.internal.cache.AbstractUpdateOperation.distribute(AbstractUpdateOperation.java:71)
> 	at com.gemstone.gemfire.internal.cache.BucketRegion.basicPutPart2(BucketRegion.java:634)
> 	at com.gemstone.gemfire.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2736)
> 	- locked <0x00000000f16ce110> (a com.gemstone.gemfire.internal.cache.VersionedThinRegionEntryHeapStringKey2)
> 	at com.gemstone.gemfire.internal.cache.BucketRegion.virtualPut(BucketRegion.java:485)
> 	at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.putLocally(PartitionedRegionDataStore.java:1275)
> 	at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.putLocally(PartitionedRegionDataStore.java:1250)
> 	at com.gemstone.gemfire.internal.cache.PartitionedRegionDataView.putEntryOnRemote(PartitionedRegionDataView.java:107)
> 	at com.gemstone.gemfire.internal.cache.partitioned.PutMessage.operateOnPartitionedRegion(PutMessage.java:833)
> 	at com.gemstone.gemfire.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:339)
> 	at com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
> 	at com.gemstone.gemfire.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:442)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager.scheduleIncomingMessage(DistributionManager.java:3519)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager.handleIncomingDMsg(DistributionManager.java:3142)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager$MyListener.messageReceived(DistributionManager.java:4341)
> 	at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1100)
> 	at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1028)
> 	at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:382)
> 	at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:726)
> 	at com.gemstone.gemfire.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:815)
> 	at com.gemstone.gemfire.internal.tcp.Connection.dispatchMessage(Connection.java:3961)
> 	at com.gemstone.gemfire.internal.tcp.Connection.processNIOBuffer(Connection.java:3545)
> 	at com.gemstone.gemfire.internal.tcp.Connection.runNioReader(Connection.java:1837)
> 	at com.gemstone.gemfire.internal.tcp.Connection.run(Connection.java:1706)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Back in bgexec19832_31558.log, the thread dumps show a number of P2P message reader threads
for dataStoregemfire1_client-13_31576:31576 stuck waiting for the {{AbstractGatewaySender's
lifeCycleLock's readLock}} here:
> {noformat}
> "P2P message reader for client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027
unshared ordered uid=537 dom #2 port=55007" #868 daemon prio=10 os_prio=0 tid=0x00007f40403a4000
nid=0xa23 waiting on condition [0x00007f3fc3442000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000000f1cbe5f8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> 	at com.gemstone.gemfire.internal.cache.wan.AbstractGatewaySender.distribute(AbstractGatewaySender.java:928)
> 	at com.gemstone.gemfire.internal.cache.LocalRegion.notifyGatewaySender(LocalRegion.java:6485)
> 	at com.gemstone.gemfire.internal.cache.BucketRegion.notifyGatewaySender(BucketRegion.java:654)
> 	at com.gemstone.gemfire.internal.cache.LocalRegion.basicPutPart2(LocalRegion.java:6022)
> 	at com.gemstone.gemfire.internal.cache.BucketRegion.basicPutPart2(BucketRegion.java:644)
> 	at com.gemstone.gemfire.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2736)
> 	- locked <0x00000000f18891f0> (a com.gemstone.gemfire.internal.cache.VersionedThinRegionEntryHeapStringKey2)
> 	at com.gemstone.gemfire.internal.cache.BucketRegion.virtualPut(BucketRegion.java:485)
> 	at com.gemstone.gemfire.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:132)
> 	at com.gemstone.gemfire.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5817)
> 	at com.gemstone.gemfire.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:148)
> 	at com.gemstone.gemfire.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:286)
> 	at com.gemstone.gemfire.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:255)
> 	at com.gemstone.gemfire.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1191)
> 	at com.gemstone.gemfire.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1092)
> 	at com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
> 	at com.gemstone.gemfire.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:442)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager.scheduleIncomingMessage(DistributionManager.java:3519)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager.handleIncomingDMsg(DistributionManager.java:3142)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager$MyListener.messageReceived(DistributionManager.java:4341)
> 	at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1100)
> 	at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1028)
> 	at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:382)
> 	at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:726)
> 	at com.gemstone.gemfire.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:815)
> 	at com.gemstone.gemfire.internal.tcp.Connection.dispatchMessage(Connection.java:3961)
> 	at com.gemstone.gemfire.internal.tcp.Connection.processNIOBuffer(Connection.java:3545)
> 	at com.gemstone.gemfire.internal.tcp.Connection.runNioReader(Connection.java:1837)
> 	at com.gemstone.gemfire.internal.tcp.Connection.run(Connection.java:1706)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> These threads will never get the {{readLock}} because the thread holding the {{writeLock}} is itself blocked waiting for recovery to complete.
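The lock interaction described above can be reproduced in isolation. The following is a minimal sketch, not Geode code: the class, method, and variable names are hypothetical, and a plain `CountDownLatch` stands in for the persistent recovery latch. One thread takes the write lock and then waits for "recovery"; the only thread that could complete recovery first needs the read lock. A timed `tryLock` is used so the sketch terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the lifeCycleLock interaction; not Geode code.
class LifeCycleLockDeadlockSketch {

    /** Returns true if the reader obtained the read lock while the writer was waiting. */
    static boolean readerGotLock(long timeoutMillis) {
        ReentrantReadWriteLock lifeCycleLock = new ReentrantReadWriteLock();
        CountDownLatch recoveryDone = new CountDownLatch(1);    // stands in for the recovery latch
        CountDownLatch writerHoldsLock = new CountDownLatch(1); // test synchronization only

        // Plays the region-creation thread: takes the write lock, then waits for recovery.
        Thread writer = new Thread(() -> {
            lifeCycleLock.writeLock().lock();
            try {
                writerHoldsLock.countDown();
                recoveryDone.await(); // never released unless the reader makes progress
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // the sketch breaks the deadlock this way
            } finally {
                lifeCycleLock.writeLock().unlock();
            }
        });
        writer.start();

        boolean acquired = false;
        try {
            writerHoldsLock.await();
            // Plays the P2P message reader: it needs the read lock before its work
            // (which recovery is indirectly waiting on) can complete.
            acquired = lifeCycleLock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                try {
                    recoveryDone.countDown();
                } finally {
                    lifeCycleLock.readLock().unlock();
                }
            }
            writer.interrupt(); // let the writer exit so the sketch terminates
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired;
    }

    public static void main(String[] args) {
        System.out.println("reader acquired read lock: " + readerGotLock(200));
    }
}
```

In the real hang the readers call the untimed `ReadLock.lock()`, so neither side ever gives up.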
> This deadlock only occurs when the {{AsyncEventQueue}} is persistent, but its attached
data region is not.
> The regions being recovered by the {{AsyncEventQueue}} recovery threads are the actual
data regions. It's the dataStoreRegion that is being GIIed, not the {{AsyncEventQueue}} region:
> {noformat}
> [info 2016/07/03 04:09:48.504 UTC dataStoregemfire1_client-13_31558 <Recovery thread
for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_102> tid=0x2c1]
Region _B__dataStoreRegion_102 requesting initial image from client-13(dataStoregemfire1_client-13_31563:31563)<ec><v1>:1028
> [info 2016/07/03 04:09:48.968 UTC dataStoregemfire1_client-13_31558 <Recovery thread
for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_54> tid=0x28f]
Region _B__dataStoreRegion_54 requesting initial image from client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029
> [info 2016/07/03 04:09:49.007 UTC dataStoregemfire1_client-13_31558 <Recovery thread
for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_99> tid=0x2be]
Region _B__dataStoreRegion_99 requesting initial image from client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027
> [info 2016/07/03 04:09:49.202 UTC dataStoregemfire1_client-13_31558 <Recovery thread
for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93> tid=0x2b8]
Region _B__dataStoreRegion_93 requesting initial image from client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027
> [info 2016/07/03 04:09:49.206 UTC dataStoregemfire1_client-13_31558 <Recovery thread
for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_64> tid=0x299]
Region _B__dataStoreRegion_64 requesting initial image from client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027
> {noformat}
> The code below is from the {{ProxyBucketRegion recoverFromDisk}} method which is executed
during recovery of the {{AsyncEventQueue}} bucket. This is the source of the data region GII:
> {noformat}
> if (this.partitionedRegion.getDataPolicy().withPersistence()
>     && !colocatedRegion.getDataPolicy().withPersistence()) {
>   result = colocatedRegion.getDataStore()
>       .grabBucket(bid, getDistributionManager().getDistributionManagerId(),
>           true, true, false, null, true);
>   ...
> {noformat}
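The wait chain in the analysis above can be summarized as a wait-for graph. The sketch below is only an illustration: the node labels are informal descriptions taken from the thread dumps, the class and method names are hypothetical, and each thread is simplified to wait on exactly one other thread. Following the edges from any node returns to that node, which is the deadlock.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical summary of the reported wait chain; labels are informal, not Geode identifiers.
class WaitForGraphSketch {

    /** Each key waits for its value. Simplification: one outgoing edge per thread. */
    static Map<String, String> waitsFor() {
        Map<String, String> g = new LinkedHashMap<>();
        g.put("31558 region-create thread (holds writeLock)", "31558 recovery thread for bucket 93");
        g.put("31558 recovery thread for bucket 93", "31576 message processor (image request)");
        g.put("31576 message processor (image request)", "31576 P2P reader (holds entry lock)");
        g.put("31576 P2P reader (holds entry lock)", "31558 P2P reader (needs readLock)");
        g.put("31558 P2P reader (needs readLock)", "31558 region-create thread (holds writeLock)");
        return g;
    }

    /** Follows wait-for edges from start; returns the cycle reached, or an empty list if none. */
    static List<String> findCycle(Map<String, String> graph, String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = graph.get(current);
        }
        if (current == null) {
            return new ArrayList<>(); // chain ended at a thread that is not waiting
        }
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }

    public static void main(String[] args) {
        List<String> cycle = findCycle(waitsFor(), "31558 region-create thread (holds writeLock)");
        for (String node : cycle) {
            System.out.println(node + " -> " + waitsFor().get(node));
        }
    }
}
```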



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
