Date: Mon, 1 Aug 2016 22:21:20 +0000 (UTC)
From: "Barry Oglesby (JIRA)"
To: issues@geode.incubator.apache.org
Subject: [jira] [Resolved] (GEODE-1677) Persistent AsyncEventQueue with non-persistent data PR hangs during recovery

     [ https://issues.apache.org/jira/browse/GEODE-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Barry Oglesby resolved GEODE-1677.
----------------------------------
    Resolution: Fixed

> Persistent AsyncEventQueue with non-persistent data PR hangs during recovery
> ----------------------------------------------------------------------------
>
>          Key: GEODE-1677
>          URL: https://issues.apache.org/jira/browse/GEODE-1677
>      Project: Geode
>   Issue Type: Bug
>   Components: wan
>     Reporter: Barry Oglesby
>     Assignee: Barry Oglesby
>
> This is the same bug as GEM-801.
> During recovery of a persistent {{AsyncEventQueue}} on a non-persistent data {{PartitionedRegion}}, a deadlock occurs.
> Here is analysis duplicated from GEM-801:
> *Member dataStoregemfire1_31558*
> This member has created its PR and is recovering its shadow PR (async event queue). The {{ParallelGatewaySenderQueue addShadowPartitionedRegionForUserPR}} method has taken the {{AbstractGatewaySender}}'s {{lifeCycleLock}} {{writeLock}}.
> The bgexec19832_31558.log thread dumps show:
> {noformat}
> "vm_0_thr_0_dataStore1_client-13_31558" #162 daemon prio=5 os_prio=0 tid=0x00007f406c01f800 nid=0x7fca waiting on condition [0x00007f40bd7c4000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000f1a6db90> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>     at com.gemstone.gemfire.internal.cache.BucketPersistenceAdvisor.waitForPrimaryPersistentRecovery(BucketPersistenceAdvisor.java:362)
>     at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.waitForPrimaryPersistentRecovery(ProxyBucketRegion.java:632)
>     at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider.recoverPersistentBuckets(PRHARedundancyProvider.java:1782)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegion.initPRInternals(PartitionedRegion.java:887)
>     - locked <0x00000000f1cd7070> (a com.gemstone.gemfire.internal.cache.wan.parallel.ParallelGatewaySenderQueue$ParallelGatewaySenderQueueMetaRegion)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegion.initialize(PartitionedRegion.java:1007)
>     at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3065)
>     at com.gemstone.gemfire.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:559)
>     at com.gemstone.gemfire.internal.cache.wan.parallel.ParallelGatewaySenderEventProcessor.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderEventProcessor.java:203)
>     at com.gemstone.gemfire.internal.cache.wan.parallel.ConcurrentParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ConcurrentParallelGatewaySenderQueue.java:172)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegion.postCreateRegion(PartitionedRegion.java:986)
>     at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3109)
>     at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2959)
>     at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2948)
>     at hydra.RegionHelper.createRegion(RegionHelper.java:117)
>     - locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
>     at hydra.RegionHelper.createRegion(RegionHelper.java:85)
>     - locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
>     at hydra.RegionHelper.createRegion(RegionHelper.java:72)
>     - locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
>     at hydra.RegionHelper.createRegion(RegionHelper.java:52)
>     - locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
>     at parReg.wbcl.ParRegWBCLTest.HA_reinitializeRegion(ParRegWBCLTest.java:250)
>     at parReg.ParRegTest.HAController(ParRegTest.java:2063)
>     at parReg.wbcl.ParRegWBCLTest.HAController(ParRegWBCLTest.java:274)
>     at parReg.ParRegTest.HydraTask_HAController(ParRegTest.java:985)
> {noformat}
> As part of recovery, 5 buckets are waiting for their initial images:
> {noformat}
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_102" #705 daemon prio=5 os_prio=0 tid=0x00007f406c16d000 nid=0x954 waiting on condition [0x00007f3fcdfdd000]
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_99" #702 daemon prio=5 os_prio=0 tid=0x00007f406c169800 nid=0x951 waiting on condition [0x00007f3fce2e0000]
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93" #696 daemon prio=5 os_prio=0 tid=0x00007f406c161800 nid=0x94b waiting on condition [0x00007f3fce8e6000]
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_64" #665 daemon prio=5 os_prio=0 tid=0x00007f406c13b000 nid=0x92e waiting on condition [0x00007f3fd55d5000]
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_54" #655 daemon prio=5 os_prio=0 tid=0x00007f406c12e800 nid=0x924 waiting on condition [0x00007f3fd5fdf000]
> {noformat}
> Here is bucket 93's recovery thread:
> {noformat}
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93" #696 daemon prio=5 os_prio=0 tid=0x00007f406c161800 nid=0x94b waiting on condition [0x00007f3fce8e6000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000f17baf68> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:743)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:819)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:796)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:886)
>     at com.gemstone.gemfire.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:458)
>     at com.gemstone.gemfire.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1352)
>     at com.gemstone.gemfire.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1159)
>     at com.gemstone.gemfire.internal.cache.BucketRegion.initialize(BucketRegion.java:263)
>     at com.gemstone.gemfire.internal.cache.LocalRegion.createSubregion(LocalRegion.java:892)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:765)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:444)
>     - locked <0x00000000f17a9150> (a com.gemstone.gemfire.internal.cache.ProxyBucketRegion)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2982)
>     at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.recoverFromDisk(ProxyBucketRegion.java:446)
>     at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.recoverFromDiskRecursively(ProxyBucketRegion.java:403)
>     at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider$4.run2(PRHARedundancyProvider.java:1765)
>     at com.gemstone.gemfire.internal.cache.partitioned.RecoveryRunnable.run(RecoveryRunnable.java:64)
>     at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider$4.run(PRHARedundancyProvider.java:1757)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The dataStoregemfire1_31558/system.log contains this warning showing the above thread is waiting for member dataStoregemfire1_client-13_31576:
> {noformat}
> [warning 2016/07/03 04:10:04.000 UTC dataStoregemfire1_client-13_31558 tid=0x2b8] 15 seconds have elapsed while waiting for replies: :1027]; waiting for 0 messages in-flight; region=/__PR/_B__dataStoreRegion_93; abort=false> on client-13(dataStoregemfire1_client-13_31558:31558):1025 whose current membership list is: [[client-13(31491:locator):1024, client-13(dataStoregemfire1_client-13_31558:31558):1025, client-13(dataStoregemfire2_client-13_482:482):1026, client-13(dataStoregemfire1_client-13_31576:31576):1027, client-13(dataStoregemfire1_client-13_31563:31563):1028, client-13(dataStoregemfire2_client-13_31595:31595):1029, client-13(dataStoregemfire2_client-13_460:460):1030]]
> {noformat}
> *Member dataStoregemfire1_client-13_31576*
> The bgexec16591_31576.log thread dumps show several blocked Pooled High Priority Message Processor threads waiting for entries while processing {{InitialImageOperations}}:
> {noformat}
> "Pooled High Priority Message Processor 11" #372 daemon prio=10 os_prio=0 tid=0x00007f609c047000 nid=0x581 waiting for monitor entry [0x00007f6090e4f000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at com.gemstone.gemfire.internal.cache.InitialImageOperation$RequestImageMessage.chunkEntries(InitialImageOperation.java:1857)
>     - waiting to lock <0x00000000f16ce110> (a com.gemstone.gemfire.internal.cache.VersionedThinRegionEntryHeapStringKey2)
>     at com.gemstone.gemfire.internal.cache.InitialImageOperation$RequestImageMessage.process(InitialImageOperation.java:1657)
>     at com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
>     at com.gemstone.gemfire.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:450)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:611)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager$5$1.run(DistributionManager.java:922)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The P2P message reader that has the entry lock is waiting for replies from dataStoregemfire1_client-13_31558:31558, shown by the log warning and thread below:
> {noformat}
> [warning 2016/07/03 04:10:03.671 UTC dataStoregemfire1_client-13_31576 :1029 unshared ordered uid=416 dom #1 port=35558> tid=0x150] 15 seconds have elapsed while waiting for replies: :1025, client-13(dataStoregemfire1_client-13_31558:31558):1025]> on client-13(dataStoregemfire1_client-13_31576:31576):1027 whose current membership list is: [[client-13(31491:locator):1024, client-13(dataStoregemfire1_client-13_31558:31558):1025, client-13(dataStoregemfire2_client-13_482:482):1026, client-13(dataStoregemfire1_client-13_31576:31576):1027, client-13(dataStoregemfire1_client-13_31563:31563):1028, client-13(dataStoregemfire2_client-13_31595:31595):1029, client-13(dataStoregemfire2_client-13_460:460):1030]]
> {noformat}
> {noformat}
> "P2P message reader for client-13(dataStoregemfire2_client-13_31595:31595):1029 unshared ordered uid=416 dom #1 port=35558" #336 daemon prio=10 os_prio=0 tid=0x00007f6169731800 nid=0x4d5 waiting on condition [0x00007f6093373000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000f16cdfc8> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:743)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:819)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:796)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:886)
>     at com.gemstone.gemfire.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:743)
>     at com.gemstone.gemfire.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:622)
>     at com.gemstone.gemfire.internal.cache.AbstractUpdateOperation.distribute(AbstractUpdateOperation.java:71)
>     at com.gemstone.gemfire.internal.cache.BucketRegion.basicPutPart2(BucketRegion.java:634)
>     at com.gemstone.gemfire.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2736)
>     - locked <0x00000000f16ce110> (a com.gemstone.gemfire.internal.cache.VersionedThinRegionEntryHeapStringKey2)
>     at com.gemstone.gemfire.internal.cache.BucketRegion.virtualPut(BucketRegion.java:485)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.putLocally(PartitionedRegionDataStore.java:1275)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.putLocally(PartitionedRegionDataStore.java:1250)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegionDataView.putEntryOnRemote(PartitionedRegionDataView.java:107)
>     at com.gemstone.gemfire.internal.cache.partitioned.PutMessage.operateOnPartitionedRegion(PutMessage.java:833)
>     at com.gemstone.gemfire.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:339)
>     at com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
>     at com.gemstone.gemfire.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:442)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager.scheduleIncomingMessage(DistributionManager.java:3519)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager.handleIncomingDMsg(DistributionManager.java:3142)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager$MyListener.messageReceived(DistributionManager.java:4341)
>     at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1100)
>     at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1028)
>     at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:382)
>     at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:726)
>     at com.gemstone.gemfire.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:815)
>     at com.gemstone.gemfire.internal.tcp.Connection.dispatchMessage(Connection.java:3961)
>     at com.gemstone.gemfire.internal.tcp.Connection.processNIOBuffer(Connection.java:3545)
>     at com.gemstone.gemfire.internal.tcp.Connection.runNioReader(Connection.java:1837)
>     at com.gemstone.gemfire.internal.tcp.Connection.run(Connection.java:1706)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Back in bgexec19832_31558.log, the thread dumps show a number of P2P message reader threads for dataStoregemfire1_client-13_31576:31576 stuck waiting for the {{AbstractGatewaySender}}'s {{lifeCycleLock}} {{readLock}} here:
> {noformat}
> "P2P message reader for client-13(dataStoregemfire1_client-13_31576:31576):1027 unshared ordered uid=537 dom #2 port=55007" #868 daemon prio=10 os_prio=0 tid=0x00007f40403a4000 nid=0xa23 waiting on condition [0x00007f3fc3442000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000f1cbe5f8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     at com.gemstone.gemfire.internal.cache.wan.AbstractGatewaySender.distribute(AbstractGatewaySender.java:928)
>     at com.gemstone.gemfire.internal.cache.LocalRegion.notifyGatewaySender(LocalRegion.java:6485)
>     at com.gemstone.gemfire.internal.cache.BucketRegion.notifyGatewaySender(BucketRegion.java:654)
>     at com.gemstone.gemfire.internal.cache.LocalRegion.basicPutPart2(LocalRegion.java:6022)
>     at com.gemstone.gemfire.internal.cache.BucketRegion.basicPutPart2(BucketRegion.java:644)
>     at com.gemstone.gemfire.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2736)
>     - locked <0x00000000f18891f0> (a com.gemstone.gemfire.internal.cache.VersionedThinRegionEntryHeapStringKey2)
>     at com.gemstone.gemfire.internal.cache.BucketRegion.virtualPut(BucketRegion.java:485)
>     at com.gemstone.gemfire.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:132)
>     at com.gemstone.gemfire.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5817)
>     at com.gemstone.gemfire.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:148)
>     at com.gemstone.gemfire.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:286)
>     at com.gemstone.gemfire.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:255)
>     at com.gemstone.gemfire.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1191)
>     at com.gemstone.gemfire.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1092)
>     at com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
>     at com.gemstone.gemfire.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:442)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager.scheduleIncomingMessage(DistributionManager.java:3519)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager.handleIncomingDMsg(DistributionManager.java:3142)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager$MyListener.messageReceived(DistributionManager.java:4341)
>     at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1100)
>     at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1028)
>     at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:382)
>     at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:726)
>     at com.gemstone.gemfire.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:815)
>     at com.gemstone.gemfire.internal.tcp.Connection.dispatchMessage(Connection.java:3961)
>     at com.gemstone.gemfire.internal.tcp.Connection.processNIOBuffer(Connection.java:3545)
>     at com.gemstone.gemfire.internal.tcp.Connection.runNioReader(Connection.java:1837)
>     at com.gemstone.gemfire.internal.tcp.Connection.run(Connection.java:1706)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> These threads will never get the {{readLock}} since the {{writeLock}} is blocked.
> This deadlock only occurs when the {{AsyncEventQueue}} is persistent but its attached data region is not.
> The regions being recovered by the {{AsyncEventQueue}} recovery threads are the actual data regions.
It's the dataStoreRegion that is being GIIed, not the {{AsyncEventQueue}} region:
> {noformat}
> [info 2016/07/03 04:09:48.504 UTC dataStoregemfire1_client-13_31558 tid=0x2c1] Region _B__dataStoreRegion_102 requesting initial image from client-13(dataStoregemfire1_client-13_31563:31563):1028
> [info 2016/07/03 04:09:48.968 UTC dataStoregemfire1_client-13_31558 tid=0x28f] Region _B__dataStoreRegion_54 requesting initial image from client-13(dataStoregemfire2_client-13_31595:31595):1029
> [info 2016/07/03 04:09:49.007 UTC dataStoregemfire1_client-13_31558 tid=0x2be] Region _B__dataStoreRegion_99 requesting initial image from client-13(dataStoregemfire1_client-13_31576:31576):1027
> [info 2016/07/03 04:09:49.202 UTC dataStoregemfire1_client-13_31558 tid=0x2b8] Region _B__dataStoreRegion_93 requesting initial image from client-13(dataStoregemfire1_client-13_31576:31576):1027
> [info 2016/07/03 04:09:49.206 UTC dataStoregemfire1_client-13_31558 tid=0x299] Region _B__dataStoreRegion_64 requesting initial image from client-13(dataStoregemfire1_client-13_31576:31576):1027
> {noformat}
> The code below is from the {{ProxyBucketRegion recoverFromDisk}} method, which is executed during recovery of the {{AsyncEventQueue}} bucket. This is the source of the data region GII:
> {noformat}
> if (this.partitionedRegion.getDataPolicy().withPersistence()
>     && !colocatedRegion.getDataPolicy().withPersistence()) {
>   result = colocatedRegion.getDataStore()
>       .grabBucket(bid, getDistributionManager().getDistributionManagerId(),
>           true, true, false, null, true);
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
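The lock interaction in the analysis above can be reduced to a small sketch. This is not Geode code: the class, latch names, and timings are hypothetical stand-ins for the recovery thread that holds the lifeCycleLock writeLock while blocked waiting on a remote reply, and for a P2P message reader that needs the readLock inside AbstractGatewaySender.distribute().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal sketch (not Geode code): a "recovery" thread takes the writeLock and
// then blocks on a latch, standing in for waitForRepliesUninterruptibly; the
// calling thread then plays the P2P message reader that needs the readLock.
public class LifeCycleLockSketch {

    static boolean readLockBlockedDuringRecovery() throws InterruptedException {
        ReentrantReadWriteLock lifeCycleLock = new ReentrantReadWriteLock();
        CountDownLatch writeHeld = new CountDownLatch(1);
        CountDownLatch replyReceived = new CountDownLatch(1);

        Thread recovery = new Thread(() -> {
            // like addShadowPartitionedRegionForUserPR taking the writeLock
            lifeCycleLock.writeLock().lock();
            writeHeld.countDown();
            try {
                replyReceived.await();        // blocked waiting on a GII reply
            } catch (InterruptedException ignored) {
            } finally {
                lifeCycleLock.writeLock().unlock();
            }
        });
        recovery.start();
        writeHeld.await();

        // A reader trying to distribute an event cannot get the readLock
        // while the writeLock is held; tryLock times out instead of hanging.
        boolean acquired = lifeCycleLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
            lifeCycleLock.readLock().unlock();
        }
        replyReceived.countDown();            // let the sketch terminate
        recovery.join();
        return !acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("readLock blocked while writeLock held: "
            + readLockBlockedDuringRecovery());
    }
}
```

In the sketch the "reply" eventually arrives so the program terminates; in the reported bug it never does, because the member that must send it is itself stuck behind this readLock, closing the cycle.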