Date: Wed, 20 Apr 2016 21:51:25 +0000 (UTC)
From: "ASF subversion and git services (JIRA)"
To: issues@geode.incubator.apache.org
Subject: [jira] [Commented] (GEODE-1248) gfsh shutdown command does not shutdown members waiting for missing disk stores

[ https://issues.apache.org/jira/browse/GEODE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250785#comment-15250785 ]

ASF subversion and git services commented on GEODE-1248:
--------------------------------------------------------

Commit ea97a536e9175f36c4bc8d69a89d079649d44f82 in incubator-geode's branch refs/heads/develop from [~jens.deppe]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-geode.git;h=ea97a53 ]

GEODE-1236 GEODE-1248: Fix gfsh shutdown call

This fixes two issues with the gfsh 'shutdown' command:
- The JVM can exit prematurely because all remaining threads are daemon threads. When coupled with network partition detection, this can result in member-departed events causing split-brain scenarios [GEODE-1236].
- A member that is starting up may hold the monitor on the CacheFactory class while waiting for disk store recovery. This prevented gfsh shutdown from running, since it would also try to synchronize on CacheFactory and be blocked.
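The second issue is ordinary Java class-monitor contention and can be reproduced in isolation. The sketch below uses invented stand-in names, not actual Geode code: one thread holds the monitor of a Class object while it "recovers" (like CacheFactory.create() waiting on disk-store recovery), and a second thread that synchronizes on the same Class (like ShutDownFunction calling CacheFactory.getAnyInstance()) goes BLOCKED until the first releases it.

```java
import java.util.concurrent.CountDownLatch;

// Minimal sketch with stand-in names (not actual Geode code): a thread holding
// the monitor of a Class object blocks any other thread synchronizing on it.
public class ClassLockDemo {

    // Stands in for com.gemstone.gemfire.cache.CacheFactory.
    static class CacheFactoryStandIn {}

    /** Returns true if the "shutdown" thread was BLOCKED while "startup" held the class lock. */
    public static boolean shutdownBlocksDuringStartup() {
        final CountDownLatch lockHeld = new CountDownLatch(1);
        final CountDownLatch recoveryDone = new CountDownLatch(1);

        // Like CacheFactory.create(): holds the class monitor while "recovering".
        Thread startup = new Thread(() -> {
            synchronized (CacheFactoryStandIn.class) {
                lockHeld.countDown();
                try {
                    recoveryDone.await();   // simulates waiting on disk-store recovery
                } catch (InterruptedException ignored) {
                }
            }
        });

        // Like ShutDownFunction calling CacheFactory.getAnyInstance().
        Thread shutdown = new Thread(() -> {
            synchronized (CacheFactoryStandIn.class) {
                // would shut the cache down here
            }
        });

        try {
            startup.start();
            lockHeld.await();               // wait until startup owns the class monitor
            shutdown.start();
            Thread.sleep(200);              // give shutdown time to block on the monitor
            boolean blocked = shutdown.getState() == Thread.State.BLOCKED;
            recoveryDone.countDown();       // end the simulated recovery so both threads exit
            startup.join();
            shutdown.join();
            return blocked;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("shutdown blocked: " + shutdownBlocksDuringStartup());
    }
}
```

This is exactly the shape of the two threads in the stack dump quoted below in the issue: "main" owns the CacheFactory class lock while in TIMED_WAITING, and "Function Execution Processor1" is BLOCKED waiting for it.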
> gfsh shutdown command does not shutdown members waiting for missing disk stores
> -------------------------------------------------------------------------------
>
>                 Key: GEODE-1248
>                 URL: https://issues.apache.org/jira/browse/GEODE-1248
>             Project: Geode
>          Issue Type: Bug
>          Components: gfsh
>            Reporter: Dan Smith
>
> The gfsh shutdown command fails to shut down members that are waiting for another member to recover the latest data. Instead, the shutdown operation gets stuck waiting for a lock on the cache.
>
> Steps to reproduce:
> 1. Start a locator and two members.
> 2. Create a REPLICATE_PERSISTENT region in gfsh:
> > create region --name="replicate" --type=REPLICATE_PERSISTENT
> 3. Do a put (probably not necessary):
> > put --key="a" --value="a" --region=/replicate
> 4. Shut down within gfsh:
> > shutdown --include-locators=false
> 5. Start one member. It will get stuck waiting for other members to start.
> 6. Shut down within gfsh again:
> > shutdown --include-locators=false
> 7. List members:
> > list members
> The end result after (7) is that the member is still up. In the stack dump, we see the shutdown is blocked on the cache lock.
> {noformat}
> "Function Execution Processor1" #62 daemon prio=10 os_prio=0 tid=0x00007fe988013800 nid=0xf83a waiting for monitor entry [0x00007fe96e062000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at com.gemstone.gemfire.cache.CacheFactory.getAnyInstance(CacheFactory.java:292)
> 	- waiting to lock <0x000000071f13e170> (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
> 	at com.gemstone.gemfire.management.internal.cli.functions.ShutDownFunction.execute(ShutDownFunction.java:46)
> 	at com.gemstone.gemfire.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:194)
> 	at com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
> 	at com.gemstone.gemfire.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:450)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:655)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager$9$1.run(DistributionManager.java:1115)
> 	at java.lang.Thread.run(Thread.java:745)
>
> "main" #1 prio=5 os_prio=0 tid=0x00007fea0400a000 nid=0xf7dd in Object.wait() [0x00007fea0afa4000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	at com.gemstone.gemfire.internal.cache.persistence.PersistenceAdvisorImpl$MembershipChangeListener.waitForChange(PersistenceAdvisorImpl.java:1144)
> 	- locked <0x000000078b067058> (a com.gemstone.gemfire.internal.cache.persistence.PersistenceAdvisorImpl$MembershipChangeListener)
> 	at com.gemstone.gemfire.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:875)
> 	at com.gemstone.gemfire.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:55)
> 	at com.gemstone.gemfire.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1389)
> 	at com.gemstone.gemfire.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1217)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3153)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:3047)
> 	at com.gemstone.gemfire.internal.cache.xmlcache.RegionCreation.createRoot(RegionCreation.java:262)
> 	at com.gemstone.gemfire.internal.cache.xmlcache.CacheCreation.initializeRegions(CacheCreation.java:555)
> 	at com.gemstone.gemfire.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:528)
> 	at com.gemstone.gemfire.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:353)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4319)
> 	at com.gemstone.gemfire.internal.cache.ClusterConfigurationLoader.applyClusterConfiguration(ClusterConfigurationLoader.java:141)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.requestAndApplySharedConfiguration(GemFireCacheImpl.java:1020)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1161)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreate(GemFireCacheImpl.java:785)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:773)
> 	at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:178)
> 	- locked <0x000000071f13e170> (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
> 	at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:228)
> 	- locked <0x000000071f13e170> (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
> 	at com.gemstone.gemfire.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:55)
> 	at com.gemstone.gemfire.distributed.ServerLauncher.createCache(ServerLauncher.java:806)
> 	at com.gemstone.gemfire.distributed.ServerLauncher.start(ServerLauncher.java:726)
> 	at com.gemstone.gemfire.distributed.ServerLauncher.run(ServerLauncher.java:656)
> 	at com.gemstone.gemfire.distributed.ServerLauncher.main(ServerLauncher.java:207)
> {noformat}
> The shutdown command needs to somehow trigger shutdown even if the cache is in this state during startup.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
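One general way to make shutdown reachable while startup holds the class monitor, sketched below with invented names (this is only an illustration of the idea, not the actual change made in commit ea97a53), is to publish the cache instance through a lock-free reference so the shutdown path never needs to synchronize on the factory class at all:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch (invented names; not actual Geode code): publish the
// cache through an AtomicReference so a shutdown request can reach it without
// contending for the factory class monitor that startup may hold during
// disk-store recovery.
public class NonBlockingShutdownSketch {

    /** Minimal stand-in for the cache interface. */
    public interface Cache {
        void close();
        boolean isClosed();
    }

    private static final AtomicReference<Cache> INSTANCE = new AtomicReference<>();

    /** Startup publishes the instance as soon as it exists, before initialization finishes. */
    public static void publish(Cache cache) {
        INSTANCE.set(cache);
    }

    /**
     * Shutdown reads the reference lock-free -- no synchronized block -- so it
     * cannot be blocked by a member stuck waiting for disk-store recovery.
     * Returns true if a live cache was found and closed.
     */
    public static boolean shutdown() {
        Cache cache = INSTANCE.get();
        if (cache == null || cache.isClosed()) {
            return false;
        }
        cache.close();
        return true;
    }
}
```

With this shape, the gfsh 'shutdown' function call in the stack dump above would read the published reference and close the cache directly instead of blocking in getAnyInstance().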