Date: Wed, 20 Apr 2016 21:51:25 +0000 (UTC)
From: "ASF subversion and git services (JIRA)"
To: issues@geode.incubator.apache.org
Subject: [jira] [Commented] (GEODE-1248) gfsh shutdown command does not shutdown members waiting for missing disk stores

[ https://issues.apache.org/jira/browse/GEODE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250785#comment-15250785 ]

ASF subversion and git services commented on GEODE-1248:
--------------------------------------------------------

Commit ea97a536e9175f36c4bc8d69a89d079649d44f82 in incubator-geode's branch refs/heads/develop from [~jens.deppe]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-geode.git;h=ea97a53 ]

GEODE-1236 GEODE-1248: Fix gfsh shutdown call

This fixes two issues with the gfsh 'shutdown' command:
- The JVM can exit prematurely because all remaining threads are daemon threads. When coupled with network partition detection, this can result in member-departed events causing split-brain scenarios [GEODE-1236].
- A member that is starting up may hold the monitor on the CacheFactory class while waiting for disk store recovery. This prevented gfsh shutdown from running, since it would also try to synchronize on CacheFactory and be blocked.
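The second issue is ordinary Java class-monitor contention and can be reproduced in isolation. The sketch below uses invented stand-in names, not actual Geode code: one thread holds the monitor of a Class object while it "recovers" (like CacheFactory.create() waiting on disk-store recovery), and a second thread that synchronizes on the same Class (like ShutDownFunction calling CacheFactory.getAnyInstance()) goes BLOCKED until the first releases it.

```java
import java.util.concurrent.CountDownLatch;

// Minimal sketch with stand-in names (not actual Geode code): a thread holding
// the monitor of a Class object blocks any other thread synchronizing on it.
public class ClassLockDemo {

    // Stands in for com.gemstone.gemfire.cache.CacheFactory.
    static class CacheFactoryStandIn {}

    /** Returns true if the "shutdown" thread was BLOCKED while "startup" held the class lock. */
    public static boolean shutdownBlocksDuringStartup() {
        final CountDownLatch lockHeld = new CountDownLatch(1);
        final CountDownLatch recoveryDone = new CountDownLatch(1);

        // Like CacheFactory.create(): holds the class monitor while "recovering".
        Thread startup = new Thread(() -> {
            synchronized (CacheFactoryStandIn.class) {
                lockHeld.countDown();
                try {
                    recoveryDone.await();   // simulates waiting on disk-store recovery
                } catch (InterruptedException ignored) {
                }
            }
        });

        // Like ShutDownFunction calling CacheFactory.getAnyInstance().
        Thread shutdown = new Thread(() -> {
            synchronized (CacheFactoryStandIn.class) {
                // would shut the cache down here
            }
        });

        try {
            startup.start();
            lockHeld.await();               // wait until startup owns the class monitor
            shutdown.start();
            Thread.sleep(200);              // give shutdown time to block on the monitor
            boolean blocked = shutdown.getState() == Thread.State.BLOCKED;
            recoveryDone.countDown();       // end the simulated recovery so both threads exit
            startup.join();
            shutdown.join();
            return blocked;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("shutdown blocked: " + shutdownBlocksDuringStartup());
    }
}
```

This is exactly the shape of the two threads in the stack dump quoted below in the issue: "main" owns the CacheFactory class lock while in TIMED_WAITING, and "Function Execution Processor1" is BLOCKED waiting for it.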
> gfsh shutdown command does not shutdown members waiting for missing disk stores
> -------------------------------------------------------------------------------
>
>                 Key: GEODE-1248
>                 URL: https://issues.apache.org/jira/browse/GEODE-1248
>             Project: Geode
>          Issue Type: Bug
>          Components: gfsh
>            Reporter: Dan Smith
>
> The gfsh shutdown command fails to shut down members that are waiting for another member to recover the latest data. Instead, the shutdown operation gets stuck waiting for a lock on the cache.
>
> Steps to reproduce:
> 1. Start a locator and two members.
> 2. Create a REPLICATE_PERSISTENT region in gfsh:
> > create region --name="replicate" --type=REPLICATE_PERSISTENT
> 3. Do a put (probably not necessary):
> > put --key="a" --value="a" --region=/replicate
> 4. Shut down within gfsh:
> > shutdown --include-locators=false
> 5. Start one member. It will get stuck waiting for other members to start.
> 6. Shut down within gfsh again:
> > shutdown --include-locators=false
> 7. List members:
> > list members
> The end result after (7) is that the member is still up. In the stack dump, we see the shutdown is blocked on the cache lock.
> {noformat}
> "Function Execution Processor1" #62 daemon prio=10 os_prio=0 tid=0x00007fe988013800 nid=0xf83a waiting for monitor entry [0x00007fe96e062000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at com.gemstone.gemfire.cache.CacheFactory.getAnyInstance(CacheFactory.java:292)
> 	- waiting to lock <0x000000071f13e170> (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
> 	at com.gemstone.gemfire.management.internal.cli.functions.ShutDownFunction.execute(ShutDownFunction.java:46)
> 	at com.gemstone.gemfire.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:194)
> 	at com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
> 	at com.gemstone.gemfire.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:450)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:655)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager$9$1.run(DistributionManager.java:1115)
> 	at java.lang.Thread.run(Thread.java:745)
>
> "main" #1 prio=5 os_prio=0 tid=0x00007fea0400a000 nid=0xf7dd in Object.wait() [0x00007fea0afa4000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	at com.gemstone.gemfire.internal.cache.persistence.PersistenceAdvisorImpl$MembershipChangeListener.waitForChange(PersistenceAdvisorImpl.java:1144)
> 	- locked <0x000000078b067058> (a com.gemstone.gemfire.internal.cache.persistence.PersistenceAdvisorImpl$MembershipChangeListener)
> 	at com.gemstone.gemfire.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:875)
> 	at com.gemstone.gemfire.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:55)
> 	at com.gemstone.gemfire.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1389)
> 	at com.gemstone.gemfire.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1217)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3153)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:3047)
> 	at com.gemstone.gemfire.internal.cache.xmlcache.RegionCreation.createRoot(RegionCreation.java:262)
> 	at com.gemstone.gemfire.internal.cache.xmlcache.CacheCreation.initializeRegions(CacheCreation.java:555)
> 	at com.gemstone.gemfire.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:528)
> 	at com.gemstone.gemfire.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:353)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4319)
> 	at com.gemstone.gemfire.internal.cache.ClusterConfigurationLoader.applyClusterConfiguration(ClusterConfigurationLoader.java:141)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.requestAndApplySharedConfiguration(GemFireCacheImpl.java:1020)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1161)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreate(GemFireCacheImpl.java:785)
> 	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:773)
> 	at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:178)
> 	- locked <0x000000071f13e170> (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
> 	at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:228)
> 	- locked <0x000000071f13e170> (a java.lang.Class for com.gemstone.gemfire.cache.CacheFactory)
> 	at com.gemstone.gemfire.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:55)
> 	at com.gemstone.gemfire.distributed.ServerLauncher.createCache(ServerLauncher.java:806)
> 	at com.gemstone.gemfire.distributed.ServerLauncher.start(ServerLauncher.java:726)
> 	at com.gemstone.gemfire.distributed.ServerLauncher.run(ServerLauncher.java:656)
> 	at com.gemstone.gemfire.distributed.ServerLauncher.main(ServerLauncher.java:207)
> {noformat}
> The shutdown command needs to somehow trigger shutdown even if the cache is in this state during startup.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
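One general way to make shutdown reachable while startup holds the class monitor, sketched below with invented names (this is only an illustration of the idea, not the actual change made in commit ea97a53), is to publish the cache instance through a lock-free reference so the shutdown path never needs to synchronize on the factory class at all:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch (invented names; not actual Geode code): publish the
// cache through an AtomicReference so a shutdown request can reach it without
// contending for the factory class monitor that startup may hold during
// disk-store recovery.
public class NonBlockingShutdownSketch {

    /** Minimal stand-in for the cache interface. */
    public interface Cache {
        void close();
        boolean isClosed();
    }

    private static final AtomicReference<Cache> INSTANCE = new AtomicReference<>();

    /** Startup publishes the instance as soon as it exists, before initialization finishes. */
    public static void publish(Cache cache) {
        INSTANCE.set(cache);
    }

    /**
     * Shutdown reads the reference lock-free -- no synchronized block -- so it
     * cannot be blocked by a member stuck waiting for disk-store recovery.
     * Returns true if a live cache was found and closed.
     */
    public static boolean shutdown() {
        Cache cache = INSTANCE.get();
        if (cache == null || cache.isClosed()) {
            return false;
        }
        cache.close();
        return true;
    }
}
```

With this shape, the gfsh 'shutdown' function call in the stack dump above would read the published reference and close the cache directly instead of blocking in getAnyInstance().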