curator-dev mailing list archives

From "Jordan Zimmerman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CURATOR-466) LeaderSelector gets in an inconsistent state when releasing resources.
Date Tue, 13 Nov 2018 15:54:00 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685401#comment-16685401 ]

Jordan Zimmerman commented on CURATOR-466:
------------------------------------------

Calling {{leaderSelector.close()}} will cause the LeaderSelectorListener to get called in
a separate thread. However, you are closing the Curator handle immediately after closing the
leader selector, which is why you're getting the error you're seeing. So, this is expected.
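
The interleaving described above can be modelled with plain JDK classes. This is only a sketch under stated assumptions, not Curator code: {{FakeClient}} and {{closeWithoutWaiting}} are hypothetical names, and the {{clientClosed}} latch exists only to force the unlucky ordering so that the result is deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class CloseRaceSketch {
    /** Hypothetical stand-in for a client handle; this is not a Curator class. */
    static final class FakeClient {
        private volatile boolean started = true;

        void delete() {
            if (!started) {
                throw new IllegalStateException("instance must be started before calling this method");
            }
        }

        void close() {
            started = false;
        }
    }

    /** Cancels the worker, closes the client at once, and returns what the worker's cleanup threw. */
    static Throwable closeWithoutWaiting() throws Exception {
        FakeClient client = new FakeClient();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch leading = new CountDownLatch(1);
        CountDownLatch clientClosed = new CountDownLatch(1);
        CountDownLatch workerDone = new CountDownLatch(1);
        AtomicReference<Throwable> cleanupFailure = new AtomicReference<>();

        Future<?> task = executor.submit(() -> {
            try {
                leading.countDown();
                TimeUnit.SECONDS.sleep(60);      // "leadership" work, ended by interruption
            } catch (InterruptedException e) {
                // interrupted by cancel(true); fall through to cleanup
            } finally {
                try {
                    awaitQuietly(clientClosed);  // demo only: force the bad interleaving
                    client.delete();             // cleanup runs against a closed handle
                } catch (RuntimeException e) {
                    cleanupFailure.set(e);
                } finally {
                    workerDone.countDown();
                }
            }
        });

        leading.await();
        task.cancel(true);         // returns immediately; the worker is still unwinding
        client.close();            // closed before the worker's cleanup has run
        clientClosed.countDown();

        workerDone.await(5, TimeUnit.SECONDS);
        executor.shutdownNow();
        return cleanupFailure.get();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(closeWithoutWaiting());
    }
}
```

In this model the cancel call does not wait for the worker's finally block, so the cleanup hits the closed handle in the same way as the first stack trace in the report.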

 
{quote}do we really need to manually clean up on shutdown?{quote}
The reason for closing the leader selector is to immediately delete the ZNode so that any
waiting participant can become leader. If you don't close the leader selector, you'd have to
wait for the session to time out. However, in the case above, closing the Curator handle
(assuming it succeeds) will close the ZK handle and thus end the session. So, it's really up
to you how to handle cleanup, but closing the Curator handle will quickly delete any ephemeral
ZNodes.
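
If you do want to close both handles explicitly, the ordering concern can be sketched with plain JDK classes as below. This is a model under assumptions, not Curator's API: {{FakeClient}}, {{closeAfterCleanup}} and the {{cleanedUp}} latch are hypothetical, and nothing in this thread says the leader selector exposes such a completion signal. It models only the ordering problem, not the InterruptedException from the second stack trace.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class OrderedShutdownSketch {
    /** Hypothetical stand-in for a client handle; this is not a Curator class. */
    static final class FakeClient {
        private volatile boolean started = true;

        void delete() {
            if (!started) {
                throw new IllegalStateException("instance must be started before calling this method");
            }
        }

        void close() {
            started = false;
        }
    }

    /** Cancels the worker, waits for its cleanup, then closes the client. Returns any cleanup failure. */
    static Throwable closeAfterCleanup() throws Exception {
        FakeClient client = new FakeClient();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch leading = new CountDownLatch(1);
        CountDownLatch cleanedUp = new CountDownLatch(1);
        AtomicReference<Throwable> cleanupFailure = new AtomicReference<>();

        Future<?> task = executor.submit(() -> {
            try {
                leading.countDown();
                TimeUnit.SECONDS.sleep(60);   // "leadership" work, ended by interruption
            } catch (InterruptedException e) {
                // interrupted by cancel(true); fall through to cleanup
            } finally {
                try {
                    client.delete();          // cleanup while the handle is still open
                } catch (RuntimeException e) {
                    cleanupFailure.set(e);
                } finally {
                    cleanedUp.countDown();    // signal that cleanup has finished
                }
            }
        });

        leading.await();
        task.cancel(true);                    // non-blocking, like the close in the report
        boolean finished = cleanedUp.await(5, TimeUnit.SECONDS);
        client.close();                       // only now is the handle closed
        executor.shutdown();
        if (!finished) {
            throw new IllegalStateException("worker cleanup did not finish in time");
        }
        return cleanupFailure.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(closeAfterCleanup());
    }
}
```

The bounded wait is a design choice: if the cleanup never signals, the handle is still closed, which in the real case ends the session and removes the ephemeral ZNodes anyway.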

> LeaderSelector gets in an inconsistent state when releasing resources.
> ----------------------------------------------------------------------
>
>                 Key: CURATOR-466
>                 URL: https://issues.apache.org/jira/browse/CURATOR-466
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 4.0.1
>            Reporter: Mikhail Pryakhin
>            Priority: Major
>
> I'm using the leader election recipe, which worked well until I encountered application shutdown.
> Here is my example:
>  
> {code:java}
> CuratorFramework framework = CuratorFrameworkFactory.builder()
>     .connectString("localhost:2181")
>     .retryPolicy(new RetryOneTime(100))
>     .build();
> LeaderSelector leaderSelector = new LeaderSelector(
>     framework,
>     "/path",
>     new LeaderSelectorListener() {
>         volatile boolean stopped;
>         @Override
>         public void takeLeadership(CuratorFramework client) throws Exception {
>             System.out.println("I'm a new leader!");
>             try {
>                 while (!Thread.currentThread().isInterrupted() && !stopped) {
>                     TimeUnit.SECONDS.sleep(1);
>                 }
>             } finally {
>                 System.out.println("I'm not a leader anymore..");
>             }
>         }
>         @Override
>         public void stateChanged(CuratorFramework client, ConnectionState newState) {
>             if (client.getConnectionStateErrorPolicy().isErrorState(newState)) {
>                 stopped = true;
>             }
>         }
>     }
> );
> framework.start();
> leaderSelector.start();
> TimeUnit.SECONDS.sleep(5);
> leaderSelector.close();   //(1)
> framework.close();        //(2){code}
>  
> When I release resources by calling the close method first on the LeaderSelector instance and then on the CuratorFramework instance (lines 1 and 2), I always get the following exception:
>  
> {code:java}
> java.lang.IllegalStateException: instance must be started before calling this method
> at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:444) ~[curator-client-4.0.1.jar:?]
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.delete(CuratorFrameworkImpl.java:424) ~[curator-framework-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347) ~[curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124) ~[curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154) ~[curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449) [curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466) [curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65) [curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246) [curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240) [curator-recipes-4.0.1.jar:4.0.1]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_141]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
> {code}
>  
> The reason for the exception is that the non-blocking LeaderSelector.close method delegates the call to the internal executor service, which abruptly cancels the running futures with the mayInterruptIfRunning flag set to true. Right after this, the CuratorFramework close method is called. In the meantime, the future being canceled executes its finally block, where it calls methods on the already closed CuratorFramework instance, which leads to an exception being thrown.
> I thought I could wait a bit until the LeaderSelector instance is closed, so I tried to delay for some time before closing the CuratorFramework instance, but doing so leads to another exception:
> {code:java}
> java.lang.InterruptedException: null
> at java.lang.Object.wait(Native Method) ~[?:1.8.0_141]
> at java.lang.Object.wait(Object.java:502) ~[?:1.8.0_141]
> at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1409) ~[zookeeper-3.4.12.jar:3.4.12--1]
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:874) ~[zookeeper-3.4.12.jar:3.4.12--1]
> at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:274) ~[curator-framework-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:268) ~[curator-framework-4.0.1.jar:4.0.1]
> at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64) ~[curator-client-4.0.1.jar:?]
> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100) ~[curator-client-4.0.1.jar:?]
> at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:265) ~[curator-framework-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:249) ~[curator-framework-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:34) ~[curator-framework-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347) ~[curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124) ~[curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154) ~[curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449) [curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466) [curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65) [curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246) [curator-recipes-4.0.1.jar:4.0.1]
> at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240) [curator-recipes-4.0.1.jar:4.0.1]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_141]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
> {code}
> This time the exception is caused by the future being canceled with the mayInterruptIfRunning flag set to true in the LeaderSelector close method.
> As the LeaderSelector implementation is based on the InterProcessMutex, which works with ephemeral nodes, do we really need to manually clean up on shutdown? As far as I know, ephemeral nodes are deleted when the client disconnects.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
