curator-dev mailing list archives

From "Julio Lopez (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CURATOR-15) LeaderSelector may (undetectably) fail to elect
Date Mon, 24 Jun 2013 21:32:21 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692407#comment-13692407 ]

Julio Lopez edited comment on CURATOR-15 at 6/24/13 9:32 PM:
-------------------------------------------------------------

Here is an occurrence, caused by an UnknownHostException. Perhaps either LeaderSelector or
InterProcessMutex should handle these cases and retry.

{{E 06-23 03:30:07.106 LeaderSelector-0 c.n.c.f.r.l.LeaderSelector:349 |::] mutex.acquire() threw an exception
java.net.UnknownHostException: xyz.example.com
        at java.net.InetAddress.getAllByName0(Unknown Source) ~[...]
        at java.net.InetAddress.getAllByName(Unknown Source) ~[...]
        at java.net.InetAddress.getAllByName(Unknown Source) ~[...]
        at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60) ~[...]
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) ~[...]
        at com.netflix.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:27) ~[...]
        at com.netflix.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:166) ~[...]
        at com.netflix.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94) ~[...]
        at com.netflix.curator.HandleHolder.getZooKeeper(HandleHolder.java:55) ~[...]
        at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:112) ~[...]
        at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:107) ~[...]
        at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:448) ~[...]
        at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:625) ~[...]
        at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:609) ~[...]
        at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106) ~[...]
        at com.netflix.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:605) ~[...]
        at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:428) ~[...]
        at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:41) ~[...]
        at com.netflix.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:218) ~[...]
        at com.netflix.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:218) ~[...]
        at com.netflix.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:74) ~[...]
        at com.netflix.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:313) [...]
        at com.netflix.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:374) [...]
        at com.netflix.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:45) [...]
        at com.netflix.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:194) [...]
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) [na:1.6.0_32]
        at java.util.concurrent.FutureTask.run(Unknown Source) [na:1.6.0_32]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) [na:1.6.0_32]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.6.0_32]}}
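
To make the "handle these cases and retry" suggestion concrete, here is a minimal sketch of a bounded retry around a lock acquisition that may throw. It is not Curator code: the Acquirable interface, the acquireWithRetry helper, and the attempt/sleep parameters are all hypothetical names chosen for illustration, and the flaky lock in main only simulates the UnknownHostException seen in the log above.

```java
import java.net.UnknownHostException;

public class RetryingAcquire {
    /** Hypothetical stand-in for a lock whose acquire() may throw (not a Curator type). */
    interface Acquirable {
        void acquire() throws Exception;
    }

    /**
     * Tries to acquire the lock up to maxAttempts times, sleeping sleepMs between
     * failed attempts. Returns true on success, false if every attempt threw.
     * Interruption is propagated instead of being retried.
     */
    static boolean acquireWithRetry(Acquirable lock, int maxAttempts, long sleepMs)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                lock.acquire();
                return true;
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                System.err.println("acquire() failed on attempt " + attempt + ": " + e);
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated lock: the first two calls fail the way the log above shows.
        Acquirable flaky = () -> {
            if (calls[0]++ < 2) {
                throw new UnknownHostException("xyz.example.com");
            }
        };
        boolean acquired = acquireWithRetry(flaky, 5, 10L);
        System.out.println("acquired=" + acquired + " after " + calls[0] + " calls");
    }
}
```

Whether such a loop belongs in LeaderSelector, in InterProcessMutex, or in the caller is exactly the open question in this comment; the sketch only shows the shape of the retry, and a caller still has to decide what to do when it returns false.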
                
> LeaderSelector may (undetectably) fail to elect
> -----------------------------------------------
>
>                 Key: CURATOR-15
>                 URL: https://issues.apache.org/jira/browse/CURATOR-15
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.0.0-incubating
>            Reporter: Shevek
>             Fix For: TBD
>
>
> In LeaderSelector, if mutex.acquire() throws an Exception, for example because CuratorFramework.getZooKeeper() threw a previously-enqueued background exception, then that failure will propagate out of doWork and doWorkLoop, and kill the background submission onto the executor service.
> This means that a LeaderSelector which was start()ed will NEVER elect, and this situation is NOT DETECTABLE externally, since that exception happens on a private ExecutorService thread and is not client-visible. It's impossible to look at a LeaderSelector and decide whether it is still "viable".
> This can leave a machine/process "hung" and not automatically recoverable within Curator.
> Either isQueued() needs to be exposed, which means that a leader is either elected or queued; or the finally{} block which calls clearIsQueued() also needs to set the state to CLOSED or FAILED, so that we can query this failure externally.
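
The second option in the description (record a terminal state when the background task dies) can be sketched roughly as follows. This is a standalone illustration, not the LeaderSelector implementation: the ObservableSelector class, its State enum, and the getState()/getFailure() accessors are hypothetical names, and the "work" callable merely stands in for the doWorkLoop body.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ObservableSelector {
    /** Hypothetical lifecycle states; FAILED is the externally queryable failure marker. */
    enum State { LATENT, STARTED, FAILED, CLOSED }

    private final AtomicReference<State> state = new AtomicReference<>(State.LATENT);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Callable<Void> work; // stand-in for the background election loop
    private volatile Throwable failure;

    ObservableSelector(Callable<Void> work) {
        this.work = work;
    }

    void start() {
        if (!state.compareAndSet(State.LATENT, State.STARTED)) {
            throw new IllegalStateException("already started");
        }
        executor.submit(() -> {
            try {
                work.call();
            } catch (Throwable t) {
                // Instead of dying silently on a private thread, record the failure.
                failure = t;
                state.compareAndSet(State.STARTED, State.FAILED);
            }
            return null;
        });
    }

    State getState() {
        return state.get();
    }

    Throwable getFailure() {
        return failure;
    }

    /** Waits for the background task to finish, then marks CLOSED unless it FAILED. */
    void close() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        state.compareAndSet(State.STARTED, State.CLOSED);
    }

    public static void main(String[] args) throws Exception {
        ObservableSelector selector = new ObservableSelector(() -> {
            throw new java.net.UnknownHostException("xyz.example.com");
        });
        selector.start();
        selector.close();
        System.out.println(selector.getState() + ": " + selector.getFailure());
    }
}
```

With something like this in place, a supervisor can poll getState() and restart or alert on FAILED, which is the external detectability the issue asks for.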

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
