curator-dev mailing list archives

From Antal Sasvári (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CURATOR-45) LeaderSelector threw exception, but still created ephemeral node, breaking everything
Date Tue, 08 Oct 2013 10:52:41 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789093#comment-13789093 ]

Antal Sasvári commented on CURATOR-45:
--------------------------------------

Was this patch also tested with autoRequeue enabled?

I changed TestLeaderSelectorEdges.flappingTest() to enable autoRequeue() for leaderSelector1,
and it seems that more and more ephemeral nodes keep getting created and then deleted (with
increasing sequence numbers), while leaderSelector1 keeps gaining and losing leadership.

It looks like the new leader-election ephemeral node is constantly deleted in the background
and then recreated because of autoRequeue.
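
To make the suspected cycle concrete, here is a toy in-memory model of it. It uses neither Curator nor ZooKeeper; the class name, the simulate() method and the assumption that a background cleanup deletes every freshly created node are illustrative only, not the actual LeaderSelector code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of the reported flapping; it does not use Curator or ZooKeeper. */
public class AutoRequeueFlappingSketch {

    /** Simulates up to `rounds` election attempts; returns the sequence number of each node created. */
    static List<Integer> simulate(int rounds, boolean autoRequeue) {
        List<Integer> createdSequences = new ArrayList<>();
        int nextSequence = 0;   // stands in for ZooKeeper's per-parent sequence counter
        boolean queued = true;  // the selector starts out queued for leadership
        for (int i = 0; i < rounds && queued; i++) {
            // Create an ephemeral sequential node and gain leadership.
            createdSequences.add(nextSequence++);
            // Assumed misbehaviour: a background cleanup deletes the node, so leadership is lost.
            // With autoRequeue the selector immediately re-enters the election.
            queued = autoRequeue;
        }
        return createdSequences;
    }

    public static void main(String[] args) {
        System.out.println("autoRequeue off: " + simulate(5, false)); // prints [0]
        System.out.println("autoRequeue on:  " + simulate(5, true));  // prints [0, 1, 2, 3, 4]
    }
}
```

Without autoRequeue the selector leaves the election after the first loss, so only one node is ever created. With it, every loss produces a new node with a higher sequence number, which matches the pattern seen in the modified test.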


> LeaderSelector threw exception, but still created ephemeral node, breaking everything
> -------------------------------------------------------------------------------------
>
>                 Key: CURATOR-45
>                 URL: https://issues.apache.org/jira/browse/CURATOR-45
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework, Recipes
>    Affects Versions: 2.2.0-incubating
>            Reporter: Shevek
>            Assignee: Jordan Zimmerman
>             Fix For: 2.3.0
>
>         Attachments: CURATOR-45.patch
>
>
> ZooKeeper hiccupped, and then this happened:
>     2013-06-19 02:23:35,561 DEBUG [LeaderSelector-1] com.netflix.curator.RetryLoop.takeException (RetryLoop.java:184) - Retry-able exception received
>     org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /[REMOVED]/election/_c_1ccdb2b9-7f9a-4570-9555-201c91ec2dcb-lock-
>             at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.5.0.jar:3.5.0--1]
>             at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.5.0.jar:3.5.0--1]
>             at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:876) ~[zookeeper-3.5.0.jar:3.5.0--1]
>             at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:625) ~[curator-framework-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:609) ~[curator-framework-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106) [curator-client-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:605) [curator-framework-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:428) [curator-framework-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:41) [curator-framework-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:218) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:218) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:74) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:314) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:373) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:46) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:195) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [?:1.6.0_27]
>             at java.util.concurrent.FutureTask.run(FutureTask.java:166) [?:1.6.0_27]
>             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) [?:1.6.0_27]
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.6.0_27]
>             at java.lang.Thread.run(Thread.java:679) [?:1.6.0_27]
> However, the ephemeral node got created, and this hung leader election for this path.
> I'm investigating to work out where to put an extra guaranteed-delete. I see the case in LockInternals, which sometimes triggers to do this cleanup, but it didn't trigger in this case.
> You must really love our bugs by now.
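
The failure described in the report, where a create succeeds on the server but the client only sees ConnectionLoss, can be modelled with a small in-memory sketch. The _c_<uuid>- prefix mirrors the node name in the log above; FakeServer, createSequential() and deleteOrphans() are hypothetical names for illustration and are not Curator's API or the contents of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Toy in-memory model; FakeServer and all method names are hypothetical, not Curator's API. */
public class ProtectedCreateSketch {

    /** Stands in for a ZooKeeper parent node holding sequential children. */
    static class FakeServer {
        final List<String> children = new ArrayList<>();
        private int sequence = 0;
        boolean dropNextResponse = false;

        /** Creates the child, then optionally "loses" the response, like a ConnectionLoss. */
        String createSequential(String prefix) {
            String name = prefix + String.format("%010d", sequence++);
            children.add(name); // the node exists on the server from this point on
            if (dropNextResponse) {
                dropNextResponse = false;
                throw new IllegalStateException("simulated ConnectionLoss for " + prefix);
            }
            return name;
        }
    }

    /** Deletes every child carrying this client's protective id; returns how many were removed. */
    static int deleteOrphans(FakeServer server, String protectedId) {
        int before = server.children.size();
        server.children.removeIf(child -> child.startsWith("_c_" + protectedId + "-"));
        return before - server.children.size();
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        String id = UUID.randomUUID().toString();
        server.dropNextResponse = true;
        try {
            server.createSequential("_c_" + id + "-lock-");
        } catch (IllegalStateException e) {
            // The client saw a failure, but the node was created; clean it up by its id.
            System.out.println("orphans removed: " + deleteOrphans(server, id)); // prints 1
        }
        System.out.println("children left: " + server.children); // prints []
    }
}
```

If the cleanup step is skipped after the simulated failure, the orphaned child stays at the head of the queue, which is the kind of leftover node that would hang an election on that path.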



--
This message was sent by Atlassian JIRA
(v6.1#6144)
