curator-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CURATOR-358) Receiving KeeperException with NoNode when LeaderLatch#getLeader()
Date Mon, 21 Nov 2016 06:51:58 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682714#comment-15682714 ]

ASF GitHub Bot commented on CURATOR-358:
----------------------------------------

Github user dragonsinth commented on a diff in the pull request:

    https://github.com/apache/curator/pull/173#discussion_r88837901
  
    --- Diff: curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderSelector.java ---
    @@ -341,11 +342,41 @@ public Participant getLeader() throws Exception
     
         static Participant getLeader(CuratorFramework client, Collection<String> participantNodes) throws Exception
         {
    +        Participant result = null;
    +        
             if ( participantNodes.size() > 0 )
             {
    -            return participantForPath(client, participantNodes.iterator().next(), true);
    +            Iterator<String> iter = participantNodes.iterator();
    +            while ( iter.hasNext() )
    +            {
    +                
    +                try
    +                {
    +                    result = participantForPath(client, iter.next(), true);
    --- End diff --
    
    Hehe, I wasn't worried about efficiency; it was a code clarity thought.


> Receiving KeeperException with NoNode when LeaderLatch#getLeader()
> ------------------------------------------------------------------
>
>                 Key: CURATOR-358
>                 URL: https://issues.apache.org/jira/browse/CURATOR-358
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.10.0
>            Reporter: Satish Duggana
>            Priority: Critical
>
> org.apache.curator.framework.recipes.leader.LeaderLatch#getLeader() intermittently throws a KeeperException with Code#NONODE, as shown in the stack trace below. The participant's ephemeral ZK node may have been removed because its connection/session was closed.
> The relevant code is at https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L451
> public Participant getLeader() throws Exception
> {
>     Collection<String> participantNodes = LockInternals.getParticipantNodes(client, latchPath, LOCK_NAME, sorter);
>     return LeaderSelector.getLeader(client, participantNodes);
> }
> I guess it hits a race condition: a participant node is retrieved, but by the time LeaderSelector#getLeader() reads it, the node has been removed because of a session timeout, so a KeeperException with the NoNode code is thrown. The call is not retried, because the RetryLoop retries only on connection/session timeouts, but in this case NoNode should have been retried. I could not find any APIs on CuratorClient to configure which KeeperException codes are retried. It may be good to have a way to specify which errors should be retried in the org.apache.curator.framework.CuratorFrameworkFactory.Builder APIs.
> Intermittent Exception found with the stack trace:
> 2016-11-15 06:09:33.954 o.a.s.d.nimbus [ERROR] Error when processing event
> org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /storm/leader-lock/_c_97c09eed-5bba-4ac8-a05f-abdc4e8e95cf-latch-0000000002
>      at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>      at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>      at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>      at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:304)
>      at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:293)
>      at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>      at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:290)
>      at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:281)
>      at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:42)
>      at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderSelector.participantForPath(LeaderSelector.java:375)
>      at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderSelector.getLeader(LeaderSelector.java:346)
>      at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:454)
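
For illustration, the race described above and the general shape of the fix in the diff (try the next participant when one has vanished) can be sketched without ZooKeeper or Curator. This is only a rough sketch under stated assumptions, not the code from the pull request: the class LeaderLookupSketch, the in-memory map standing in for ZooKeeper, the local NoNodeException, and the choice to return null when no node survives are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Curator or ZooKeeper.
class LeaderLookupSketch {
    /** Local stand-in for KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("NoNode for " + path);
        }
    }

    /** Stand-in for a getData call; the map plays the role of ZooKeeper. */
    static String readParticipantId(Map<String, String> store, String path)
            throws NoNodeException {
        String id = store.get(path);
        if (id == null) {
            throw new NoNodeException(path);
        }
        return id;
    }

    /**
     * Walks the sorted candidate paths and returns the id of the first node
     * that still exists. Returns null if every candidate has vanished
     * (an assumption of this sketch, not necessarily what Curator does).
     */
    static String firstLiveParticipant(Map<String, String> store,
                                       Collection<String> sortedPaths) {
        for (String path : sortedPaths) {
            try {
                return readParticipantId(store, path);
            } catch (NoNodeException e) {
                // The node was listed but removed before the read:
                // fall through and try the next candidate.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The listing saw two latch nodes, but the first one expired
        // before its data could be read.
        Map<String, String> store = new HashMap<>();
        store.put("/latch-0000000003", "node-b");
        List<String> listed =
                Arrays.asList("/latch-0000000002", "/latch-0000000003");

        System.out.println(firstLiveParticipant(store, listed)); // prints "node-b"
    }
}
```

Returning from inside the loop keeps the control flow short, which may or may not be what the clarity comment above had in mind; the quoted diff is truncated, so the actual structure of the patch is not known from this message.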



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
