curator-dev mailing list archives

From "Alex Rankin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CURATOR-439) CuratorFrameworkState STARTED, but ZookeeperClient not connected
Date Wed, 25 Oct 2017 08:46:03 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218259#comment-16218259 ]

Alex Rankin commented on CURATOR-439:
-------------------------------------

From analysing the log files, it looks like the ConnectionState fluctuated between SUSPENDED
and RECONNECTED a few times, and was LOST twice. The first time the connection was LOST, it
RECONNECTED again afterwards. After the second time, there were no more ConnectionState changes.

It isn't clear from the documentation, but are we expected to close and restart the Curator
instance if the ConnectionState is LOST? After looking through some other public codebases,
it seems that this is the approach that others take.
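
One way to make these transitions visible to the application is to record the most recent
state reported to a listener. The following is a minimal sketch, not Curator code: LinkState
is a stand-in for Curator's ConnectionState enum so that the example runs without Curator on
the classpath, ConnectionTracker is a hypothetical helper, and the replayed sequence is
illustrative rather than copied from the logs. In a real service, onStateChanged would
presumably be called from a ConnectionStateListener registered through
getConnectionStateListenable().addListener(...).

```java
// Stand-in for Curator's ConnectionState enum (assumption: same names, reduced set).
enum LinkState {
    CONNECTED(true), SUSPENDED(false), RECONNECTED(true), LOST(false);

    private final boolean connected;

    LinkState(boolean connected) { this.connected = connected; }

    boolean isConnected() { return connected; }
}

// Hypothetical helper: remembers the latest state and counts how often LOST was seen.
class ConnectionTracker {
    private LinkState last;   // null until the first state change arrives
    private int lostCount;

    // In real code this would be invoked from ConnectionStateListener.stateChanged(...).
    synchronized void onStateChanged(LinkState newState) {
        last = newState;
        if (newState == LinkState.LOST) {
            lostCount++;
        }
    }

    synchronized boolean isConnected() { return last != null && last.isConnected(); }

    synchronized boolean isLost() { return last == LinkState.LOST; }

    synchronized int lostCount() { return lostCount; }
}

class ConnectionTrackerDemo {
    public static void main(String[] args) {
        ConnectionTracker tracker = new ConnectionTracker();
        // Illustrative sequence: LOST once with a recovery, then LOST again with no recovery.
        LinkState[] replay = {
            LinkState.CONNECTED, LinkState.SUSPENDED, LinkState.RECONNECTED,
            LinkState.SUSPENDED, LinkState.LOST, LinkState.RECONNECTED,
            LinkState.SUSPENDED, LinkState.LOST
        };
        for (LinkState state : replay) {
            tracker.onStateChanged(state);
        }
        System.out.println("connected=" + tracker.isConnected()
                + " lost=" + tracker.isLost()
                + " lostCount=" + tracker.lostCount());
    }
}
```

Whether a final LOST requires closing and restarting the client is still the open question
here; the sketch only makes that state easy to detect and log.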

> CuratorFrameworkState STARTED, but ZookeeperClient not connected
> ----------------------------------------------------------------
>
>                 Key: CURATOR-439
>                 URL: https://issues.apache.org/jira/browse/CURATOR-439
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 3.2.1
>            Reporter: Alex Rankin
>            Priority: Minor
>
> I recently ran into an issue on some of our nodes caused by network issues between a
> service and Zookeeper. I have been unable to recreate it as of yet, but I'm still trying.
> *+Setup+*
> 5 services using Curator 3.2.1 to talk to a Zookeeper 3.5.3 cluster (also 5 nodes).
> Network issues caused the services to disconnect from Zookeeper.
> There's a check in our code to see if the Zookeeper connection is available before sending
> a request:
> {quote}public boolean isConnected() {
>     return curatorFramework.getZookeeperClient().isConnected();
> }
> {quote}
> After the network issues resolved, we noticed that all calls to Zookeeper from 4 of the
> services were still failing (the fifth was fine). Checking the logs, we saw that {{CuratorFramework.getState()}}
> was reporting the state as STARTED, but {{curatorFramework.getZookeeperClient().isConnected()}}
> was returning false. Restarting the service fixed everything, but I obviously want to avoid
> this issue in future.
> *+Problem+*
> I couldn't find any documentation stating whether the {{CuratorZookeeperClient.isConnected()}}
> should be used, or if {{CuratorFramework.getState() == CuratorFrameworkState.STARTED}} (the
> functionality of the deprecated {{CuratorFramework.isConnected()}}) would be the better check,
> or if these should both be equivalent, and there's a bug that let one be true while the other
> was false.
> If my own check is wrong, and I shouldn't be using {{CuratorZookeeperClient.isConnected()}},
> then I can easily fix that. I wanted to check the expected behaviour before diving too deep
> into this, in case this is normal and I am just using Curator incorrectly.
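
To illustrate the distinction the description asks about, here is a small sketch that treats
the two checks as independent inputs. It assumes that the framework state only records whether
the client has been started or closed, while connectivity is tracked separately; that
assumption is exactly what the issue asks to have confirmed. Lifecycle is a stand-in for
CuratorFrameworkState (LATENT, STARTED, STOPPED), and Health and HealthCheck are hypothetical
names introduced for the example.

```java
// Stand-in for Curator's CuratorFrameworkState so the sketch runs without Curator.
enum Lifecycle { LATENT, STARTED, STOPPED }

// Hypothetical result type that keeps "started" and "connected" apart.
enum Health { NOT_RUNNING, RUNNING_DISCONNECTED, RUNNING_CONNECTED }

final class HealthCheck {
    private HealthCheck() {}

    // lifecycle would come from getState(); zkConnected from getZookeeperClient().isConnected().
    static Health evaluate(Lifecycle lifecycle, boolean zkConnected) {
        if (lifecycle != Lifecycle.STARTED) {
            return Health.NOT_RUNNING;
        }
        return zkConnected ? Health.RUNNING_CONNECTED : Health.RUNNING_DISCONNECTED;
    }
}

class HealthCheckDemo {
    public static void main(String[] args) {
        // The situation reported above: state is STARTED, but the client is not connected.
        System.out.println(HealthCheck.evaluate(Lifecycle.STARTED, false));
        // The healthy fifth service.
        System.out.println(HealthCheck.evaluate(Lifecycle.STARTED, true));
    }
}
```

Reporting the combined value rather than a single boolean would at least make the
STARTED-but-disconnected case explicit in logs and health endpoints.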



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
