Date: Wed, 25 Oct 2017 08:46:03 +0000 (UTC)
From: "Alex Rankin (JIRA)"
To: dev@curator.apache.org
Reply-To: dev@curator.apache.org
Subject: [jira] [Commented] (CURATOR-439) CuratorFrameworkState STARTED, but ZookeeperClient not connected

[ https://issues.apache.org/jira/browse/CURATOR-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218259#comment-16218259 ]

Alex Rankin commented on CURATOR-439:
-------------------------------------

From analysing the log files, it looks like the ConnectionState fluctuated between SUSPENDED and RECONNECTED a few times, and was LOST twice. The first time the connection was LOST, it RECONNECTED again afterwards. After the second time, there were no more ConnectionState changes.

It isn't clear from the documentation, but are we expected to close and restart the Curator instance if the ConnectionState is LOST? After looking through some other public codebases, it seems that this is the approach that others take.

> CuratorFrameworkState STARTED, but ZookeeperClient not connected
> ----------------------------------------------------------------
>
>                 Key: CURATOR-439
>                 URL: https://issues.apache.org/jira/browse/CURATOR-439
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>   Affects Versions: 3.2.1
>            Reporter: Alex Rankin
>            Priority: Minor
>
> I recently ran into an issue on some of our nodes caused by network issues between a service and Zookeeper. I have been unable to recreate them as of yet, but I'm still trying.
> *+Setup+*
> 5x services using Curator 3.2.1 to talk to Zookeeper 3.5.3 cluster (also 5 nodes).
> Network issues caused the services to disconnect from Zookeeper.
> There's a check in our code to see if the Zookeeper connection is available before sending a request:
> {quote}public boolean isConnected() \{
>     return curatorFramework.getZookeeperClient().isConnected();
> \}
> {quote}
> After the network issues resolved, we noticed that all calls to Zookeeper from 4 of the services were still failing (the fifth was fine). Checking the logs, we saw that {{CuratorFramework.getState()}} was reporting the state as STARTED, but {{curatorFramework.getZookeeperClient().isConnected();}} was returning false. Restarting the service fixed everything, but I obviously want to avoid this issue in future.
> *+Problem+*
> I couldn't find any documentation stating whether the {{CuratorZookeeperClient.isConnected()}} should be used, or if {{CuratorFramework.getState() == CuratorFrameworkState.STARTED}} (the functionality of the deprecated {{CuratorFramework.isConnected()}}) would be the better check, or if these should both be equivalent, and there's a bug that let one be true while the other was false.
> If my own check is wrong, and I shouldn't be using {{CuratorZookeeperClient.isConnected()}}, then I can easily fix that. I wanted to check the expected behaviour before diving too deep into this, in case this is normal and I am just using Curator incorrectly.
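
The difference between the framework's lifecycle state and its connection state, as described in the issue, can be illustrated with a minimal sketch. It is not Curator code: `ConnState` is a stand-in enum that only borrows the constant names of Curator's `ConnectionState`, and `ConnectionTracker` is a hypothetical helper. In a real service the same bookkeeping would presumably sit in a `ConnectionStateListener` registered through `curatorFramework.getConnectionStateListenable().addListener(...)`. Whether a LOST state should trigger a close and restart is left open here, as it is in the comment.

```java
import java.util.concurrent.atomic.AtomicReference;

// Stand-in for Curator's ConnectionState (same constant names).
// Hypothetical type so the sketch compiles without the Curator dependency.
enum ConnState {
    CONNECTED, SUSPENDED, RECONNECTED, LOST, READ_ONLY;

    // Treat every state except SUSPENDED and LOST as "connected".
    boolean isConnected() {
        return this == CONNECTED || this == RECONNECTED || this == READ_ONLY;
    }
}

// Hypothetical helper: remembers the most recent connection state event.
// This is tracked separately from the lifecycle state (LATENT/STARTED/STOPPED),
// which stays STARTED from start() until close() regardless of connectivity.
class ConnectionTracker {
    private final AtomicReference<ConnState> last = new AtomicReference<>(null);

    // Would be called from the listener callback on every state change.
    void stateChanged(ConnState newState) {
        last.set(newState);
    }

    // False until the first event arrives, and after SUSPENDED or LOST.
    boolean isConnected() {
        ConnState state = last.get();
        return state != null && state.isConnected();
    }

    ConnState lastState() {
        return last.get();
    }
}
```

Replaying the sequence from the comment (SUSPENDED and RECONNECTED a few times, LOST, RECONNECTED, then LOST with no further events) leaves the tracker at LOST, so `isConnected()` returns false even though a lifecycle check would still report STARTED.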
-- This message was sent by Atlassian JIRA (v6.4.14#64029)