Date: Tue, 12 Aug 2014 16:43:11 +0000 (UTC)
From: "Benjamin Jaton (JIRA)"
To: dev@curator.apache.org
Reply-To: dev@curator.apache.org
Subject: [jira] [Commented] (CURATOR-134) Curator sends a connection LOST event before sessionTimeout

[ https://issues.apache.org/jira/browse/CURATOR-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094292#comment-14094292 ]

Benjamin Jaton commented on CURATOR-134:
----------------------------------------

Regardless of the sessionTimeout, with a RetryNTimes(3, 10000) retry policy I don't think we should get a LOST event 13 seconds after the last RECONNECTED event. It should arrive at least 30 seconds later.
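To make the arithmetic behind that expectation explicit, here is a minimal sketch that works only from the timestamps in the quoted log. The class and method names (LostEventTiming, minRetryWindow, gap) are hypothetical and not part of Curator. It also assumes, as the comment does, that LOST should not fire before all three retries have each slept their full 10000 ms; that is the expectation under discussion, not confirmed Curator behaviour.

```java
import java.time.Duration;
import java.time.LocalTime;

public class LostEventTiming {

    // Lower bound on the retry window for RetryNTimes(retries, sleepMs),
    // ASSUMING every retry sleeps the full interval before LOST is reported.
    static Duration minRetryWindow(int retries, int sleepMsBetweenRetries) {
        return Duration.ofMillis((long) retries * sleepMsBetweenRetries);
    }

    // Elapsed time between two same-day wall-clock timestamps (HH:mm:ss).
    static Duration gap(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to));
    }

    public static void main(String[] args) {
        Duration expected = minRetryWindow(3, 10000);

        // Timestamps copied from the log in the quoted issue.
        String lost = "11:17:29";
        Duration sinceLastReconnected = gap("11:17:16", lost);
        Duration sinceLastSuspended = gap("11:17:19", lost);
        Duration sinceFirstSuspended = gap("11:16:58", lost);

        System.out.println("expected minimum window:    " + expected.getSeconds() + "s");
        System.out.println("LOST after last RECONNECTED: " + sinceLastReconnected.getSeconds() + "s");
        System.out.println("LOST after last SUSPENDED:   " + sinceLastSuspended.getSeconds() + "s");
        System.out.println("LOST after first SUSPENDED:  " + sinceFirstSuspended.getSeconds() + "s");

        // Only the gap measured from the first SUSPENDED event reaches the
        // expected window, which matches the suspicion raised in the issue.
        System.out.println("last SUSPENDED gap meets window:  "
                + (sinceLastSuspended.compareTo(expected) >= 0));
        System.out.println("first SUSPENDED gap meets window: "
                + (sinceFirstSuspended.compareTo(expected) >= 0));
    }
}
```

The gaps measured from the last RECONNECTED and last SUSPENDED events both fall short of the 30-second window, while the gap from the first SUSPENDED event does not.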
> Curator sends a connection LOST event before sessionTimeout
> -----------------------------------------------------------
>
> Key: CURATOR-134
> URL: https://issues.apache.org/jira/browse/CURATOR-134
> Project: Apache Curator
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.6.0
> Environment: Ubuntu 12.04
> Reporter: Benjamin Jaton
> Priority: Critical
> Attachments: Test.java
>
>
> Created a Curator client with:
> - connection timeout: 10 seconds
> - session timeout: 30 seconds
> - retry policy: RetryNTimes(3, 10000)
> In a scenario where the ensemble is lost, the Curator client sends a LOST event in less than the expected 30 seconds:
> Fri Aug 01 11:17:19 PDT 2014 - CURATOR STATE: SUSPENDED
> Fri Aug 01 11:17:29 PDT 2014 - CURATOR STATE: LOST
> The client code is attached. This is the complete output:
> Fri Aug 01 11:16:53 PDT 2014 - CURATOR STATE: CONNECTED
> Fri Aug 01 11:16:54 PDT 2014 - Creating ZK client...
> Fri Aug 01 11:16:54 PDT 2014 - ZK client created...
> Fri Aug 01 11:16:54 PDT 2014 - ZOOKEEPER STATE: SyncConnected
> Fri Aug 01 11:16:58 PDT 2014 - ZOOKEEPER STATE: Disconnected
> Fri Aug 01 11:16:58 PDT 2014 - CURATOR STATE: SUSPENDED
> Fri Aug 01 11:17:16 PDT 2014 - CURATOR STATE: RECONNECTED
> Fri Aug 01 11:17:17 PDT 2014 - ZOOKEEPER STATE: SyncConnected
> Fri Aug 01 11:17:19 PDT 2014 - ZOOKEEPER STATE: Disconnected
> Fri Aug 01 11:17:19 PDT 2014 - CURATOR STATE: SUSPENDED
> Fri Aug 01 11:17:29 PDT 2014 - CURATOR STATE: LOST
> I think that the LOST event is actually 30 seconds away from the very first SUSPENDED event, whereas it should be 30 seconds away from the last one.
> To reproduce it, I started only 2 ZK servers in a 3-node ensemble, then stopped one of them (-> 1st SUSPENDED), waited for 10-20 seconds, then started it and stopped it again.

--
This message was sent by Atlassian JIRA
(v6.2#6252)