Date: Thu, 10 Oct 2013 05:27:41 +0000 (UTC)
From: "Shaun Senecal (JIRA)"
To: dev@curator.incubator.apache.org
Reply-To: dev@curator.incubator.apache.org
Subject: [jira] [Commented] (CURATOR-64) Retry logic appears to delay reconnect after session expiry

[
https://issues.apache.org/jira/browse/CURATOR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791212#comment-13791212 ]

Shaun Senecal commented on CURATOR-64:
--------------------------------------

I'm still confused. The behaviour we are seeing is that Curator is hanging for several minutes, logging exceptions about failed retry attempts all along the way, before being able to reconnect. Are you saying this is the expected behaviour?

I understand that Curator is managing the connection for me, which is why I assume that the retry logic should be able to run in parallel with the reconnect logic so that our service spends as little time as possible disconnected from the cluster. Am I still missing something?

> Retry logic appears to delay reconnect after session expiry
> -----------------------------------------------------------
>
>                 Key: CURATOR-64
>                 URL: https://issues.apache.org/jira/browse/CURATOR-64
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>            Reporter: Shaun Senecal
>         Attachments: SessionExpiryTest.java
>
>
> If a watch is triggered immediately before a session expiry, and the watch attempts to fetch data from ZK (using Curator), it's possible that the reconnect behaviour is delayed until the retry gives up.
> It currently looks something like this:
> 1. watch A is triggered, begins processing
> 2. session is expired (watch A hasn't completed execution yet)
> 3. watch A attempts to fetch data from ZK (say: curator.getData()...)
> 4. the getData() will retry until the policy tells it to give up (could be several minutes)
> 5. finally Curator will reconnect to ZK
> I would expect something more like this:
> 1. watch A is triggered, begins processing
> 2. session is expired (watch A hasn't completed execution yet)
> 3. watch A attempts to fetch data from ZK (say: curator.getData()...)
> 4. the first getData() fails because of session expiry (should be nearly instant)
> 5. Curator reconnects to ZK
> 6. a second attempt to call getData() is made via the RetryPolicy
> 7. watch A completes processing
> We are using the BoundedExponentialBackoffRetry, so we end up waiting for quite a while after session expiry, leaving our services dead in the water for much longer than is necessary.
> This occurs with Curator v1.3.3 and ZK 3.4.5

--
This message was sent by Atlassian JIRA
(v6.1#6144)
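
To put a rough number on the "could be several minutes" in step 4 of the current behaviour, here is a minimal sketch of how long a capped exponential backoff can block before it gives up. It is not Curator's code: the class RetryBudget, its methods, and the parameter values (1 s base sleep, 30 s cap, 10 retries) are hypothetical, and it assumes each sleep is at most base * 2^(retryCount + 1), capped at a maximum. The real BoundedExponentialBackoffRetry may compute its sleeps differently (for example with randomization), so treat the result as an illustrative upper bound only.

```java
public class RetryBudget {
    // Assumed upper bound on a single sleep: base * 2^(retryCount + 1),
    // capped at maxMs. This is a generic capped-backoff formula, not
    // taken from the Curator source.
    public static long cappedSleepMs(int retryCount, long baseMs, long maxMs) {
        int shift = Math.min(retryCount + 1, 30); // keep the shift small to avoid overflow
        long uncapped = baseMs * (1L << shift);
        return Math.min(maxMs, uncapped);
    }

    // Total time spent sleeping if every retry fails and every sleep
    // hits its upper bound.
    public static long worstCaseTotalMs(int maxRetries, long baseMs, long maxMs) {
        long total = 0;
        for (int retry = 0; retry < maxRetries; retry++) {
            total += cappedSleepMs(retry, baseMs, maxMs);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical policy settings, chosen only for illustration.
        long totalMs = worstCaseTotalMs(10, 1000L, 30000L);
        System.out.println("worst-case time blocked in retries: " + (totalMs / 1000) + " s");
    }
}
```

With these hypothetical settings the sleeps are 2 s, 4 s, 8 s, 16 s and then six sleeps at the 30 s cap, which adds up to 210 s. If the reconnect cannot proceed until the retry loop gives up, that whole interval is spent disconnected from the cluster.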