From: "Wang, Simon"
To: user@curator.apache.org
Subject: RE: Connection lost handling when entering double barrier
Date: Tue, 24 May 2016 20:41:20 +0000

I also opened a JIRA issue, CURATOR-330, for this.

 

From: Wang, Simon
Sent: Monday, May 23, 2016 5:14 PM
To: user@curator.apache.org
Subject: Connection lost handling when entering double barrier

 

Here is the problem I'm running into:

 

Assume a 3-node ensemble. My application has 3 clients (Client 1, 2 and 3), each running on the same machine as one of the ZooKeeper nodes. They use a double barrier for coordination.

 

Client 1 enters the barrier and waits for the other 2. Then the other 2 nodes go down, the ensemble becomes unavailable (a single server out of 3 cannot form a quorum), and Client 1 gets a LostConnectionException from enter(). That's expected.
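What Client 1 sees can be sketched with a tiny single-process model. It is hypothetical (`enter`, `ConnectionLost` and the `connected_checks` list are illustrative names, not Curator API) and assumes the standard ZooKeeper double-barrier recipe: each member creates an ephemeral child under the barrier node and waits until the number of children reaches the member count. The point it illustrates is that a connection loss aborts the wait but does not delete the child node, because ZooKeeper removes ephemeral nodes only when their session ends.

```python
# Hypothetical single-process model of a double barrier's enter() (not Curator code).
# children maps each member's child-node name to the id of the session that created it.


class ConnectionLost(Exception):
    """Stands in for the connection-loss error the client gets from enter()."""


def enter(children: dict[str, int], me: str, session_id: int,
          member_qty: int, connected_checks: list[bool]) -> None:
    """Record this member's child node, then wait until member_qty children exist.

    connected_checks simulates the connection state seen at each wait step.
    """
    children[me] = session_id  # ephemeral node owned by this session
    for connected in connected_checks:
        if not connected:
            # The wait is abandoned, but the child node is NOT removed:
            # ephemeral nodes disappear only when their session ends.
            raise ConnectionLost(me)
        if len(children) >= member_qty:
            return  # every member has arrived
    raise TimeoutError(me)


children: dict[str, int] = {}
try:
    # Client 1 waits alone, then the ensemble loses quorum on the third check.
    enter(children, "client-1", 101, member_qty=3, connected_checks=[True, True, False])
except ConnectionLost:
    print("enter() failed; surviving nodes:", children)
```

In this model the node for session 101 is still present after the failure, which is exactly the state Client 1 is in when it retries.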

 

After a while the other 2 nodes come back, and all clients need to retry the operation and re-enter the same barrier (it might become more complex if a new barrier is created instead). Here is the problem:

 

If Client 1's session is still alive, calling enter() again fails with a NodeExistsException, because the ephemeral node created for that session has not been deleted yet.

 

What should I do on the application side in this case? Alternatively, could we add a mechanism to re-enter the barrier that skips creating the child node for this client if it already exists?
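The proposed re-entry behaviour can be sketched with a small in-memory model. Everything here is hypothetical (`Namespace`, `NodeExistsError`, `naive_enter`, `reentrant_enter` and the `/barrier/client-1` path are illustrative, not Curator API). The model mimics only two ZooKeeper rules: creating a path that already exists fails, and an ephemeral node records the session that owns it (a real ZooKeeper `Stat` exposes this as `ephemeralOwner`). The tolerant variant treats an existing node owned by the caller's own session as "already entered" and re-raises otherwise.

```python
# Hypothetical in-memory model of the proposed re-entry behaviour (not Curator code).
# It mimics two ZooKeeper rules: creating an existing path fails, and every
# ephemeral node remembers the session that owns it.


class NodeExistsError(Exception):
    pass


class Namespace:
    def __init__(self) -> None:
        self.ephemeral_owner: dict[str, int] = {}  # path -> owning session id

    def create_ephemeral(self, path: str, session_id: int) -> None:
        if path in self.ephemeral_owner:
            raise NodeExistsError(path)
        self.ephemeral_owner[path] = session_id


def naive_enter(ns: Namespace, path: str, session_id: int) -> None:
    """Always create the member node: fails on retry while the old node survives."""
    ns.create_ephemeral(path, session_id)


def reentrant_enter(ns: Namespace, path: str, session_id: int) -> bool:
    """Create the member node, or reuse it if our own session already owns it.

    Returns True if a new node was created, False if an existing one was reused.
    """
    try:
        ns.create_ephemeral(path, session_id)
        return True
    except NodeExistsError:
        if ns.ephemeral_owner[path] == session_id:
            return False  # left over from our earlier attempt: treat as entered
        raise  # owned by another session: a genuine conflict


ns = Namespace()
member = "/barrier/client-1"  # hypothetical fixed child path for Client 1
naive_enter(ns, member, session_id=101)  # first attempt, before the outage

# The connection was lost but session 101 survived, so the node is still there.
try:
    naive_enter(ns, member, session_id=101)
except NodeExistsError:
    print("naive retry: NodeExistsError")

print("reentrant retry created node:", reentrant_enter(ns, member, session_id=101))
```

Checking the owner, rather than silently ignoring every existing node, keeps a node that belongs to a different session from being mistaken for this client's own. A real implementation would also have to cope with the node disappearing between the failed create and the ownership check, for example if the old session expires in between.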

 

I would like to open a JIRA issue for this if needed.

 

Thanks,

Simon
