Subject: Re: Zookeeper-based discovery provider: infinite re-connect loop after server restart
From: Yuriy Lopotun
To: user@zookeeper.apache.org
Date: Thu, 23 Apr 2015 13:34:41 -0400

Looks like there's an open bug for the described issue:
https://issues.apache.org/jira/browse/ZOOKEEPER-832

There was some discussion in the comments, but it looks like the best solution
hasn't been found yet.

Yuriy

2015-04-22 18:55 GMT-04:00 Yuriy Lopotun:

> Hi guys,
>
> In our client-server OSGi application we are using the ECF Zookeeper-based
> discovery provider for remote services discovery (based on Zookeeper
> v.3.3.6).
>
> In standalone mode the plugin opens a dedicated Zookeeper connection
> from the client to each of the servers.
>
> When testing the application's resiliency, we noticed that when we restart
> the server, the connection never gets re-established.
> In the server logs I found the following:
>
> 2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Accepted socket connection from /10.36.64.250:53022
>
> 2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] DEBUG org.apac.zook.serv.NIOServerCnxn - Session establishment request from client /10.36.64.250:53022 client's lastZxid is 0x8
>
> 2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Refusing session request for client /10.36.64.250:53022 as it has seen zxid 0x8 our last zxid is 0x7 client must try another server
>
> 2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Closed socket connection for client /10.36.64.250:53022 (no session established for client)
>
> As far as I understand, this is expected behaviour, since the server
> (due to the restart) cleaned up its DB and reset the transaction id.
>
> The problem in this case is that the client session keeps trying to
> reconnect to this one server, which causes an infinite loop:
>
> 2015-04-22 18:21:02,760 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO org.apac.zook.ClientCnxn - Opening socket connection to server ca-rd-mbernard.miranda.com/10.36.64.250:2001
>
> 2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO org.apac.zook.ClientCnxn - Socket connection established to ca-rd-mbernard.miranda.com/10.36.64.250:2001, initiating session
>
> 2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] DEBUG org.apac.zook.ClientCnxn - Session establishment request sent on ca-rd-mbernard.miranda.com/10.36.64.250:2001
>
> 2015-04-22 18:21:02,762 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO org.apac.zook.ClientCnxn - Unable to read additional data from server sessionid 0x14ce32e178c0002, likely server has closed socket, closing socket connection and attempting reconnect
>
> Again, I think this is correct behaviour when there are several servers. But
> in our case there is always just 1.
>
> So, I wanted to ask you for a suggestion: what do you think we can do in this
> case to achieve automatic reconnect?
>
> I thought maybe we could close the connection on such an exception if
> there is only 1 server, instead of retrying? Maybe this enhancement is
> already done in more recent versions and could be back-ported?
>
> Thanks,
>
> Yuriy
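The refusal rule visible in the server log ("it has seen zxid 0x8 our last zxid is 0x7") and the "stop retrying when there is only one server" idea from the quoted message can be sketched as a small model. This is a minimal illustration in plain Java, not ZooKeeper code and not a fix from ZOOKEEPER-832: the class `ReconnectSketch`, its `Session` type, and the methods `serverRefuses` and `connect` are all hypothetical names, and the sketch assumes that a freshly created session reports a last-seen zxid of 0.

```java
/** Hypothetical model of the zxid check and a single-server fallback; not ZooKeeper code. */
final class ReconnectSketch {

    /** Models the logged rule: refuse a client that has seen a newer zxid than the server has. */
    public static boolean serverRefuses(long clientLastZxid, long serverLastZxid) {
        return clientLastZxid > serverLastZxid;
    }

    /** Hypothetical client-side session state: only the last zxid the client has seen. */
    public static final class Session {
        public final long lastZxid;

        public Session(long lastZxid) {
            this.lastZxid = lastZxid;
        }
    }

    /**
     * Retries the existing session up to maxAttempts times against a single server.
     * If every attempt is refused, the old session is abandoned and a fresh one
     * (assumed to start at zxid 0) is tried once. Returns the accepted session,
     * or null if even the fresh session is refused.
     */
    public static Session connect(Session session, long serverLastZxid, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (!serverRefuses(session.lastZxid, serverLastZxid)) {
                return session; // accepted, the existing session is kept
            }
        }
        // With one server there is no other server to try, so give up on the old session.
        Session fresh = new Session(0L);
        return serverRefuses(fresh.lastZxid, serverLastZxid) ? null : fresh;
    }

    public static void main(String[] args) {
        Session old = new Session(0x8L); // client's lastZxid from the log
        long serverZxid = 0x7L;          // server's last zxid after the restart

        System.out.println(serverRefuses(old.lastZxid, serverZxid)); // prints "true"

        Session result = connect(old, serverZxid, 3);
        System.out.println(result != old && result.lastZxid == 0L);  // prints "true"
    }
}
```

In a real client the equivalent step would be closing the old handle and opening a new one after a bounded number of refused attempts. A new session does not carry over the old session's ephemeral nodes or watches, so the application would have to re-register them.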