From: Patrick Hunt
Date: Wed, 28 Nov 2012 09:54:43 -0800
Subject: Re: Unrecoverable ConnectionLossException after server restart
To: user@zookeeper.apache.org

Are you running a standalone server or an
ensemble? Any chance that your datadir is getting cleared between runs of the server (for example, having data in /tmp and restarting the OS)?

Basically, this error message is saying that the client has talked to a server whose last zxid was 0x2, but when it reconnects, the server's last zxid is 0x0. I've seen people hit this before when they clear the datadir while restarting the server. I've also seen cases where the user has a misconfigured ensemble, e.g. 3 servers that are each running standalone rather than as a single ensemble.

Patrick

On Wed, Nov 28, 2012 at 9:17 AM, Carroll James (Nokia-LC/Malvern) wrote:
> This is apparently happening because the session establishment is being rejected on the server side:
>
> 2012-11-28 12:13:04,102 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:54551] INFO ZooKeeperServer - Refusing session request for client /127.0.0.1:38095 as it has seen zxid 0x2 our last zxid is 0x0 client must try another server
>
> Unfortunately, I can't see any indication on the client side that this is the problem. The server just severs the connection and the client keeps retrying (hence the ephemeral ports counting up). I could deal with this in the application if I could tell why the server decided to close the connection. Is there a way for me to do this?
>
> Thanks
> Jim
>
> -----Original Message-----
> From: Carroll James (Nokia-LC/Malvern) [mailto:james.carroll@nokia.com]
> Sent: Wednesday, November 28, 2012 1:19 AM
> To: user@zookeeper.apache.org
> Subject: Unrecoverable ConnectionLossException after server restart
>
> I'm seeing what I think is incorrect behavior from ZooKeeper.
>
> When I start a client, connect to a server, and then restart the server, the client (I thought) was supposed to eventually reconnect. It doesn't.
> It continually throws a ConnectionLossException on every use, the ZooKeeper client's isAlive is true, I never get a SESSION_EXPIRATION, and I can see the client-side ephemeral ports listed in the error message counting up as if it's continually attempting to reconnect.
>
> If I recreate the ZooKeeper client, the new client connects and I can use it.
>
> So I could simply react as if I got a SESSION_EXPIRATION exception and rebuild the client state, except that a ConnectionLossException is something I ALSO get during a network partition. When I periodically recreate the entire client from scratch in response to a ConnectionLossException, I eventually run out of file descriptors and my entire process is hosed. This seems to be related to the use of NIO and the repeated opening of pipes and anon_inodes (which show up in lsof).
>
> Am I doing something wrong? Any suggestions?
>
> The information contained in this communication may be CONFIDENTIAL and is intended only for the use of the recipient(s) named above. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you have received this communication in error, please notify the sender and delete/destroy the original message and any copy of it from your computer or paper files.
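To make the zxid explanation concrete, the comparison behind the "Refusing session request" log line can be modeled in a few lines. This is a minimal Python sketch, not ZooKeeper's actual code: the function names `accepts_session` and `describe` are hypothetical, and the model covers only the rule described in the thread, namely that a server refuses a client that has already seen a newer zxid than the server's own last zxid. Because the refusal just closes the connection, the client sees a lost connection and retries, which matches the endless retries described above when the server came back with an empty datadir.

```python
# Hypothetical model of the server-side session check discussed in the thread.
# These names are illustrative only; they are not ZooKeeper classes or methods.


def accepts_session(client_last_zxid: int, server_last_zxid: int) -> bool:
    """Return True if the server would accept the client's session request.

    The server refuses a client that has already seen a newer zxid than the
    server itself has, since serving it could move the client's view of the
    data backwards in time.
    """
    return client_last_zxid <= server_last_zxid


def describe(client_last_zxid: int, server_last_zxid: int) -> str:
    """Summarize the outcome, formatting zxids in hex as the server log does."""
    if accepts_session(client_last_zxid, server_last_zxid):
        return "session accepted"
    return (f"refused: client has seen zxid {client_last_zxid:#x}, "
            f"server's last zxid is {server_last_zxid:#x}")


# Scenario from the thread: the client saw zxid 0x2, then the server restarted
# with a cleared datadir, so its last zxid is back to 0x0.
print(describe(0x2, 0x0))
# A server that kept its datadir would still know about zxid 0x2.
print(describe(0x2, 0x2))
```

Running it prints one refusal and one acceptance; the only difference between the two cases is whether the server still remembers the zxid the client has already seen.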