From: "Carroll James (Nokia-LC/Malvern)"
To: "user@zookeeper.apache.org"
Date: Wed, 28 Nov 2012 13:03:00 -0500
Subject: RE: Unrecoverable ConnectionLossException after server restart

Ok. So the only difference I can see from the client side between a network partition failure and a ZooKeeper server cluster bounce is that in the former case the ConnectionLossException happens on a ZooKeeper client whose state is CONNECTED, and in the latter it's CONNECTING. Is this a reliable means of determining that I should recreate the client state from scratch?

-----Original Message-----
From: Carroll James (Nokia-LC/Malvern) [mailto:james.carroll@nokia.com]
Sent: Wednesday, November 28, 2012 12:18 PM
To: user@zookeeper.apache.org
Subject: RE: Unrecoverable ConnectionLossException after server restart

This is apparently happening because the session establishment is being rejected on the server side:

2012-11-28 12:13:04,102 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:54551] INFO ZooKeeperServer - Refusing session request for client /127.0.0.1:38095 as it has seen zxid 0x2 our last zxid is 0x0 client must try another server

Unfortunately I can't see any indication on the client side that this is the problem.
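The log line implies a server-side rule along these lines: a reconnecting client reports the last zxid it has seen, and the server refuses the session if that zxid is newer than its own last zxid. If the restarted server came back with an empty data directory (an assumption; "our last zxid is 0x0" is consistent with it), the old client's reconnect is refused every time, while a freshly created client, which has not seen any zxid yet, is accepted. Below is a minimal model of that comparison, not ZooKeeper's source; `SessionAdmission` and `checkConnect` are hypothetical names.

```java
/**
 * Hypothetical model (not ZooKeeper's source) of the admission check implied
 * by the "Refusing session request" log line.
 */
public final class SessionAdmission {

    /** Returns null if the connect request is accepted, else the refusal message. */
    static String checkConnect(long lastZxidSeenByClient, long serverLastZxid) {
        if (lastZxidSeenByClient > serverLastZxid) {
            // The server is behind what the client has already observed; refusing
            // presumably keeps the client from seeing older state than before.
            return "Refusing session request as it has seen zxid 0x"
                    + Long.toHexString(lastZxidSeenByClient)
                    + " our last zxid is 0x"
                    + Long.toHexString(serverLastZxid)
                    + " client must try another server";
        }
        return null;
    }

    public static void main(String[] args) {
        // Old client: saw zxid 0x2 before the restart; restarted server is at 0x0.
        System.out.println(checkConnect(0x2L, 0x0L));
        // New client: has seen nothing yet, so it is accepted (prints "null").
        System.out.println(checkConnect(0x0L, 0x0L));
    }
}
```

In this model nothing about the refusal reaches the client except the dropped connection, which matches the endless retries and repeated ConnectionLossException described in this thread.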
The server just decides to sever the connection, and the client just keeps retrying (hence the counting up on the ephemeral ports). I could deal with this in the application if I could tell why the server decided to close the connection. Is there a way for me to do this?

Thanks
Jim

-----Original Message-----
From: Carroll James (Nokia-LC/Malvern) [mailto:james.carroll@nokia.com]
Sent: Wednesday, November 28, 2012 1:19 AM
To: user@zookeeper.apache.org
Subject: Unrecoverable ConnectionLossException after server restart

I'm seeing what I think is incorrect behavior from ZooKeeper.

When I start a client, connect to a server, and then restart the server, the client was (I thought) supposed to eventually reconnect. It doesn't. It continually throws a ConnectionLossException on every use, the ZooKeeper client's isAlive is true, I never get a SESSION_EXPIRATION, and I can see the client-side ephemeral ports listed in the error message counting up as if it's continually attempting to reconnect.

If I recreate the ZooKeeper client, the new client connects and I can use it.

So I could simply react as if I got a SESSION_EXPIRATION exception and rebuild the client state, except that a ConnectionLossException is something I ALSO get when there is a network partition. When I periodically recreate the entire client from scratch in response to a ConnectionLossException, I eventually run out of file descriptors and my entire process is hosed. This seems to be related to the use of nio and the repeated opening of pipes and anon_inodes (which show up in lsof).

Am I doing something wrong? Any suggestions?

The information contained in this communication may be CONFIDENTIAL and is intended only for the use of the recipient(s) named above. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited.
If you have received this communication in error, please notify the sender and delete/destroy the original message and any copy of it from your computer or paper files.
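The first message reports running out of file descriptors when the client is recreated on every ConnectionLossException, with pipes and anon_inodes piling up in lsof. One plausible cause (an assumption, not confirmed in this thread) is that the discarded clients were never closed: each ZooKeeper handle keeps its own connection threads and nio selector until its `close()` method is called. Below is a minimal sketch of a replacement helper that always closes the old instance before creating a new one. `ClientHolder` and `CountingClient` are hypothetical names, and the client is abstracted as `AutoCloseable` so the example runs without the ZooKeeper jar.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: replaces a client with a fresh instance, always closing
 * the old one first so its sockets, threads and selector can be released.
 */
public final class ClientHolder<C extends AutoCloseable> {
    private final Supplier<C> factory;
    private C current;

    public ClientHolder(Supplier<C> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    public synchronized C get() {
        return current;
    }

    /** Closes the current client, then creates and returns a replacement. */
    public synchronized C recreate() {
        C old = current;
        try {
            old.close(); // for a real ZooKeeper handle this would be zk.close()
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
            // A failed close should not prevent creating the replacement.
            System.err.println("close failed: " + e);
        }
        current = factory.get();
        return current;
    }

    /** Stand-in client that counts how many instances are currently open. */
    static final class CountingClient implements AutoCloseable {
        static int open = 0;

        CountingClient() {
            open++;
        }

        @Override
        public void close() {
            open--;
        }
    }

    public static void main(String[] args) {
        ClientHolder<CountingClient> holder = new ClientHolder<>(CountingClient::new);
        for (int i = 0; i < 1000; i++) {
            holder.recreate();
        }
        // Every replaced client was closed, so only the current one is open.
        System.out.println("open clients: " + CountingClient.open); // prints "open clients: 1"
    }
}
```

Recreating without the `close()` call would leave `CountingClient.open` at 1001 here, the analogue of the descriptor growth described above. This sketch does not answer whether CONNECTED versus CONNECTING reliably distinguishes a partition from a cluster bounce.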