Subject: Re: zk keeps disconnecting and reconnecting
From: Mahadev Konar
Date: Mon, 29 Aug 2011 11:10:11 -0700
To: user@zookeeper.apache.org

Camille,
  Do you think we should put the fix in 3.3.4? I think 3.4 might take a while to stabilize, so 3.3.4 would be a good release to get this in.

Thoughts?

mahadev

On Aug 29, 2011, at 10:50 AM, Fournier, Camille F. wrote:

> Well, it causes the problem you are seeing. If you set any watchers with a chroot and then your client gets disconnected with these watches outstanding, when you reconnect you will try to reset them and they are probably on paths that don't exist (if you are creating everything under path /kafka-tracking). So you get a notification about the watches immediately after resetting them, which causes the string out of bounds exception.
>
> The only fix is to disable auto watch reset, and then have your own client reset watches when it gets a reconnected event. I suspect it would be easier for you to take a shot at fixing the bug than to rewrite your client to handle this. Thomas provided a patch with tests that presumably show the error, so all you need is a fix to make them pass.
>
> C
>
> -----Original Message-----
> From: Jun Rao [mailto:junrao@gmail.com]
> Sent: Monday, August 29, 2011 12:39 PM
> To: user@zookeeper.apache.org; thomas@koch.ro
> Subject: Re: zk keeps disconnecting and reconnecting
>
> What's the impact of ZOOKEEPER-961? If it shows up, does that mean the client won't get any watcher events afterwards? If so, this sounds like a blocker for 3.4 release to me. What's the temporary solution for 3.3.3?
>
> Also, for the very first time that the ZK client gets disconnected, I saw the following entry in the log. It seems that the client can't ping the server for 4 seconds. The ZK server was up at that time and the load was minimal. What could cause the time out? Client GC pauses?
>
> 2011/08/26 10:58:22.306 INFO [ClientCnxn] [main-SendThread(esv4-app27.stg:12913)] [kafka] Client session timed out, have not heard from server in 4001ms for sessionid 0x131f ddd84bc0006, closing socket connection and attempting reconnect
>
> Thanks,
>
> Jun
>
> On Mon, Aug 29, 2011 at 7:54 AM, Thomas Koch wrote:
>
>> Fournier, Camille F.:
>>> Did anyone ever check resetting watches at client reconnect on a client with a chroot? Looking at the code, we store the watches associated with the non-chroot path, but they are set by the original request prepending chroot to the request. However, it looks like the SetWatches request on reconnect just calls get on the various watch lists from ZooKeeper, which don't have the prepended chroot.
>>>
>>> I haven't written a test but I would bet dollars to donuts this is the problem.
>>>
>>> C
>> seems to be this:
>> ZOOKEEPER-961, ZOOKEEPER-1091
>>
>> Regards,
>>
>> Thomas Koch, http://www.koch.ro
>>
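
For anyone following the thread, the chroot mismatch Camille describes can be sketched in plain Java without the ZooKeeper library. This is only an illustration under stated assumptions: the class `ChrootWatchSketch` and the helpers `toServerPath` and `toClientPath` are hypothetical names, the watched path `/a` is made up, and the path handling is a simplified stand-in rather than the actual client code. Only the chroot `/kafka-tracking` comes from the messages above.

```java
public class ChrootWatchSketch {

    // Hypothetical helper: a chroot-aware client prepends the chroot
    // before sending a watch request to the server.
    static String toServerPath(String chroot, String clientPath) {
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }

    // Hypothetical helper: when an event comes back, the client strips
    // the chroot so the application sees its own (non-chroot) path.
    // It assumes the server path always starts with the chroot.
    static String toClientPath(String chroot, String serverPath) {
        if (serverPath.equals(chroot)) {
            return "/";
        }
        return serverPath.substring(chroot.length());
    }

    public static void main(String[] args) {
        String chroot = "/kafka-tracking"; // chroot mentioned in the thread
        String watched = "/a";             // made-up client-side watch path

        // Normal case: the watch was registered with the chroot prepended,
        // so the event path maps back cleanly to the client path.
        String registered = toServerPath(chroot, watched);
        System.out.println("normal event maps to: " + toClientPath(chroot, registered));

        // Reconnect case as described in the thread: the watch is re-sent
        // WITHOUT the chroot, so the event arrives for the bare path and
        // stripping the chroot runs past the end of the string.
        try {
            toClientPath(chroot, watched);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("event for " + watched + " cannot be mapped back: " + e);
        }
    }
}
```

In this sketch the exception only appears because `/a` is shorter than the chroot; a longer bare path would silently map to a wrong client path instead. The workaround from the thread (disable auto watch reset and re-register watches from your own client on the reconnected event) presumably avoids the second case because the watches go through the normal chroot-prepending request path again.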