Date: Thu, 17 Feb 2011 13:29:36 -0800
Subject: Re: ephemeral node problem
From: Mahadev Konar
To: user@zookeeper.apache.org
Cc: Samuel Rash
You probably ran into https://issues.apache.org/jira/browse/ZOOKEEPER-919 ? If so, 3.3.3 has the fix! It is due to be released next week.

thanks
mahadev

On Thu, Feb 17, 2011 at 1:08 PM, Samuel Rash wrote:
> Hi,
>
> We are running zookeeper 3.3.2 and have seen what appears to be a problem with ephemeral nodes. We create about 2000 persistent nodes (leaves) in a hierarchy. Under each of these, we run a leader election with ephemeral nodes (~40). This results in about 80,000 total ephemeral nodes. During restart of our system, the leader elections can churn a bit as hosts remove themselves, electing new leaders, which then may themselves withdraw from the election. In one such restart, we saw an election get 'stuck'. Upon investigating, we found that one host's session had expired (as indicated in the zk logs), but one of its ephemeral nodes was still present. We took down the processes holding ephemeral nodes and this node remained.
>
> Are there any known bugs in zookeeper that might result in this? It does not appear to happen under our normal load.
>
> thx,
> -sr
>
> Sam Rash
> rash@fb.com
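To make the reported symptom concrete, here is a toy, in-memory Python model of the common ZooKeeper leader-election recipe, in which each candidate creates an ephemeral sequential child and the child with the lowest sequence number leads. It is only a sketch under stated assumptions: it is not the ZooKeeper client API, `ToyZk`, its methods, and the `/elections/leaf1` path are hypothetical, and the thread does not say that these elections use sequential nodes. It does not model the cause of ZOOKEEPER-919. It only shows why one ephemeral node that outlives its session can pin an election to a dead leader, and how such "orphaned" nodes could be spotted by comparing each node's owning session against the set of live sessions. In a real ensemble, the owning session of an ephemeral node is reported in the `ephemeralOwner` field of its `Stat`.

```python
from itertools import count


class ToyZk:
    """Hypothetical in-memory stand-in for a ZooKeeper namespace (not a real client)."""

    def __init__(self):
        self.owner = {}  # path -> owning session id, or None for a persistent node
        self._seq = {}   # parent path -> counter used for sequential child names

    def create_persistent(self, path):
        self.owner[path] = None

    def create_ephemeral_sequential(self, parent, session):
        # Zero-padded 10-digit suffix, so lexical order equals creation order.
        counter = self._seq.setdefault(parent, count())
        path = f"{parent}/n_{next(counter):010d}"
        self.owner[path] = session
        return path

    def expire_session(self, session):
        # Expected behaviour: every ephemeral node owned by the session disappears.
        if session is None:
            return
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.owner[path]

    def children(self, parent):
        prefix = parent + "/"
        return sorted(p for p in self.owner
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

    def leader(self, parent):
        # Recipe assumption: the lowest sequence number wins the election.
        kids = self.children(parent)
        return self.owner[kids[0]] if kids else None

    def orphaned_ephemerals(self, live_sessions):
        # Ephemeral nodes whose owner is no longer live: the "stuck" symptom.
        return sorted(p for p, s in self.owner.items()
                      if s is not None and s not in live_sessions)


if __name__ == "__main__":
    zk = ToyZk()
    election = "/elections/leaf1"  # hypothetical path for one persistent leaf
    zk.create_persistent(election)
    first = zk.create_ephemeral_sequential(election, session=101)
    for s in (102, 103):
        zk.create_ephemeral_sequential(election, session=s)
    print(zk.leader(election))                 # 101
    zk.expire_session(101)
    print(zk.leader(election))                 # 102
    # Simulate the reported symptom: a node survives its session's expiry.
    zk.owner[first] = 101
    print(zk.orphaned_ephemerals({102, 103}))  # ['/elections/leaf1/n_0000000000']
    print(zk.leader(election))                 # 101: election stuck on a dead session
```

Because the leftover node keeps the lowest sequence number, no surviving candidate ever sees itself as first in line, which matches the "stuck" election described above.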