From: Patrick Hunt
Date: Thu, 04 Feb 2010 15:04:29 -0800
To: zookeeper-user@hadoop.apache.org
CC: "yonik@lucidimagination.com"
Subject: Re: ephemeral node after server bounce
Message-ID: <4B6B527D.4030609@apache.org>

Ah, excellent idea, won't always work but
may help. I think in this case (ephemerals) all Yonik would need to do is
close the session. That will remove all ephemerals.

Patrick

kishore g wrote:
> Worst case option would be to have jvm shutdownhooks
> http://stackoverflow.com/questions/40376/handle-signals-in-the-java-virtual-machine
>
> You can delete the znodes on exit. More like deleteOnExit functionality of a
> File
>
> thanks,
> Kishore G
>
> On Thu, Feb 4, 2010 at 2:56 PM, Patrick Hunt wrote:
>
>> hah, you guys beat me to the punch. I think having some unique per client
>> token might also work (see my resp). Perhaps this is the ip of the host or
>> better (esp if multiple clients on a single host) would be some solr
>> specific id that uniquely identifies each node.
>>
>> Patrick
>>
>> Benjamin Reed wrote:
>>
>>> i second ted's proposals! thanx ted.
>>>
>>> there is one other option. when you create the ZooKeeper object you can
>>> pass a session id and password. your bounced server can actually reattach
>>> to the session. (that is why we put that constructor in.) to use it you
>>> need to save the session id and password to a persistent store (a file)
>>> when you first attach, and then when you restart read the id and password
>>> from the file.
>>>
>>> ben
>>>
>>> Ted Dunning wrote:
>>>
>>>> On Thu, Feb 4, 2010 at 2:20 PM, Yonik Seeley wrote:
>>>>
>>>>> There's no way to "hand over" responsibility for an ephemeral znode,
>>>>> right?
>>>>>
>>>> Right.
>>>>
>>>>> We have solr nodes create ephemeral znodes (name based on host and
>>>>> port).
>>>>> The ephemeral znode takes some time to remove of course, so what
>>>>> happens is that if I bounce a solr server (containing a zk client) the
>>>>> ephemeral node will still exist when the server comes back up.
>>>>>
>>>> This problem comes up with any system that has hysteresis and needs a
>>>> single point of control.
>>>>
>>>>> What's the best way to handle this situation?
>>>>> Delete and re-create?
>>>>>
>>>> Watch it and re-create when it does disappear?
>>>>
>>>> I think you need to handle the problem of multiple search nodes coming up
>>>> on the same machine, possibly because the old one may have hung up.
>>>>
>>>> So... I would recommend
>>>>
>>>> a) if the ephemeral still exists, wait for a few more seconds to see if it
>>>> disappears (20?)
>>>>
>>>> b) if it goes away, create a new one and continue as normal
>>>>
>>>> c) if it doesn't go away take additional action to determine if service is
>>>> still running (i.e. panic and run in circles).
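
For what it's worth, Ted's steps (a) through (c) quoted above could be
sketched roughly as below. This is only an illustration under stated
assumptions, not anyone's actual Solr or ZooKeeper code. EphemeralStore,
InMemoryStore and EphemeralRegistration are hypothetical names made up for
the sketch; in a real client the two store methods would presumably wrap
ZooKeeper.exists() and ZooKeeper.create() with CreateMode.EPHEMERAL. The
sketch polls instead of setting a watch, which is a simplification of
"watch it and re-create".

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical minimal view of the coordination service; not a ZooKeeper type. */
interface EphemeralStore {
    boolean exists(String path);        // assumed to wrap ZooKeeper.exists(path, false) != null
    void createEphemeral(String path);  // assumed to wrap ZooKeeper.create(..., CreateMode.EPHEMERAL)
}

/** In-memory stand-in, used only to exercise the sketch without a server. */
class InMemoryStore implements EphemeralStore {
    private final Set<String> nodes = ConcurrentHashMap.newKeySet();

    public boolean exists(String path) {
        return nodes.contains(path);
    }

    public void createEphemeral(String path) {
        // Mirrors the "node already exists" failure a real create would report.
        if (!nodes.add(path)) {
            throw new IllegalStateException("node exists: " + path);
        }
    }

    /** Simulates the old session expiring and its ephemeral node being removed. */
    void remove(String path) {
        nodes.remove(path);
    }
}

class EphemeralRegistration {
    /**
     * Returns true if our node was created, false if the old node was still
     * present when the wait ran out (the caller then handles step c).
     */
    static boolean register(EphemeralStore store, String path,
                            long waitMillis, long pollMillis) {
        long deadline = System.nanoTime() + waitMillis * 1_000_000L;
        while (store.exists(path)) {              // (a) old ephemeral still there: wait
            if (System.nanoTime() >= deadline) {
                return false;                     // (c) it never went away
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // (b) it went away: create ours. Another client could still win the
        // race between the check and the create, in which case this throws.
        store.createEphemeral(path);
        return true;
    }
}
```

With a wait of about 20 seconds (Ted's guess), a false return is the cue to
go check whether the old process is really dead. Per Patrick's note above,
closing the session on a clean shutdown removes the ephemerals right away,
so the wait would mostly matter after a crash.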