Message-ID: <4B673253.6050204@yahoo-inc.com>
Date: Mon, 01 Feb 2010 11:58:11 -0800
From: Benjamin Reed
To: zookeeper-user@hadoop.apache.org
Subject: Re: how to handle re-add watch fails
References:
 <8163028120305742991D2FB7F19412ABB0123E@uksrpblkexb01.detica.com>
In-Reply-To: <8163028120305742991D2FB7F19412ABB0123E@uksrpblkexb01.detica.com>

sadly connectionloss is the really ugly part of zookeeper! it is a pain
to deal with. i'm not sure we have a best practice, but i can tell you
what i do :) ZOOKEEPER-22 is meant to alleviate this problem.

i usually use the async API when handling the watch callback. in the
completion function, if there is a connection loss, i issue another
async getChildren to retry. this avoids blocking the caller, which the
synchronous retry that eric alluded to would do, but the behavior is
effectively the same: you retry the request.

colin, you don't need to worry about multiple watches being added.
zookeeper keeps track of which watchers have registered which watches
and will not register duplicate watches for the same watcher.
(hopefully you can parse that :)

ben

Colin Goodheart-Smithe wrote:
> We are having similar problems to this. At the moment we wrap ZooKeeper
> in a class which retries requests on KeeperException.ConnectionLoss to
> avoid no watcher being added, but we are worried that this may result in
> multiple watchers being added if the watcher is successfully added but
> the server returns a Connection Loss
>
> Colin
>
>
> -----Original Message-----
> From: Eric Bowman [mailto:ebowman@boboco.ie]
> Sent: 01 February 2010 10:22
> To: zookeeper-user@hadoop.apache.org
> Subject: Re: how to handle re-add watch fails
>
> I was surprised to not get a response to this ... is this a no-brainer?
> Too hard to solve? Did I not express it clearly? Am I doing something
> dumb? :)
>
> Thanks,
> Eric
>
> On 01/25/2010 01:05 PM, Eric Bowman wrote:
>
>> I'm curious, what is the "best practice" for how to handle the case
>> where re-adding a watch inside a Watcher.process callback fails?
>>
>> I keep stumbling upon the same kind of thing, and the possibility of
>> race conditions or undefined behavior keeps troubling me. Maybe I'm
>> missing something.
>>
>> Suppose I have a callback like:
>>
>>     public void process( WatchedEvent watchedEvent )
>>     {
>>         if ( watchedEvent.getType() == Event.EventType.NodeChildrenChanged ) {
>>             try {
>>                 ... do stuff ...
>>             }
>>             catch ( Throwable e ) {
>>                 log.error( "Could not do stuff!", e );
>>             }
>>             try {
>>                 zooKeeperManager.watchChildren( zPath, this );
>>             }
>>             catch ( InterruptedException e ) {
>>                 log.info( "Interrupted adding watch -- shutting down?" );
>>                 return;
>>             }
>>             catch ( KeeperException e ) {
>>                 // oh crap, now what?
>>             }
>>         }
>>     }
>>
>> (In this case, watchChildren is just calling getChildren and passing
>> the watcher in.)
>>
>> It occurs to me I could get more and more complicated here: I could
>> wrap watchChildren in a while loop until it succeeds, but that seems
>> kind of rude to the caller. Plus what if I get a
>> KeeperException.SessionExpiredException or a
>> KeeperException.ConnectionLossException? How to handle that in this
>> loop? Or I could send some other thread a message that it needs to
>> keep trying until the watch has been re-added ... but ... yuck.
>>
>> I would very much like to just set up this watch once, and have
>> ZooKeeper make sure it keeps firing until I tear down ZooKeeper -- this
>> logic seems tricky for clients, and quite error prone and full of race
>> conditions.
>>
>> Any thoughts?
>>
>> Thanks,
>> Eric
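
For reference, the retry approach described in the reply at the top of this thread (issue an async getChildren, and reissue it from the completion function on connection loss) might look roughly like the Java sketch below. This is only an illustration under stated assumptions: ChildrenClient and ChildrenCallback are hypothetical stand-ins shaped like ZooKeeper's async getChildren(path, watcher, callback, ctx) and AsyncCallback.ChildrenCallback, so that the example runs without the ZooKeeper library, and RetryingChildWatcher is an invented name. The numeric codes are the values of KeeperException.Code.OK and CONNECTIONLOSS in the real client.

```java
import java.util.List;

// Hypothetical stand-in for AsyncCallback.ChildrenCallback (ctx argument omitted).
interface ChildrenCallback {
    void processResult(int rc, String path, List<String> children);
}

// Hypothetical stand-in for the async ZooKeeper.getChildren(path, watcher, cb, ctx).
// The watcher is modeled as a Runnable that fires when the children change.
interface ChildrenClient {
    void getChildren(String path, Runnable watcher, ChildrenCallback cb);
}

// Illustrative class (not part of ZooKeeper): keeps a child watch registered
// and retries the registration whenever the request ends in connection loss.
final class RetryingChildWatcher implements ChildrenCallback {
    static final int OK = 0;               // KeeperException.Code.OK
    static final int CONNECTION_LOSS = -4; // KeeperException.Code.CONNECTIONLOSS

    private final ChildrenClient client;
    private final String path;
    // One watcher instance is reused for every attempt. Per the reply above,
    // ZooKeeper does not register duplicate watches for the same watcher,
    // so retrying with the same instance should not pile up watches.
    private final Runnable watcher = this::onChildrenChanged;
    private List<String> lastChildren;
    private int attempts;
    private int failureCode = OK;

    RetryingChildWatcher(ChildrenClient client, String path) {
        this.client = client;
        this.path = path;
    }

    // Issues the async request; the outcome arrives in processResult.
    void watch() {
        attempts++;
        client.getChildren(path, watcher, this);
    }

    // Watches are one-shot, so re-register as soon as one fires.
    private void onChildrenChanged() {
        watch();
    }

    @Override
    public void processResult(int rc, String path, List<String> children) {
        if (rc == CONNECTION_LOSS) {
            watch();              // retry without blocking the caller
        } else if (rc == OK) {
            lastChildren = children;
        } else {
            failureCode = rc;     // e.g. session expiry: retrying will not help
        }
    }

    List<String> lastChildren() { return lastChildren; }
    int attempts() { return attempts; }
    int failureCode() { return failureCode; }
}
```

With the real client, the completion callback runs on the client's event thread, so the retry is queued rather than recursive; a production version would probably also want some backoff between attempts and explicit handling for session expiry, neither of which is shown here.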