From: Alexis Midon
Date: Fri, 25 Jun 2010 14:47:02 -0700
Subject: Re: Watchers & error handling
To: Patrick Hunt
Cc: zookeeper-user@hadoop.apache.org
In-Reply-To: <4C25054D.1040608@apache.org>
References: <4C25054D.1040608@apache.org>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=0016363b83d862c47a0489e1b7e4

--0016363b83d862c47a0489e1b7e4
Content-Type: text/plain; charset=UTF-8

Hi Patrick,

thanks for your answers.
I did some tests yesterday and observed the following behaviors:

1. Session events, i.e. Type-None events, are sent to all outstanding watch
handlers. So if you do get(path, watcherX), both the default listener and
watcherX will receive the session events.

2. Watchers are one-time triggers; however, session events do NOT remove a
watcher. In other words, if we're listening for a NodeCreated event and a
disconnection occurs, we will eventually be notified of a Disconnected, then
a SyncConnected, and finally a NodeCreated, without having to set any new
watcher.

3. If the invocation of a (synchronous or asynchronous) method fails, the
watcher is not set. For instance, if getChildren("/foo", mywatcher) fails
because the client is disconnected, mywatcher won't be notified of future
events.

I apologize in advance if I'm stating the obvious, but the differences
between "path" events and "session" events were not clear to me.

Alexis

On Fri, Jun 25, 2010 at 12:36 PM, Patrick Hunt wrote:

> On 06/12/2010 10:07 PM, Alexis Midon wrote:
>
>> I implemented queues and locks on top of ZooKeeper, and I'm pretty happy so
>> far. Thanks for the nice work. Tests look good.
>> So good that we can focus on exception/error handling and I got a couple
>> of questions.
>>
>> #1. Regarding the use of the default watcher. A ZooKeeper instance has a
>> default watcher, most operations can also specify a watcher. When both are
>> set, does the operation watcher override the default watcher?
>
> if you use the get(path, bool) then the default watcher is notified, if you
> use get(path, watcherX) then only "watcherX" is notified.
>
>> or will both watchers be invoked? if so in which order? Does each watcher
>> receive all the types of event?
>
> no, both watchers are not invoked.
>
>> I had a look at the code, and my understanding is that the default watcher
>> will always receive the type-NONE events, even if an "operation" watcher is
>> set. No guarantee on the order of invocation though. Could you confirm
>> and/or complete please?
>
> The watcher gets both state change notifications and watch events. You can
> register multiple watchers for the same path (incl the default), there is no
> guarantee on ordering at all.
>
>> #2 After a connection loss, the client will eventually reconnect to the ZK
>> cluster so I guess I can keep using the same client instance. But are there
>
> right
>
>> cases where it is necessary to re-instantiate a ZooKeeper client? As a first
>> recovery-strategy, is that ok to always recreate a client so that any
>> ephemeral node previously owned disappear?
>
> if the session is expired that's the case you need to recreate the session
> object (or if you explicitly close).
>
> Yes, this is a fine strategy if your application domain "fits". If you have
> a very expensive "recovery" or "bootstrap" process then recreating the
> session on every disconnect would be a bad idea.
>
>> The case I struggle with is the following:
>> Let's say I've acquired a lock (i.e. an ephemeral locknode is created).
>> Some application logic failed due to a connection loss.
>> At this stage I'd like to give up/roll back. Here I would typically throw
>> an exception, the lock being released in a finally. But I can't release the
>> lock since the connection is down. Later the client eventually reconnects,
>> the session didn't expire so the locknode still exists. Now no one else can
>> acquire this lock until my session expires.
>
> Yes, you are reading the situation correctly. In this case you either have
> to take the easy route - close the session and create a new one (again, if
> your app domain supports this) or your client needs to check if the lock is
> still being held (it's still the owner) when it's eventually reconnected.
> You can verify this for an ephemeral node by looking at the "ephemeralOwner"
> field of the Stat object. If this matches your session id then you are the
> owner and still hold the lock. This is a bit tricky to get right though, so
> in some cases clients just close the session and recreate.
>
>> #3. could you describe the recommended actions for each exception code?
>
> this is highly dependent on your application requirements. See above for my
> general information. ff to ask more questions.
>
> Regards,
>
> Patrick

--0016363b83d862c47a0489e1b7e4--
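
For reference, the ownership check described in the quoted reply (compare
the lock node's "ephemeralOwner" with your own session id after a reconnect)
could be sketched roughly as below. This is only an illustration, not code
from the thread: the class, enum and method names are hypothetical, and the
two inputs are assumed to come from the ZooKeeper Java client, i.e.
Stat.getEphemeralOwner() on the Stat returned by zk.exists(lockPath, false),
and zk.getSessionId(). The sketch deliberately takes plain values so that it
does not depend on the ZooKeeper jar.

```java
import java.util.OptionalLong;

/**
 * Sketch only: decides whether this client still owns an ephemeral lock node
 * once it has reconnected. All names here are hypothetical.
 */
final class LockOwnershipCheck {

    /** Possible outcomes of the check after a reconnect. */
    enum Decision { STILL_OWNER, NOT_OWNER, NODE_GONE }

    private LockOwnershipCheck() {}

    /**
     * @param ephemeralOwner owner session id read from the lock node's Stat
     *        (assumed source: zk.exists(lockPath, false), then
     *        Stat.getEphemeralOwner()); empty if the node no longer exists
     * @param sessionId this client's session id
     *        (assumed source: ZooKeeper.getSessionId())
     */
    static Decision check(OptionalLong ephemeralOwner, long sessionId) {
        if (!ephemeralOwner.isPresent()) {
            // exists() returned null: the node was deleted, e.g. after expiry.
            return Decision.NODE_GONE;
        }
        // Same session id means the node was created by our live session.
        return ephemeralOwner.getAsLong() == sessionId
                ? Decision.STILL_OWNER
                : Decision.NOT_OWNER;
    }
}
```

With this shape, a caller that gets STILL_OWNER can delete the lock node to
roll back, while NOT_OWNER or NODE_GONE means there is nothing of ours left
to release. Closing the session and creating a new one remains the simpler
route when the application can afford it.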