Mailing-List: contact zookeeper-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: zookeeper-user@hadoop.apache.org
MIME-Version: 1.0
In-Reply-To: <4C25054D.1040608@apache.org>
References: <AANLkTimJKXPPOMIy3tAa8DO10PqdnaEBm_cSR6MkkpO9@mail.gmail.com>
	<4C25054D.1040608@apache.org>
From: Alexis Midon <alexismidon@gmail.com>
Date: Fri, 25 Jun 2010 14:47:02 -0700
Message-ID: <AANLkTimd-G6GFyHixDMMfxKl3Gi3sgwmNivntjKAqHKx@mail.gmail.com>
Subject: Re: Watchers & error handling
To: Patrick Hunt <phunt@apache.org>
Cc: zookeeper-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=0016363b83d862c47a0489e1b7e4

--0016363b83d862c47a0489e1b7e4
Content-Type: text/plain; charset=UTF-8

Hi Patrick,

Thanks for your answers. I did some tests yesterday and observed the
following behaviors:

1. Session events, i.e. type-None events, are sent to all outstanding watch
handlers. So if you do get(path, watcherX), both the default watcher and
watcherX will receive the session events.
2. Watchers are one-time triggers, but session events do NOT remove a
watcher. In other words, if we're listening for a NodeCreated event and a
disconnection occurs, we will eventually be notified of a Disconnected, then
a SyncConnected, and finally a NodeCreated, without having to set a new
watcher.
3. If the invocation of a (synchronous or asynchronous) method fails, the
watcher is not set. For instance, if getChildren("/foo", mywatcher) fails
because the client is disconnected, mywatcher won't be notified of future
events.
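
The three observations above can be sketched as a toy model. This is plain
Java with no ZooKeeper dependency, so it only imitates the behavior I saw and
is not the client's actual code. WatchModel, its Watcher interface, and the
demo() method are hypothetical names; the enum constants borrow names from
Watcher.Event.EventType. The delivery order is an artifact of the model, since
the real client gives no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: imitates the observed watch semantics, not the real client.
class WatchModel {
    enum EventType { None, NodeCreated }   // names borrowed from ZooKeeper

    interface Watcher { void process(EventType type, String detail); }

    private final Watcher defaultWatcher;
    private final Map<String, Set<Watcher>> pathWatchers = new HashMap<>();
    private boolean connected = true;

    WatchModel(Watcher defaultWatcher) { this.defaultWatcher = defaultWatcher; }

    // Observation 3: a call that fails registers no watcher.
    // (The real client would throw a connection-loss exception instead.)
    boolean exists(String path, Watcher watcher) {
        if (!connected) return false;
        pathWatchers.computeIfAbsent(path, k -> new LinkedHashSet<>()).add(watcher);
        return true;
    }

    // Observations 1 and 2: a session event reaches the default watcher and
    // every outstanding watcher, and it removes none of them.
    void sessionEvent(String state) {
        connected = state.equals("SyncConnected");
        Set<Watcher> targets = new LinkedHashSet<>();
        targets.add(defaultWatcher);
        for (Set<Watcher> ws : pathWatchers.values()) targets.addAll(ws);
        for (Watcher w : targets) w.process(EventType.None, state);
    }

    // Path events are one-time triggers: the watchers are removed when fired.
    void nodeCreated(String path) {
        Set<Watcher> fired = pathWatchers.remove(path);
        if (fired == null) return;
        for (Watcher w : fired) w.process(EventType.NodeCreated, path);
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        WatchModel zk = new WatchModel((t, d) -> log.add("default " + t + " " + d));
        zk.exists("/foo", (t, d) -> log.add("watcherX " + t + " " + d));
        zk.sessionEvent("Disconnected");
        // Fails while disconnected, so watcherY is never registered.
        zk.exists("/bar", (t, d) -> log.add("watcherY " + t + " " + d));
        zk.sessionEvent("SyncConnected");
        zk.nodeCreated("/foo");   // fires watcherX without re-registration
        zk.nodeCreated("/bar");   // nothing: watcherY was never set
        zk.nodeCreated("/foo");   // nothing: watcherX was a one-time trigger
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this model watcherX sees Disconnected, SyncConnected, and then NodeCreated,
while watcherY never sees anything.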

I apologize in advance if I'm stating the obvious but the differences
between "path" events and "session" events were not clear to me.

<http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html#ch_zkWatches>
Alexis

On Fri, Jun 25, 2010 at 12:36 PM, Patrick Hunt <phunt@apache.org> wrote:

>
>
> On 06/12/2010 10:07 PM, Alexis Midon wrote:
>
>> I implemented queues and locks on top of ZooKeeper, and I'm pretty happy so
>> far. Thanks for the nice work. Tests look good. So good that we can focus on
>> exception/error handling and I got a couple of questions.
>>
>> #1. Regarding the use of the default watcher. A ZooKeeper instance has a
>> default watcher, most operations can also specify a watcher. When both are
>> set, does the operation watcher override the default watcher?
>>
>
> if you use the get(path, bool) then the default watcher is notified, if you
> use get(path, watcherX) then only "watcherX" is notified.
>
>
>> or will both watchers be invoked? if so in which order? Does each watcher
>> receive all the types of event?
>>
>
> no, both watchers are not invoked.
>
>
>> I had a look at the code, and my understanding is that the default watcher
>> will always receive the type-NONE events, even if an "operation" watcher is
>> set. No guarantee on the order of invocation though. Could you confirm
>> and/or complete please?
>>
>>
> The watcher gets both state change notifications and watch events. You can
> register multiple watchers for the same path (incl the default), there is no
> guarantee on ordering at all.
>
>
>> #2. After a connection loss, the client will eventually reconnect to the ZK
>> cluster so I guess I can keep using the same client instance. But are there
>>
>
> right
>
>
>> cases where it is necessary to re-instantiate a ZooKeeper client? As a first
>> recovery strategy, is it OK to always recreate a client so that any
>> ephemeral node previously owned disappears?
>>
>
> If the session has expired, that is the case where you need to recreate the
> session object (or if you explicitly closed it).
>
> Yes, this is a fine strategy if your application domain "fits". If you have
> a very expensive "recovery" or "bootstrap" process then recreating the
> session on every disconnect would be a bad idea.
>
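
A minimal sketch of the recovery choices discussed in this thread, in plain
Java with no ZooKeeper dependency. The State enum is a local stand-in that
borrows three names from Watcher.Event.KeeperState. RecoveryPolicy, Action,
and the cheapBootstrap flag are hypothetical names used for illustration only.

```java
// Hypothetical helper: maps a session state to a recovery action.
class RecoveryPolicy {
    // Local stand-in for the session states mentioned in this thread.
    enum State { Disconnected, SyncConnected, Expired }

    enum Action { WAIT_FOR_RECONNECT, RECHECK_OWNED_NODES, RECREATE_CLIENT }

    // cheapBootstrap: true when recreating the session costs the application
    // little, i.e. when the "always recreate" strategy fits the app domain.
    static Action onSessionEvent(State state, boolean cheapBootstrap) {
        switch (state) {
            case Expired:
                // An expired session cannot be reused; a new client is needed.
                return Action.RECREATE_CLIENT;
            case Disconnected:
                // The client reconnects on its own; recreating is optional.
                return cheapBootstrap ? Action.RECREATE_CLIENT
                                      : Action.WAIT_FOR_RECONNECT;
            default:
                // Reconnected within the same session: ephemeral nodes may
                // still exist, so verify what this session still owns.
                return Action.RECHECK_OWNED_NODES;
        }
    }
}
```

An application with an expensive bootstrap would pass cheapBootstrap = false
and simply wait out a disconnect.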
>
>> The case I struggle with is the following:
>> Let's say I've acquired a lock (i.e. an ephemeral locknode is created).
>> Some application logic failed due to a connection loss. At this stage I'd
>> like to give up/roll back. Here I would typically throw an exception, the
>> lock being released in a finally. But I can't release the lock since the
>> connection is down. Later the client eventually reconnects, the session
>> didn't expire so the locknode still exists. Now no one else can acquire this
>> lock until my session expires.
>>
>
> Yes, you are reading the situation correctly. In this case you either have
> to take the easy route - close the session and create a new one (again, if
> your app domain supports this) or your client needs to check if the lock is
> still being held (it's still the owner) when it's eventually reconnected.
> You can verify this for an ephemeral node by looking at the "ephemeralOwner"
> field of the Stat object. If this matches your session id then you are the
> owner and still hold the lock. This is a bit tricky to get right though, so
> in some cases clients just close the session and recreate.
>
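
The ownership check described above comes down to one comparison. Below is a
minimal sketch in plain Java, assuming the caller has already read the lock
node's Stat. In the real client the two inputs would come from
Stat.getEphemeralOwner() and ZooKeeper.getSessionId(). LockCheck and
stillHoldsLock are hypothetical names.

```java
// Hypothetical helper: decides whether this session still owns a lock node.
class LockCheck {
    // ephemeralOwner: the owner field from the lock node's Stat, or null if
    //                 the node no longer exists.
    // sessionId:      the id of the current client session.
    static boolean stillHoldsLock(Long ephemeralOwner, long sessionId) {
        // A missing node means the lock is gone. A non-ephemeral node reports
        // an owner of 0, which will not match a live session id.
        return ephemeralOwner != null && ephemeralOwner.longValue() == sessionId;
    }
}
```

If this returns true after a reconnect, the lock is still held by this session
and must be released explicitly before anyone else can acquire it.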
>
>
>> #3. could you describe the recommended actions for each exception code?
>>
>
> This is highly dependent on your application requirements. See above for my
> general guidance. Feel free to ask more questions.
>
> Regards,
>
> Patrick
>

--0016363b83d862c47a0489e1b7e4--