hadoop-zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Watchers & error handling
Date Fri, 25 Jun 2010 22:33:11 GMT

On 06/25/2010 02:47 PM, Alexis Midon wrote:
> 1. Session events i.e. Type-None events are sent to all outstanding
> watch handlers. So if you do get(path, watcherX), both the default
> listener and watcherX will receive the session events.

That's true. This enables the watcher to handle the case (for example)
when the client has become disconnected from the cluster. Per-operation
watchers were specifically added to support the "zk library" case, where
more than a single consumer would be using the client connection. This
makes it a lot easier to add libraries that depend on zk.
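
As a rough illustration of the point above, a per-operation watcher
usually has to tell session events (type None) apart from path events.
The sketch below is dependency-free Java: the nested `EventType` and
`KeeperState` enums are hypothetical stand-ins that mirror a subset of
`Watcher.Event.EventType` and `Watcher.Event.KeeperState` in the Java
client, and `WatchEventSketch` and `describe` are made-up names, not part
of any API. In a real watcher the same branching would sit inside
`Watcher.process(WatchedEvent)`.

```java
final class WatchEventSketch {
    // Hypothetical stand-ins for a subset of the client's
    // Watcher.Event.EventType and Watcher.Event.KeeperState enums.
    enum EventType { None, NodeCreated, NodeDeleted, NodeDataChanged, NodeChildrenChanged }
    enum KeeperState { Disconnected, SyncConnected, Expired }

    /** Session events are the ones delivered with event type None. */
    static boolean isSessionEvent(EventType type) {
        return type == EventType.None;
    }

    /** Returns a short description of what the watcher would do with the event. */
    static String describe(EventType type, KeeperState state, String path) {
        if (isSessionEvent(type)) {
            // Session events carry no path; only the connection state matters.
            switch (state) {
                case Disconnected:
                    return "session: disconnected, wait for reconnect";
                case SyncConnected:
                    return "session: connected";
                case Expired:
                    return "session: expired, a new client is needed";
                default:
                    return "session: " + state;
            }
        }
        // Anything else is a path event for the watched znode.
        return "path: " + type + " on " + path;
    }
}
```

A library sharing the connection can use the session branch to pause its
own work, independently of what the default watcher does.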

>   2. Watchers are one-time triggers, however session events do NOT
> remove a watcher.
>   In other words, if we're listening for a NodeCreated event and a
> disconnection occurs, we will eventually get notified of a Disconnected,
> then a SyncConnected and finally a NodeCreated without having to set any
> new watcher.


>   3. If the invocation of a (synchronous or asynchronous) method fails,
> the watcher is not set. For instance if getChildren("/foo", mywatcher)
> fails because the client is disconnected, mywatcher won't be notified of
> future events.

Correct, a watch is only valid if the operation was successful.
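
One consequence, sketched below under some assumptions: a call that
fails leaves no watch behind, so the caller has to reissue it to get the
watch registered. `WatchRegistrationSketch` and `registerWatch` are
hypothetical names, and the `Callable` stands in for a real read such as
getChildren("/foo", mywatcher); nothing here is ZooKeeper API. A real
client would more likely wait for the SyncConnected session event before
retrying than loop immediately.

```java
import java.util.concurrent.Callable;

final class WatchRegistrationSketch {
    /**
     * Runs the watch-setting read up to maxAttempts times.
     * Returns true as soon as one attempt succeeds (the watch is then set),
     * false if every attempt failed (no watch was registered).
     */
    static boolean registerWatch(Callable<?> readWithWatch, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                readWithWatch.call();
                return true; // success: the watch is valid from here on
            } catch (InterruptedException e) {
                // Preserve the interrupt and stop retrying.
                Thread.currentThread().interrupt();
                return false;
            } catch (Exception e) {
                // The operation failed, so no watch was left behind; try again.
            }
        }
        return false;
    }
}
```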

> I apologize in advance if I'm stating the obvious but the differences
> between "path" events and "session" events were not clear to me.

No, this is great. Feel free to enter a JIRA if this is not clear enough.

> <http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html#ch_zkWatches>
> Alexis

This (3.1.1) is a pretty old version of the docs, I'd suggest that you 
look at the most recent before entering JIRAs:




> On Fri, Jun 25, 2010 at 12:36 PM, Patrick Hunt <phunt@apache.org
> <mailto:phunt@apache.org>> wrote:
>     On 06/12/2010 10:07 PM, Alexis Midon wrote:
>         I implemented queues and locks on top of ZooKeeper, and I'm
>         pretty happy so
>         far. Thanks for the nice work. Tests look good. So good that we
>         can focus on
>         exception/error handling and I got a couple of questions.
>         #1. Regarding the use of the default watcher. A ZooKeeper
>         instance has a
>         default watcher, most operations can also specify a watcher.
>         When both are
>         set, does the operation watcher override the default watcher?
>     if you use the get(path, bool) then the default watcher is notified,
>     if you use get(path, watcherX) then only "watcherX" is notified.
>           or will both watchers be invoked? if so in which order? Does
>         each watcher
>         receive all the types of event?
>     no, both watchers are not invoked.
>         I had a look at the code, and my understanding is that the
>         default watcher
>         will always receive the type-NONE events, even if an "operation"
>         watcher is
>         set. No guarantee on the order of invocation though. Could you
>         confirm
>         and/or complete please?
>     The watcher gets both state change notifications and watch events.
>     You can register multiple watchers for the same path (incl the
>     default), there is no guarantee on ordering at all.
>         #2 After a connection loss, the client will eventually reconnect
>         to the ZK
>         cluster so I guess I can keep using the same client instance.
>     right
>         But are there
>         cases where it is necessary to re-instantiate a ZooKeeper
>         client? As a first
>         recovery-strategy, is that ok to always recreate a client so
>         that any
>         ephemeral node previously owned disappears?
>     if the session is expired that's the case you need to recreate the
>     session object (or if you explicitly close).
>     Yes, this is a fine strategy if your application domain "fits". If
>     you have a very expensive "recovery" or "bootstrap" process then
>     recreating the session on every disconnect would be a bad idea.
>         The case I struggle with is the following:
>         Let's say I've acquired a lock (i.e. an ephemeral locknode is
>         created).
>         Some application logic failed due to a connection loss. At this
>         stage I'd
>         like to give up/roll back. Here I would typically throw an
>         exception, the
>         lock being released in a finally. But I can't release the lock
>         since the
>         connection is down. Later the client eventually reconnects, the
>         session
>         didn't expire so the locknode still exists. Now no one else can
>         acquire this
>         lock until my session expires.
>     Yes, you are reading the situation correctly. In this case you
>     either have to take the easy route - close the session and create a
>     new one (again, if your app domain supports this) or your client
>     needs to check if the lock is still being held (it's still the
>     owner) when it's eventually reconnected. You can verify this for an
>     ephemeral node by looking at the "ephemeralOwner" field of the Stat
>     object. If this matches your session id then you are the owner and
>     still hold the lock. This is a bit tricky to get right though, so in
>     some cases clients just close the session and recreate.
>         #3. could you describe the recommended actions for each
>         exception code?
>     this is highly dependent on your application requirements. See above
>     for my general guidance. Feel free to ask more questions.
>     Regards,
>     Patrick
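
The ownership check described in the quoted reply above (compare the
lock node's "ephemeralOwner" with your own session id after
reconnecting) might look roughly like this. It is a dependency-free
sketch: `LockOwnershipSketch`, `LockStatus` and `check` are hypothetical
names. With the Java client the two inputs would come from
`Stat.getEphemeralOwner()` on the Stat returned by `exists(path, false)`
(a null Stat meaning the node is gone) and from
`ZooKeeper.getSessionId()`.

```java
final class LockOwnershipSketch {
    enum LockStatus { HELD_BY_US, HELD_BY_OTHER, GONE }

    /**
     * @param ephemeralOwner owner session id of the lock node,
     *                       or null if the node no longer exists
     * @param sessionId      session id of the reconnected client
     */
    static LockStatus check(Long ephemeralOwner, long sessionId) {
        if (ephemeralOwner == null) {
            return LockStatus.GONE; // node was removed, e.g. the session expired
        }
        // Same session id: this client still owns the ephemeral lock node.
        return ephemeralOwner.longValue() == sessionId
                ? LockStatus.HELD_BY_US
                : LockStatus.HELD_BY_OTHER;
    }
}
```

On HELD_BY_US the client can either continue or delete the lock node to
release it; the alternative from the reply above is to skip the check,
close the session and create a new one.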
