zookeeper-user mailing list archives

From Eric Bowman <ebow...@boboco.ie>
Subject Re: how to handle re-add watch fails
Date Tue, 02 Feb 2010 09:13:00 GMT
Ah ok, that makes a lot of sense.  Thanks Ben!

On 02/01/2010 07:58 PM, Benjamin Reed wrote:
> sadly, connectionloss is the really ugly part of zookeeper! it is a
> pain to deal with. i'm not sure we have a best practice, but i can tell
> you what i do :) ZOOKEEPER-22 is meant to alleviate this problem.
>
> i usually use the async API when handling the watch callback. in the
> completion function, if there is a connection loss, i issue another
> async getChildren to retry. this avoids blocking the caller with the
> synchronous retry that eric alluded to, but the behavior is
> effectively the same: you retry the request.
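
In code, the pattern ben describes looks roughly like the sketch below. It
assumes a ZooKeeper handle named zk and an enclosing class that implements
Watcher; the helper name watchChildrenAsync is made up for illustration, and
the error handling is only indicative.

    // Sketch of the async retry pattern (uses org.apache.zookeeper.ZooKeeper,
    // AsyncCallback, KeeperException and java.util.List). "zk" is an assumed
    // ZooKeeper field; "this" is the enclosing Watcher.
    private void watchChildrenAsync(final String zPath) {
        zk.getChildren(zPath, this, new AsyncCallback.ChildrenCallback() {
            public void processResult(int rc, String path, Object ctx,
                                      List<String> children) {
                KeeperException.Code code = KeeperException.Code.get(rc);
                if (code == KeeperException.Code.CONNECTIONLOSS) {
                    // connection loss: re-issue the same async call, which
                    // re-registers the watcher without blocking any caller
                    watchChildrenAsync(path);
                } else if (code == KeeperException.Code.OK) {
                    // watch registered; handle the children here
                }
                // other codes (e.g. SESSIONEXPIRED) need their own handling
            }
        }, null);
    }

The recursion is just the retry: each failed completion schedules one more
attempt, so nothing ever blocks.
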
>
> you don't need to worry about multiple watches being added, colin.
> zookeeper keeps track of which watchers have registered which watches
> and will not register duplicate watches for the same watcher.
> (hopefully you can parse that :)
>
> ben
>
> Colin Goodheart-Smithe wrote:
>> We are having similar problems to this.  At the moment we wrap ZooKeeper
>> in a class which retries requests on KeeperException.ConnectionLoss, so
>> that we never end up with no watcher added, but we are worried that this
>> may result in multiple watchers being added if the watch is successfully
>> registered on the server but the client still sees a Connection Loss.
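
Such a wrapper is roughly the shape sketched below; the method name, field
names, and the one-second back-off are illustrative rather than Colin's
actual code.

    // Sketch of a retrying wrapper around the synchronous getChildren.
    // "zk" is an assumed ZooKeeper field; only ConnectionLoss is retried,
    // other KeeperExceptions propagate to the caller.
    public List<String> getChildrenWithRetry(String path, Watcher watcher)
            throws KeeperException, InterruptedException {
        while (true) {
            try {
                return zk.getChildren(path, watcher);
            } catch (KeeperException.ConnectionLossException e) {
                // retryable; as noted above, passing the same watcher again
                // does not register a duplicate watch
                Thread.sleep(1000);
            }
        }
    }
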
>>
>> Colin
>>
>>
>> -----Original Message-----
>> From: Eric Bowman [mailto:ebowman@boboco.ie]
>> Sent: 01 February 2010 10:22
>> To: zookeeper-user@hadoop.apache.org
>> Subject: Re: how to handle re-add watch fails
>>
>> I was surprised to not get a response to this ... is this a
>> no-brainer? Too hard to solve?  Did I not express it clearly?  Am I
>> doing something
>> dumb? :)
>>
>> Thanks,
>> Eric
>>
>> On 01/25/2010 01:05 PM, Eric Bowman wrote:
>>  
>>> I'm curious, what is the "best practice" for how to handle the case
>>> where re-adding a watch inside a Watcher.process callback fails?
>>>
>>> I keep stumbling upon the same kind of thing, and the possibility of
>>> race conditions or undefined behavior keep troubling me.  Maybe I'm
>>> missing something.
>>>
>>> Suppose I have a callback like:
>>>
>>>     public void process( WatchedEvent watchedEvent )
>>>     {
>>>         if ( watchedEvent.getType() == Event.EventType.NodeChildrenChanged ) {
>>>             try {
>>>                 ... do stuff ...
>>>             }
>>>             catch ( Throwable e ) {
>>>                 log.error( "Could not do stuff!", e );
>>>             }
>>>             try {
>>>                 zooKeeperManager.watchChildren( zPath, this );
>>>             }
>>>             catch ( InterruptedException e ) {
>>>                 log.info( "Interrupted adding watch -- shutting down?" );
>>>                 return;
>>>             }
>>>             catch ( KeeperException e ) {
>>>                 // oh crap, now what?
>>>             }
>>>         }
>>>     }
>>>
>>> (In this case, watchChildren is just calling getChildren and passing
>>> the watcher in.)
>>>
>>> It occurs to me I could get more and more complicated here:  I could
>>> wrap watchChildren in a while loop until it succeeds, but that seems
>>> kind of rude to the caller.  Plus what if I get a
>>> KeeperException.SessionExpiredException or a
>>> KeeperException.ConnectionLossException?  How to handle that in this
>>> loop?  Or I could send some other thread a message that it needs to keep
>>> trying until the watch has been re-added ... but ... yuck.
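
A retry loop of that sort might look like the sketch below. It reuses the
zooKeeperManager, zPath, and log from the snippet above, and the split
between the exception cases is only one possible choice, not something from
the thread.

    // Retry only on connection loss; treat session expiry as fatal, since
    // an expired session loses all its watches and needs a fresh ZooKeeper
    // handle anyway. Names mirror the snippet above; this is a sketch.
    private void reWatchWithRetry() throws InterruptedException {
        while (true) {
            try {
                zooKeeperManager.watchChildren( zPath, this );
                return;                      // watch re-registered, done
            } catch ( KeeperException.ConnectionLossException e ) {
                Thread.sleep( 1000 );        // transient: back off and retry
            } catch ( KeeperException.SessionExpiredException e ) {
                log.error( "Session expired while re-adding watch", e );
                return;                      // caller must rebuild the client
            } catch ( KeeperException e ) {
                log.error( "Could not re-add watch", e );
                return;
            }
        }
    }
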
>>>
>>> I would very much like to just set up this watch once, and have ZooKeeper
>>> make sure it keeps firing until I tear down ZooKeeper -- this logic
>>> seems tricky for clients, and quite error prone and full of race
>>> conditions.
>>>
>>> Any thoughts?
>>>
>>> Thanks,
>>> Eric
>>>
>>>       
>>
>>
>>   
>


-- 
Eric Bowman
Boboco Ltd
ebowman@boboco.ie
http://www.boboco.ie/ebowman/pubkey.pgp
+35318394189/+353872801532

