Ah ok, that makes a lot of sense. Thanks Ben!
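
For the archive, here is a minimal sketch of the async retry as I understand it from Ben's description below. It is not tested against a real ensemble. So that it compiles without the ZooKeeper jar, AsyncClient and ChildrenCallback are hypothetical stand-ins for the client's async getChildren( path, watcher, callback, ctx ) and AsyncCallback.ChildrenCallback, and the watcher is simplified to a Runnable. The two return codes use the same numeric values as KeeperException.Code.OK and KeeperException.Code.CONNECTIONLOSS. The class name, fields, and "/workers" path are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class RetryingChildWatch {
    // Same numeric values as KeeperException.Code.OK / CONNECTIONLOSS.
    static final int OK = 0;
    static final int CONNECTIONLOSS = -4;

    /** Hypothetical stand-in for the client's async getChildren call. */
    interface AsyncClient {
        void getChildren(String path, Runnable watcher, ChildrenCallback cb);
    }

    /** Hypothetical stand-in for AsyncCallback.ChildrenCallback. */
    interface ChildrenCallback {
        void processResult(int rc, String path, List<String> children);
    }

    private final AsyncClient client;
    private final String path;
    final List<List<String>> seen = new ArrayList<>(); // child lists read successfully
    int attempts = 0;                                   // async reads issued so far

    RetryingChildWatch(AsyncClient client, String path) {
        this.client = client;
        this.path = path;
    }

    /** Issue the async read; passing the watcher again is what re-adds the watch. */
    void rewatch() {
        attempts++;
        client.getChildren(path, this::rewatch, this::onResult);
    }

    /** Completion function: retry on connection loss instead of blocking the caller. */
    private void onResult(int rc, String p, List<String> children) {
        if (rc == CONNECTIONLOSS) {
            rewatch(); // same request again; a real version may want a delay or backoff
        } else if (rc == OK) {
            seen.add(children);
        }
        // Other codes (e.g. session expiry) need a new session and are ignored here.
    }
}
```

You would call new RetryingChildWatch( client, "/workers" ).rewatch() once at startup, and after that the watcher and the completion function keep the watch alive between them. The retry here is immediate and unbounded, which is probably too eager for production use.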
On 02/01/2010 07:58 PM, Benjamin Reed wrote:
> sadly connectionloss is the really ugly part of zookeeper! it is a
> pain to deal with. i'm not sure we have a best practice, but i can tell
> you what i do :) ZOOKEEPER-22 is meant to alleviate this problem.
>
> i usually use the async API when handling the watch callback. in the
> completion function, if there is a connection loss, i issue another
> async getChildren to retry. this avoids blocking the caller with the
> synchronous retry that eric alluded to, but the behavior is
> effectively the same: you retry the request.
>
> you don't need to worry about multiple watches being added, colin.
> zookeeper keeps track of which watchers have registered which watches
> and will not register duplicate watches for the same watcher.
> (hopefully you can parse that :)
>
> ben
>
> Colin Goodheart-Smithe wrote:
>> We are having similar problems to this. At the moment we wrap ZooKeeper
>> in a class which retries requests on KeeperException.ConnectionLossException
>> so that a watcher is always added, but we are worried that this may result
>> in multiple watchers being added if the watcher is successfully added but
>> the server returns a connection loss.
>>
>> Colin
>>
>>
>> -----Original Message-----
>> From: Eric Bowman [mailto:ebowman@boboco.ie]
>> Sent: 01 February 2010 10:22
>> To: zookeeper-user@hadoop.apache.org
>> Subject: Re: how to handle re-add watch fails
>>
>> I was surprised to not get a response to this ... is this a
>> no-brainer? Too hard to solve? Did I not express it clearly? Am I
>> doing something dumb? :)
>>
>> Thanks,
>> Eric
>>
>> On 01/25/2010 01:05 PM, Eric Bowman wrote:
>>
>>> I'm curious, what is the "best practice" for how to handle the case
>>> where re-adding a watch inside a Watcher.process callback fails?
>>>
>>> I keep stumbling upon the same kind of thing, and the possibility of
>>> race conditions or undefined behavior keeps troubling me. Maybe I'm
>>> missing something.
>>>
>>> Suppose I have a callback like:
>>>
>>> public void process( WatchedEvent watchedEvent )
>>> {
>>>     if ( watchedEvent.getType() == Event.EventType.NodeChildrenChanged ) {
>>>         try {
>>>             ... do stuff ...
>>>         }
>>>         catch ( Throwable e ) {
>>>             log.error( "Could not do stuff!", e );
>>>         }
>>>         try {
>>>             zooKeeperManager.watchChildren( zPath, this );
>>>         }
>>>         catch ( InterruptedException e ) {
>>>             log.info( "Interrupted adding watch -- shutting down?" );
>>>             return;
>>>         }
>>>         catch ( KeeperException e ) {
>>>             // oh crap, now what?
>>>         }
>>>     }
>>> }
>>>
>>> (In this case, watchChildren is just calling getChildren and passing
>>> the watcher in.)
>>>
>>> It occurs to me I could get more and more complicated here: I could
>>> wrap watchChildren in a while loop until it succeeds, but that seems
>>> kind of rude to the caller. Plus what if I get a
>>> KeeperException.SessionExpiredException or a
>>> KeeperException.ConnectionLossException? How to handle that in this
>>> loop? Or I could send some other thread a message that it needs to keep
>>> trying until the watch has been re-added ... but ... yuck.
>>>
>>> I would very much like to just set up this watch once, and have ZooKeeper
>>> make sure it keeps firing until I tear down ZooKeeper -- this logic
>>> seems tricky for clients, and quite error prone and full of race conditions.
>>>
>>> Any thoughts?
>>>
>>> Thanks,
>>> Eric
>>>
>>>
>>
>>
>>
>
--
Eric Bowman
Boboco Ltd
ebowman@boboco.ie
http://www.boboco.ie/ebowman/pubkey.pgp
+35318394189/+353872801532