hadoop-zookeeper-user mailing list archives

From Benjamin Reed <br...@yahoo-inc.com>
Subject Re: Membership using ZK
Date Tue, 12 Oct 2010 21:23:33 GMT
Yes, your watcher objects will get the ConnectionLoss event
(KeeperState.Disconnected) and, eventually, the session expired event.
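
As a rough sketch of that handling, assuming a membership client that only
needs to react to three session states: the enum below is a local stand-in
for the matching values of org.apache.zookeeper.Watcher.Event.KeeperState, so
it compiles without the ZooKeeper jar, and the class name and action strings
are made up for illustration. In a real client this logic would sit in
Watcher.process(WatchedEvent) and switch on event.getState(). Treating
Disconnected as "stop trusting the local view" is one possible policy, not
the only one.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for three values of org.apache.zookeeper.Watcher.Event.KeeperState,
// so the sketch compiles without the ZooKeeper jar on the classpath.
enum KeeperState { SyncConnected, Disconnected, Expired }

// Hypothetical membership client; it only records what a real one would do.
final class MembershipSessionSketch {
    private final List<String> actions = new ArrayList<>();
    private boolean connected = false;
    private int handlesCreated = 1; // the handle opened at startup

    void onStateChange(KeeperState state) {
        switch (state) {
            case SyncConnected:
                // Session is live again; the same handle keeps working.
                connected = true;
                actions.add("trust-membership-view");
                break;
            case Disconnected:
                // ConnectionLoss: the session may still be alive on the server,
                // so keep the handle but stop trusting the local view (assumed policy).
                connected = false;
                actions.add("distrust-membership-view");
                break;
            case Expired:
                // The old handle is useless: new handle, new ephemeral znode, new watches.
                connected = false;
                handlesCreated++;
                actions.add("create-new-handle");
                actions.add("recreate-ephemeral-znode");
                actions.add("re-register-watches");
                break;
        }
    }

    boolean isConnected() { return connected; }
    int handlesCreated() { return handlesCreated; }
    List<String> actions() { return actions; }
}
```

Feeding it SyncConnected, Disconnected, Expired in that order leaves the
client disconnected with a second handle created, which is the recovery path
for a false death detection.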

ben

On 10/12/2010 10:57 AM, Avinash Lakshman wrote:
> Would my watcher get invoked on this ConnectionLoss event? If so I am
> thinking I will check for KeeperState.Disconnected and reset my state. Is my
> understanding correct? Please advise.
>
> Thanks
> Avinash
>
> On Tue, Oct 12, 2010 at 10:45 AM, Benjamin Reed<breed@yahoo-inc.com>  wrote:
>
>>   ZooKeeper considers a client dead when it hasn't heard from that client
>> during the timeout period. Clients make sure to communicate with ZooKeeper
>> at least once every 1/3 of the timeout period. If the client doesn't hear
>> from ZooKeeper within 2/3 of the timeout period, the client will issue a
>> ConnectionLoss event and cause outstanding requests to fail with a
>> ConnectionLoss.
>>
>> So, if ZooKeeper decides a process is dead, the process will get a
>> ConnectionLoss event. Once ZooKeeper decides that a client is dead, if the
>> client reconnects, the client will get a SessionExpired. Once a session is
>> expired, the expired handle will become useless, so no new requests, no
>> watches, etc.
>>
>> The bottom line is if your process gets a session expired event, you need
>> to treat that process as expired and recover by creating a new ZooKeeper
>> handle (possibly by restarting the process) and setting up your state again.
>>
>> ben
>>
>>
>> On 10/12/2010 09:54 AM, Avinash Lakshman wrote:
>>
>>> This is what I have going:
>>>
>>> I have 200 nodes that come up and each create an ephemeral entry under a
>>> znode named /Membership. When a node is detected dead, its entry under
>>> /Membership is deleted and a watch is delivered to the rest of the
>>> members. Now there are circumstances where a node A is deemed dead while
>>> the process is still up and running on A. It is a false detection which I
>>> probably need to deal with. How do I deal with this situation? Over time,
>>> false detections delete all the entries underneath the /Membership znode
>>> even though all processes are up and running.
>>>
>>> So my questions are:
>>> Would the watches be pushed out to the node that is falsely deemed dead?
>>> If so, I can have that process recreate the ephemeral znode underneath
>>> /Membership.
>>> If a node leaves a watch and then truly crashes, would it get the watches
>>> it missed during the interim period when it comes back up? In any case,
>>> how do watches behave in the event of false/true failure detection?
>>>
>>> Thanks
>>> A
>>>
>>
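
The timing rule quoted above (client contacts ZooKeeper at least once per 1/3
of the timeout, and reports ConnectionLoss after 2/3 of the timeout without
hearing back) works out as simple arithmetic. A minimal sketch follows; the
class and method names are hypothetical, and the real client's internal
timers may differ in detail.

```java
// Hypothetical helper that restates the 1/3 and 2/3 timeout rule from the thread.
final class SessionTimingSketch {
    // Longest the client should stay silent before contacting ZooKeeper again.
    static long maxClientSilenceMs(long timeoutMs) {
        return timeoutMs / 3;
    }

    // Server silence after which the client issues a ConnectionLoss event.
    static long connectionLossAfterMs(long timeoutMs) {
        return 2 * timeoutMs / 3;
    }

    // Client-side view: has the server been silent long enough to report ConnectionLoss?
    static boolean clientReportsConnectionLoss(long msSinceServerHeard, long timeoutMs) {
        return msSinceServerHeard >= connectionLossAfterMs(timeoutMs);
    }

    // Server-side view: has the client been silent for the whole timeout period?
    static boolean serverConsidersClientDead(long msSinceClientHeard, long timeoutMs) {
        return msSinceClientHeard >= timeoutMs;
    }
}
```

With a 30000 ms session timeout this gives a contact interval of at most
10000 ms and a ConnectionLoss after 20000 ms of server silence, so a client
that is merely partitioned learns about the problem before ZooKeeper expires
its session.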

