zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: ephemeral node after server bounce
Date Thu, 04 Feb 2010 22:56:12 GMT
Hah, you guys beat me to the punch. I think having some unique per-client
token might also work (see my response). This could be the IP of the host
or, better (especially with multiple clients on a single host), some
Solr-specific id that uniquely identifies each node.

Patrick

Benjamin Reed wrote:
> i second ted's proposals! thanx ted.
> 
> there is one other option. when you create the ZooKeeper object you can 
> pass a session id and password. your bounced server can actually 
> reattach to the session. (that is why we put that constructor in.) to 
> use it you need to save the session id and password to a persistent 
> store (a file) when you first attach, and then when you restart read the 
> id and password from the file.
> 
> ben
> 
> Ted Dunning wrote:
>> On Thu, Feb 4, 2010 at 2:20 PM, Yonik Seeley
>> <yonik@lucidimagination.com> wrote:
>>
>>> There's no way to "hand over" responsibility for an ephemeral znode,
>>> right?
>>
>> Right.
>>
>>> We have solr nodes create ephemeral znodes (name based on host and
>>> port). The ephemeral znode takes some time to remove of course, so what
>>> happens is that if I bounce a solr server (containing a zk client) the
>>> ephemeral node will still exist when the server comes back up.
>>
>> This problem comes up with any system that has hysteresis and needs a
>> single point of control.
>>
>>> What's the best way to handle this situation?  Delete and re-create?
>>
>> Watch it and re-create when it does disappear?
>>
>> I think you need to handle the problem of multiple search nodes coming
>> up on the same machine, possibly because the old one may have hung up.
>>
>> So... I would recommend
>>
>> a) if the ephemeral still exists, wait for a few more seconds to see
>> if it disappears (20?)
>>
>> b) if it goes away, create a new one and continue as normal
>>
>> c) if it doesn't go away take additional action to determine if
>> service is still running (i.e. panic and run in circles).
> 
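
Ted's steps (a) to (c) could be sketched roughly as below. This is only an illustration: EphemeralClaim, Outcome, and claim are hypothetical names, not part of the ZooKeeper API, and the existence check and the create call are passed in as callbacks so the sketch runs without a ZooKeeper server. With a real client, the check would presumably be something like zk.exists(path, false) != null and the create would be zk.create(...) with CreateMode.EPHEMERAL.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper illustrating the wait-then-create steps; not a ZooKeeper class. */
final class EphemeralClaim {

    enum Outcome { CREATED, STILL_PRESENT, INTERRUPTED }

    /**
     * Polls until the old ephemeral node is gone, then creates a new one.
     *
     * @param nodeExists    assumed callback: true while the old znode still exists
     * @param createNode    assumed callback: creates the new ephemeral znode
     * @param maxWaitMillis how long to wait before giving up (Ted suggests about 20 s)
     * @param pollMillis    pause between existence checks
     */
    static Outcome claim(BooleanSupplier nodeExists, Runnable createNode,
                         long maxWaitMillis, long pollMillis) {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        // (a) the node still exists: keep waiting until the deadline passes
        while (nodeExists.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                // (c) it did not go away: the caller has to investigate further
                return Outcome.STILL_PRESENT;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return Outcome.INTERRUPTED;
            }
        }
        // (b) it went away: create a new one and continue as normal
        createNode.run();
        return Outcome.CREATED;
    }
}
```

Polling keeps the sketch short; setting a watch on the znode, as Ted also mentions, would avoid the sleep loop. Another process can still create the node between the last check and the create call, so a real createNode should be prepared for the create to fail because the node already exists.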

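Ben's suggestion of saving the session id and password to a file might look something like the following. SavedSession and its file layout are assumptions made up for this sketch; only the persistence part is shown so that it runs without a ZooKeeper server. In a real client the two values would come from zk.getSessionId() and zk.getSessionPasswd() after connecting, and on restart they would be passed to the ZooKeeper constructor that takes a session id and password.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical holder for a session id and password; not a ZooKeeper class. */
final class SavedSession {
    final long sessionId;
    final byte[] password;

    SavedSession(long sessionId, byte[] password) {
        this.sessionId = sessionId;
        this.password = password.clone();
    }

    /** Writes the id, the password length, and the password bytes (assumed layout). */
    void save(Path file) {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeLong(sessionId);
            out.writeInt(password.length);
            out.write(password);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back what save() wrote. */
    static SavedSession load(Path file) {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            long id = in.readLong();
            byte[] pw = new byte[in.readInt()];
            in.readFully(pw);
            return new SavedSession(id, pw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reattaching can only work while the old session has not yet expired on the server, so the restart has to complete within the session timeout. The file holds the session password, so it probably should be readable only by the service account.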