zookeeper-user mailing list archives

From Frédéric Jolliton <frede...@jolliton.com>
Subject Re: Locks based on ephemeral nodes - Handling network outage correctly
Date Wed, 12 Oct 2011 12:06:16 GMT
Ted Dunning writes:
> ZK will tell you when the connection is lost (but not yet expired).  When
> this happens, the application needs to pay attention and pause before
> continuing to assume it still has the lock.

Say I have a client, CLIENT, establishing a connection to ZK.

 1. CLIENT takes care of events (connection loss, ...) using the callback
    defined in the handler when establishing the connection.

 2. CLIENT acquires the lock on ZK for some resource, using an ephemeral
    node.

 3. CLIENT begins performing actions on the resource.

 4. A network outage occurs. In parallel:

   a. ZK will need some time before realizing it.

   b. CLIENT will need some time before realizing it.
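
The steps above, combined with Ted's suggestion to pause on connection
loss, can be sketched as a small state machine. This is only an
illustration under assumptions: the class LockHolder, its method names
and the state strings are hypothetical and are not the ZooKeeper client
API. A real client would receive such states through the callback from
step 1.

```python
class LockHolder:
    """Tracks whether CLIENT may still act on the resource.

    Hypothetical sketch: the state strings stand for 'connected',
    'disconnected but not yet expired' and 'expired'. They are not
    constants from any real client library.
    """

    def __init__(self):
        self.connected = False
        self.holds_lock = False  # CLIENT's belief, not ZK's view
        self.paused = False

    def on_state(self, state):
        # Step 1: invoked by the client library on session events.
        if state == "CONNECTED":
            self.connected = True
            self.paused = False
        elif state == "DISCONNECTED":
            # Session may still be alive, but the ephemeral node could
            # vanish at any moment, so stop touching the resource.
            self.connected = False
            self.paused = True
        elif state == "EXPIRED":
            # Session is gone, so the ephemeral node (the lock) is gone.
            self.connected = False
            self.paused = True
            self.holds_lock = False
        else:
            raise ValueError("unknown state: %r" % (state,))

    def acquire(self):
        # Step 2: stands in for creating the ephemeral lock node.
        if not self.connected:
            raise RuntimeError("not connected")
        self.holds_lock = True

    def may_act(self):
        # Step 3: check before every action on the resource.
        return self.holds_lock and not self.paused
```

Note that may_act() only reflects what CLIENT has been told so far. It
does nothing about the delay described in 4.b.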

At step 4, it is possible that ZK has dropped the lock after the
timeout while CLIENT has not yet been notified (by libkeeper itself),
for example because high load on the machine delays the ZK thread that
notifies the main thread. I don't see deterministic behavior here:
CLIENT will continue to assume that it owns the lock. Even if I lower
the timeout thresholds, very high load can still break any timing
assumption.
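
To make the timing argument concrete, here is a minimal sketch of the
window during which CLIENT wrongly believes it owns the lock. It is a
simplified model, not ZooKeeper's actual behavior: the function name,
its parameters, and the assumption that both sides start counting from
the moment of the outage are all hypothetical.

```python
def stale_lock_window(session_timeout, client_detect_delay, notify_delay):
    """Seconds during which ZK has dropped the lock but CLIENT still
    assumes it holds it. All times are measured from the outage start.
    """
    # 4.a: ZK drops the ephemeral node once the timeout elapses.
    zk_drops_lock_at = session_timeout
    # 4.b: CLIENT's library detects the loss, then the main thread has
    # to be scheduled to hear about it (unbounded under heavy load).
    client_learns_at = client_detect_delay + notify_delay
    return max(0.0, client_learns_at - zk_drops_lock_at)


# Lightly loaded machine: CLIENT learns of the outage before expiry.
print(stale_lock_window(10.0, 7.0, 0.1))  # prints 0.0
# Heavily loaded machine: the notification arrives after expiry.
print(stale_lock_window(10.0, 7.0, 5.0))  # prints 2.0
```

Since notify_delay has no upper bound under load, no choice of
session_timeout makes the window provably zero in this model.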

I understand that reliable distributed synchronisation is a hard task
(if possible at all). But I would like to figure out the most robust way
to handle such errors, and it looks like ephemeral nodes are not
appropriate for that.

I will be pleased to be proved wrong.


Frédéric Jolliton
Outscale SAS
