zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Locks based on ephemeral nodes - Handling network outage correctly
Date Wed, 12 Oct 2011 18:08:33 GMT
Both ZK and the client will notice very quickly that the connection has
been interrupted.  You will get a disconnection event at that time.  The ZK
client software will automatically try to reconnect.  When it succeeds, you
will be notified either of the reconnection or of a session expiration.

Note that you will be notified of the connection loss *before* ZK deletes
your ephemeral node (if the clock on the ZK server is stable).
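
As a rough illustration of the event sequence described above, here is a
minimal sketch of a client-side guard that stops trusting the lock as soon as
a disconnection event arrives. The state names mirror ZooKeeper's
Disconnected / SyncConnected / Expired notifications, but the ConnState enum
and the LockGuard class are hypothetical helpers invented for this sketch,
not part of any ZooKeeper client library. It also assumes that nobody else
deletes the ephemeral node while the session is still alive.

```python
from enum import Enum


class ConnState(Enum):
    # Local stand-ins named after ZooKeeper's connection notifications.
    SYNC_CONNECTED = "SyncConnected"
    DISCONNECTED = "Disconnected"
    EXPIRED = "Expired"


class LockGuard:
    """Hypothetical helper: tracks whether the holder may still act on a lock."""

    def __init__(self):
        self.held = False       # we created the ephemeral node
        self.suspended = False  # connection currently lost, ownership unknown

    def acquired(self):
        self.held = True
        self.suspended = False

    def on_event(self, state):
        if state is ConnState.DISCONNECTED:
            # Stop protected work now; the session may expire while we are cut off.
            self.suspended = True
        elif state is ConnState.SYNC_CONNECTED:
            # Reconnected within the same session, so the ephemeral node survived.
            self.suspended = False
        elif state is ConnState.EXPIRED:
            # Session is gone and so is the ephemeral node; the lock is lost.
            self.held = False
            self.suspended = False

    def may_proceed(self):
        return self.held and not self.suspended
```

The application would call may_proceed() before each protected step and
re-acquire the lock from scratch after an expiration.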

Any method you use will have the problem that the connection loss is not
detected immediately.
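
One common way to narrow (not close) that window is for the client to keep
its own conservative deadline, independent of any notification thread. The
sketch below is an assumption on my part rather than anything ZooKeeper
provides: LocalLease, safety_fraction and heard_from_server() are hypothetical
names, and how you learn that the server was recently reachable (for example,
a request that just completed successfully) is left to the application.

```python
import time


class LocalLease:
    """Hypothetical helper: trust the lock only for a fraction of the session timeout."""

    def __init__(self, session_timeout_s, safety_fraction=0.5, clock=time.monotonic):
        # Budget is deliberately shorter than the server-side session timeout.
        self._budget = session_timeout_s * safety_fraction
        self._clock = clock
        self._last_contact = clock()

    def heard_from_server(self):
        # Call whenever a request to the server has just succeeded.
        self._last_contact = self._clock()

    def still_safe(self):
        # True only while the time since last contact is within the budget.
        return (self._clock() - self._last_contact) < self._budget
```

A heavily loaded process can still pass the still_safe() check and then stall
before acting, so this only shrinks the race; it does not remove it.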

You might get some mileage by considering your problem as a leader election
rather than a lock.
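
For reference, the usual leader election recipe has each candidate create an
ephemeral sequential node; the lowest sequence number is the leader, and every
other candidate watches the node just ahead of it. Here is a minimal sketch of
that ordering logic only, with no ZooKeeper calls. The function names and the
"n_" prefix are made up for illustration; the 10-digit zero-padded suffix is
what ZooKeeper appends to sequential node names.

```python
def sequence_of(name):
    # Sequential znode names end in a 10-digit zero-padded counter.
    return int(name[-10:])


def election_view(children, mine):
    """Return (leader, node_to_watch) for the candidate whose node is `mine`.

    node_to_watch is None when `mine` is itself the leader.
    """
    ordered = sorted(children, key=sequence_of)
    idx = ordered.index(mine)
    leader = ordered[0]
    watch = ordered[idx - 1] if idx > 0 else None
    return leader, watch
```

This does not make detection of a lost connection any faster; it only changes
how the roles are assigned once the old leader's node disappears.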

2011/10/12 Frédéric Jolliton <frederic@jolliton.com>

>  4. A network outage occurs. In parallel:
>   a. ZK will need some time before realizing it.
>   b. CLIENT will need some time before realizing it.
> At step 4, it is possible that ZK has dropped the lock after the
> timeout, while the CLIENT has not been notified (by libkeeper itself)
> because, for example, of high load on the machine (thus delaying the ZK
> thread from notifying the main thread about it). I don't see deterministic
> behavior here. CLIENT will continue to assume that it owns the lock. Even
> if I could lower some timeout threshold, there is still the problem of
> very high load perturbing timing assumptions.
> I understand that reliable distributed synchronisation is a hard task
> (if even possible). But I would like to figure out the most robust way to
> handle such errors, and it looks like ephemeral nodes are not appropriate
> for that.
> I will be pleased to be proved wrong.
