zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Locks based on ephemeral nodes - Handling network outage correctly
Date Wed, 12 Oct 2011 11:32:58 GMT
ZK will tell you when the connection is lost (but the session has not yet
expired).  When this happens, the application needs to pay attention and
pause before continuing to assume it still has the lock.
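
As a rough illustration of the "pause" described above, here is a minimal
sketch in Python. It does not use any ZooKeeper client library: the
ConnState enum and the LockHolder class are hypothetical names made up for
this example. The three states are assumed to map loosely onto what a
client reports as connected, disconnected (session not yet expired), and
session expired.

```python
from enum import Enum, auto


class ConnState(Enum):
    """Hypothetical connection states, as a client might report them."""
    CONNECTED = auto()   # link to the ensemble is up, session alive
    SUSPENDED = auto()   # link lost, session not yet expired
    EXPIRED = auto()     # session expired; ephemeral nodes are gone


class LockHolder:
    """Hypothetical helper: tracks whether it is safe to act on the lock."""

    def __init__(self):
        self.state = ConnState.CONNECTED
        self.holds_lock = False

    def acquired(self):
        # Call once the lock node has been created and confirmed.
        self.holds_lock = True

    def on_state_change(self, new_state):
        # Feed every connection-state notification from the client here.
        self.state = new_state
        if new_state is ConnState.EXPIRED:
            # The ephemeral node was deleted with the session,
            # so the lock is definitely gone.
            self.holds_lock = False

    def may_proceed(self):
        # Work on the locked resource only while connected and holding
        # the lock; while SUSPENDED the answer is "pause".
        return self.holds_lock and self.state is ConnState.CONNECTED


if __name__ == "__main__":
    holder = LockHolder()
    holder.acquired()
    holder.on_state_change(ConnState.SUSPENDED)
    print(holder.may_proceed())   # False: pause, the lock may be lost soon
    holder.on_state_change(ConnState.CONNECTED)
    print(holder.may_proceed())   # True: same session, lock still held
    holder.on_state_change(ConnState.EXPIRED)
    print(holder.may_proceed())   # False: lock must be re-acquired
```

The worker would check may_proceed() before each step that touches the
locked resource. This only narrows the window in which a stale owner keeps
working; it is a sketch under the assumptions above, not a complete
solution.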

2011/10/12 Frédéric Jolliton <frederic@jolliton.com>

> Hello all,
>
> There is something that bothers me about ephemeral nodes.
>
> I need to create some locks using ZooKeeper. I followed the "official"
> recipe, except that I don't use the EPHEMERAL flag. The reason for that
> is that I don't know how I should proceed if the connection to the
> ZooKeeper ensemble is ever lost. But otherwise, everything works nicely.
>
> The EPHEMERAL flag is useful if the owner of the lock disappears
> (exiting abnormally). From the point of view of the ZooKeeper ensemble,
> the connection times out (or is closed explicitly) and the lock is
> released. That's great.
>
> However, if I lose the connection temporarily (network outage), the
> ZooKeeper ensemble again sees the connection timing out, but the owner
> of the lock is actually still there, doing some work on the locked
> resource. The lock is released by ZooKeeper anyway.
>
> How should this case be handled?
>
> All I can see is that the owner can only discover that it no longer
> owns the lock because releasing the lock gives a Session Expired
> error (assuming we retry reconnecting while we get a Connection Loss
> error), or because of an event sent at some point because the connection
> was also closed automatically on the client side by libkeeper (not sure
> about this last point). Knowing that the connection expired necessarily
> means that the lock was lost, but by then it may be too late.
>
> I mean that there is a short window during which the process that owns
> the lock has not yet tried to release it, and thus doesn't know it has
> lost it, while another process was able to acquire it in the meantime.
> This is a big problem.
>
> That's why I avoid the EPHEMERAL flag for now, and plan to rely on a
> periodic cleaning task to drop locks no longer owned by any process (a
> task which is not trivial either).
>
> I would appreciate any tips to handle such a situation in a better way.
> What is your experience in such cases?
>
> Regards,
>
> --
> Frédéric Jolliton
> Outscale SAS
>
>
