zookeeper-user mailing list archives

From Frédéric Jolliton <frede...@jolliton.com>
Subject Locks based on ephemeral nodes - Handling network outage correctly
Date Wed, 12 Oct 2011 09:32:34 GMT
Hello all,

There is something that bothers me about ephemeral nodes.

I need to create some locks using ZooKeeper. I followed the "official"
recipe, except that I don't use the EPHEMERAL flag. The reason is that
I don't know how I should proceed if the connection to the ZooKeeper
ensemble is ever lost. Otherwise, everything works nicely.

The EPHEMERAL flag is useful if the owner of the lock disappears (by
exiting abnormally). From the point of view of the ZooKeeper ensemble,
the connection times out (or is closed explicitly) and the lock is
released. That's great.

However, if I lose the connection temporarily (network outage), the
ZooKeeper ensemble again sees the connection timing out, although the
owner of the lock is actually still there, doing some work on the
locked resource. The lock is released by ZooKeeper anyway.

How should this case be handled?

As far as I can see, the owner can only find out that it no longer
holds the lock in one of two ways: either releasing the lock fails
with a Session Expired error (assuming we keep retrying the
reconnection while we get Connection Loss errors), or an event is
delivered at some point because the connection was also closed
automatically on the client side by libkeeper (I'm not sure about this
last point). Knowing that the connection expired necessarily means
that the lock was lost, but by then it may be too late.
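
To make the release logic above concrete, here is a rough sketch of what
I have in mind. It does not use the real ZooKeeper client: the two
exception classes, release_lock(), make_delete() and max_retries are all
hypothetical names that only stand in for the Connection Loss and
Session Expired errors and for the call that deletes the lock node.

```python
class ConnectionLoss(Exception):
    """Hypothetical stand-in for a transient 'Connection Loss' error."""


class SessionExpired(Exception):
    """Hypothetical stand-in for a 'Session Expired' error."""


def release_lock(delete_node, max_retries=5):
    """Try to delete the lock node.

    Returns True if we released the lock ourselves, and False if the
    session had expired, which means the lock was already lost.
    """
    for _ in range(max_retries):
        try:
            delete_node()
            return True
        except ConnectionLoss:
            # Transient error: try again (a real client would wait for
            # the reconnection first).
            continue
        except SessionExpired:
            # The lock was lost at some unknown earlier point.
            return False
    raise ConnectionLoss("still disconnected after %d attempts" % max_retries)


def make_delete(errors):
    """Build a fake delete call that raises the given errors in order."""
    pending = list(errors)

    def delete_node():
        if pending:
            raise pending.pop(0)

    return delete_node


# One connection loss, then the session turns out to have expired.
lost = not release_lock(make_delete([ConnectionLoss(), SessionExpired()]))
```

The False result only tells the owner after the fact that the lock was
gone; it says nothing about when it was lost.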

I mean that there is a short window of time in which the process that
owns the lock has not tried to release it yet, and thus doesn't know
it has lost it, while another process has been able to acquire it in
the meantime. This is a big problem.
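
The size of that window can be estimated with a small calculation. This
is only a sketch under simplifying assumptions of mine: the ensemble
drops the lock exactly one session timeout after the outage starts, and
another process acquires it immediately. The function and parameter
names are hypothetical.

```python
def unsafe_window(outage_start, session_timeout, owner_notices_at):
    """Seconds during which two processes may both believe they hold the lock.

    Assumes the ephemeral node disappears at outage_start + session_timeout
    and that a waiting process takes the lock at that very moment.
    """
    lock_freed_at = outage_start + session_timeout
    return max(0.0, owner_notices_at - lock_freed_at)


# Outage at t=0 s, 10 s session timeout, owner tries to release at t=25 s.
overlap = unsafe_window(0.0, 10.0, 25.0)  # 15.0 seconds of double ownership
```

If the owner notices before the timeout elapses (for instance at t=5 s
in the example), the result is 0.0 and there is no overlap.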

That's why I avoid the EPHEMERAL flag for now, and plan to rely on a
periodic cleanup task to drop locks that are no longer owned by any
process (a task which is not trivial either).

I would appreciate any tips on handling such a situation in a better
way. What is your experience in such cases?


Frédéric Jolliton
Outscale SAS
