zookeeper-user mailing list archives

From Cameron McKenzie <mckenzie....@gmail.com>
Subject Re: Ephemeral node bound to a session that times out while ZK has no quorum
Date Thu, 08 May 2014 10:31:04 GMT
Sorry, bashed send prematurely!

Guys,
I've noticed a weird problem with ephemeral nodes not being cleaned up if
the session they are tied to times out while ZooKeeper does not have a
quorum. The situation is basically as follows:

3-node cluster:
- Client connects to the cluster and creates an ephemeral node.
- Two nodes die, so quorum is lost.
- Some time passes (longer than the session timeout negotiated for the
  client that created the ephemeral node).
- One (or both) of the dead nodes comes back and a quorum is re-formed.
- The ephemeral node tied to the session that should have timed out still
  exists and never seems to get cleaned up.
- If I telnet in on port 2181 and send 'dump', I can see that ZK seems to
  think the session is still active and associated with the ephemeral
  node in question.
- It seems to stay in this state for an extended period (20+ minutes).
  Interestingly, when I happened to fire up zkCli.sh I could see that the
  node was still there, but after I exited, the node seemed to disappear
  shortly afterwards. So I wonder whether the end of the session
  established by zkCli.sh somehow triggered the cleanup of this rogue
  ephemeral node.
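To repeat the 'dump' check from a script instead of telnet, here is a minimal Python sketch. It assumes the server still answers four-letter-word commands on the client port (newer ZooKeeper releases may require enabling them via the `4lw.commands.whitelist` setting); the helper name `four_letter_word` is hypothetical. The ZooKeeper admin guide notes that `dump` lists outstanding sessions and ephemeral nodes only when sent to the leader, so it should be pointed at the current leader.

```python
import socket


def four_letter_word(host: str, port: int = 2181, cmd: str = "dump",
                     timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter-word command and return the raw reply.

    Hypothetical helper mirroring `telnet host 2181` + typing the command.
    The server writes its response and then closes the connection, so the
    reply is read until EOF.
    """
    if len(cmd) != 4:
        raise ValueError("four-letter-word commands are exactly 4 characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Usage (hostname is a placeholder): `print(four_letter_word("zk-leader.example", 2181, "dump"))`. Running it before and after quorum is re-formed shows whether the expired session is still listed with its ephemeral node.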

Has anyone experienced this issue before? I understand that it's a bit of an
edge case, but I'm running across it quite frequently when testing changes to
the size of the ZK cluster.
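When resizing the ensemble it helps to keep the quorum arithmetic in view. The sketch below assumes ZooKeeper's default majority quorum (weighted or hierarchical quorums are not modeled): a quorum needs more than half of the configured servers. In the 3-node scenario above, two failures leave 1 of 3 servers (below the quorum of 2), and a single server returning restores quorum.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (default majority quorum)."""
    return ensemble_size // 2 + 1


def has_quorum(alive: int, ensemble_size: int) -> bool:
    """True if `alive` servers are enough to form a majority quorum."""
    return alive >= quorum_size(ensemble_size)


for n in (3, 4, 5):
    q = quorum_size(n)
    print(f"{n} servers: quorum {q}, tolerates {n - q} failure(s)")
# 3 servers: quorum 2, tolerates 1 failure(s)
# 4 servers: quorum 3, tolerates 1 failure(s)
# 5 servers: quorum 3, tolerates 2 failure(s)
```

This is why an even-sized ensemble adds no failure tolerance over the next smaller odd size.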

I've thought of a few workarounds for the issue, but I'd like to know if
it's a known problem.

Any help appreciated!
cheers



