zookeeper-user mailing list archives

From Cameron McKenzie <mckenzie....@gmail.com>
Subject Re: Ephemeral node bound to a session that times out while ZK has no quorum
Date Thu, 15 May 2014 23:20:28 GMT
hey Michi,
I'll have to double check the logs to see if the client got a session
expired event, but I would presume so because the ephemeral nodes lying
around had a different session ID. I guess it's a possibility that the old
connection stayed open, and a new one was also created, but I don't believe
this to be the case.
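
One way to check the session ID comparison from the server side is to parse the output of the 'dump' command mentioned in the original report below. The following is only a rough sketch: it assumes the ephemeral section of the output starts with a "Sessions with Ephemerals" header and lists each session as a `0x...:` line followed by indented paths. The helper names, the sample text, and the `/example/...` path are made up for illustration, and the exact output layout may differ between ZooKeeper versions.

```python
import re
import socket


def send_four_letter_word(host, port, cmd, timeout=5.0):
    """Send a four-letter command such as 'dump' and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Assumed shape of a session header line in the ephemeral section: "0x<hex>:"
SESSION_LINE = re.compile(r"^(0x[0-9a-fA-F]+):$")


def ephemerals_by_session(dump_text):
    """Map session ID -> list of ephemeral paths, based on the assumed layout."""
    owners = {}
    current = None
    in_section = False
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("Sessions with Ephemerals"):
            in_section = True
            continue
        if not in_section:
            continue
        match = SESSION_LINE.match(line)
        if match:
            current = match.group(1)
            owners[current] = []
        elif current is not None and line.startswith("/"):
            owners[current].append(line)
    return owners


# Illustrative sample only; not captured from a real server.
SAMPLE = """SessionTracker dump:
Session Sets (1):
1 expire at Thu May 08 20:30:00 GMT 2014:
\t0x145dabc0000001
ephemeral nodes dump:
Sessions with Ephemerals (1):
0x145dabc0000001:
\t/example/lock-0000000001
"""

if __name__ == "__main__":
    # Against a live server this would be something like:
    #   text = send_four_letter_word("localhost", 2181, "dump")
    for session_id, paths in ephemerals_by_session(SAMPLE).items():
        print(session_id, paths)
```

Comparing the session IDs reported here with the one the client currently holds (for example the value returned by getSessionId() in the Java client) would show whether the leftover node belongs to the old session or to a new one.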
cheers


On Thu, May 15, 2014 at 12:41 PM, Michi Mutsuzaki <michi@cs.stanford.edu> wrote:

> Hi Cameron,
>
> Did the client get the session expired event? Sessions don't expire
> during quorum loss, and I'm guessing the session got revalidated when
> the cluster reformed a quorum.
>
>
> On Thu, May 8, 2014 at 3:31 AM, Cameron McKenzie <mckenzie.cam@gmail.com>
> wrote:
> > Sorry, bashed send prematurely!
> >
> > Guys,
> > I've noticed a weird problem with ephemeral nodes not being cleaned up if
> > the session they are tied to times out while ZooKeeper does not have a
> > quorum. The situation is basically as follows:
> >
> > 3 node cluster
> > -Client connects to cluster and creates an ephemeral node
> > -Two nodes die, so quorum is lost
> > -Some time passes (longer than the session timeout negotiated for the
> > client that created the ephemeral node)
> > -One (or both) of the dead nodes comes back and a quorum is re-formed.
> > -The ephemeral node tied to the session which should have timed out still
> > exists and never seems to get cleaned up.
> > -If I telnet in on port 2181 and 'dump', then I can see that ZK seems to
> > think that the session is still active and associated with the ephemeral
> > node in question.
> > -It seems to stay in this state for some extended period of time (20+
> > minutes). Interestingly, when I happened to fire up zkCli.sh I could see
> > that the node was still there, but after I exited, the node seemed to
> > disappear shortly afterwards. So, I wonder if the session established by
> > zkCli.sh ending somehow triggered the cleanup of this rogue ephemeral
> > node?
> >
> > Has anyone experienced this issue before? I understand that it's a bit of
> > an edge case, but I'm running across it quite frequently when testing
> > changing the size of the ZK cluster.
> >
> > I've thought of a few workarounds for the issue, but I'd like to know if
> > it's a known issue.
> >
> > Any help appreciated!
> > cheers
> >
> >
> >
> > On Thu, May 8, 2014 at 8:15 PM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
> >
> >> Guys,
> >> I've noticed a weird problem with ephemeral nodes not being cleaned up
> >> if the session they are tied to times out while ZooKeeper does not have
> >> a quorum. The situation is basically as follows:
> >>
> >> 3 node cluster
> >> -Client connects to cluster and creates an ephemeral node
> >> -Two nodes die, so quorum is lost
> >> -Some time passes (longer than the session timeout negotiated for the
> >> client that created the ephemeral node)
> >> -One (or both) of the dead nodes comes back and a quorum is re-formed.
> >> -The ephemeral node tied to the session which should have timed out
> >> still exists
> >>
> >>
>
