zookeeper-user mailing list archives

From Dan Benediktson <dbenedikt...@twitter.com.INVALID>
Subject Re: Handling of xid rollover
Date Wed, 22 Jun 2016 18:08:19 GMT
I believe if local sessions are in use and the session in question hasn't
been upgraded to global by creating an ephemeral node, it would see session
expiration after a leader election (unless maybe if it lands on the same
peer - I do not remember if the session table gets recycled completely in
that case).
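
For the application-side question quoted below, one way to picture the
difference between a dropped connection and an expired session is the sketch
that follows. It is only an illustration under stated assumptions: the State
enum is a local stand-in whose names mirror Curator's ConnectionState values,
and Component, Mode, and the logged action strings are hypothetical. Nothing
here calls the real Curator API. The idea is to pause leader-only work while
the connection is suspended and to tear down only once the session is
reported lost.

```java
import java.util.ArrayList;
import java.util.List;

final class ConnectionStateSketch {
    // Local stand-in: the names mirror Curator's ConnectionState, but this is NOT the Curator type.
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Hypothetical application modes, invented for this illustration.
    enum Mode { RUNNING, PAUSED, STOPPED }

    static final class Component {
        private Mode mode = Mode.RUNNING;
        private final List<String> log = new ArrayList<>();

        Mode mode() { return mode; }
        List<String> log() { return log; }

        void onStateChanged(State newState) {
            switch (newState) {
                case SUSPENDED:
                    // Connection dropped (for example during a re-election); the session may still be alive.
                    if (mode == Mode.RUNNING) {
                        mode = Mode.PAUSED;
                        log.add("pause");
                    }
                    break;
                case LOST:
                    // Session treated as gone: ephemeral nodes and leadership can no longer be assumed.
                    if (mode != Mode.STOPPED) {
                        mode = Mode.STOPPED;
                        log.add("stop");
                    }
                    break;
                case CONNECTED:
                case RECONNECTED:
                    if (mode == Mode.PAUSED) {
                        // Same session survived the outage: resume without a full restart.
                        mode = Mode.RUNNING;
                        log.add("resume");
                    } else if (mode == Mode.STOPPED) {
                        // Session was lost earlier: redo leader election before starting again.
                        mode = Mode.RUNNING;
                        log.add("re-elect and restart");
                    }
                    break;
            }
        }
    }

    public static void main(String[] args) {
        Component c = new Component();
        c.onStateChanged(State.SUSPENDED);
        c.onStateChanged(State.RECONNECTED);
        System.out.println(c.log());
    }
}
```

With handling along these lines, a re-election that only interrupts the
connection would cost a pause and a resume rather than a full component
restart. Whether the session actually survives depends on the server-side
behavior discussed in this thread.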

On Wed, Jun 22, 2016 at 10:58 AM, Patrick Hunt <phunt@apache.org> wrote:

> Hi Mark. See this jira for background:
> https://issues.apache.org/jira/browse/ZOOKEEPER-1277
>
> However what you describe is correct behavior from our perspective. When
> the lower 32 bits of the zxid roll over we now (that was the fix) force a
> re-election of the leader. Leader re-election causes the quorum to stop
> serving clients until a new quorum forms.
>
> Leader re-election is normal behavior for the ZK service; it happens
> whenever the current leader is lost (say, if the current leader process is
> restarted) and a new quorum, with a possibly new leader, needs to form.
> Your clients need to be able to handle this situation (typically the client
> library does this for you).
>
> That said, you should not be seeing session expiration as a result of this.
> Client timeouts certainly, but not session expiration. It might happen for
> other reasons, but the leader is the one responsible for expiring sessions.
> If there is no leader (e.g. being re-elected) there is no session
> expiration. When the new leader is elected it will reset the clock on
> session expiration, for all sessions, from the time it's reelected. For
> example you can shut down the entire ZK server ensemble, start it back up an
> hour later and the clients should all be able to rejoin. Hm, that said I'm
> not sure if Curator is doing some special magic, that's the behavior of the
> stock client that we ship.
>
> Patrick
>
>
> On Wed, Jun 22, 2016 at 6:18 AM, Figura, Mark <mfigura@empirix.com> wrote:
>
> > Hi,
> >
> > We are using ZooKeeper 3.4.5 along with Curator to perform leader
> > elections and also store some application data on a 3-node ensemble. Our
> > application is not hard-realtime, but glitches in stream processing do get
> > noticed and may raise support tickets.
> >
> > Yesterday, we had such a glitch and by looking through the logs, I found
> > there was an XID rollover. When this happened, a new election within the
> > ensemble was triggered and all client connections were closed. From our
> > application's point of view (possibly filtered through Curator), we saw the
> > session expire and then the connection was lost. This caused our
> > application to shut down each component, re-perform leader elections, and
> > eventually start back up.
> >
> > We do have an issue where our application is making many more writes than
> > it should, but once this is fixed, we'll still run into an XID rollover
> > sooner or later.
> >
> > Is there something our application can do to handle this situation better?
> > Are there any plans for Zookeeper to handle this situation without closing
> > client connections?
> >
> > Thanks!
> > Mark
> >
>
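
As a rough aid for the "sooner or later" estimate in the original question,
here is a small sketch of the arithmetic. It assumes the rollover in question
is that of the 64-bit zxid discussed in ZOOKEEPER-1277, where the high 32 bits
hold the leader epoch and the low 32 bits hold a per-epoch transaction
counter. The class and method names are made up for this example and are not
ZooKeeper APIs, and the write rate is a placeholder to be replaced with a
measured value.

```java
final class ZxidRollover {
    // Low 32 bits of a zxid: the per-epoch transaction counter.
    private static final long COUNTER_MASK = 0xffffffffL;

    // Compose a zxid from an epoch (high 32 bits) and a counter (low 32 bits).
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & COUNTER_MASK);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    // Number of further transactions before the low 32 bits reach 0xffffffff.
    static long writesUntilRollover(long zxid) {
        return COUNTER_MASK - counterOf(zxid);
    }

    // Rough time to rollover at a steady write rate (an assumption; real rates vary).
    static double daysUntilRollover(long zxid, double writesPerSecond) {
        return writesUntilRollover(zxid) / writesPerSecond / 86_400.0;
    }

    public static void main(String[] args) {
        long zxid = makeZxid(7, 4_000_000_000L); // hypothetical current zxid
        System.out.println("epoch=" + epochOf(zxid) + " counter=" + counterOf(zxid));
        System.out.println("writes left=" + writesUntilRollover(zxid));
        System.out.println("days left at 1000 writes/s=" + daysUntilRollover(zxid, 1000.0));
    }
}
```

Since the counter restarts with each new epoch, any leader election, planned
or not, also resets the distance to the next rollover. Reducing the write
rate stretches the interval but does not remove the event.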
