zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Handling of xid rollover
Date Wed, 22 Jun 2016 17:58:42 GMT
Hi Mark. See this jira for background:
https://issues.apache.org/jira/browse/ZOOKEEPER-1277

However, what you describe is correct behavior from our perspective. When
the lower 32 bits of the zxid roll over, we now (that was the fix) force a
re-election of the leader. Leader re-election causes the quorum to stop
serving clients until a new quorum forms.
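
To make the rollover concrete, here is a minimal sketch of the zxid layout, assuming the usual split of a 64-bit zxid into an epoch (upper 32 bits) and a per-epoch counter (lower 32 bits). The class and method names are hypothetical and written for illustration only; this is not the server's actual code, and the exhaustion check is just one plausible way to detect that the counter has no room left.

```java
// Illustrative sketch only: ZxidSketch and its methods are hypothetical names,
// not ZooKeeper classes. It assumes a zxid is epoch (high 32 bits) plus
// counter (low 32 bits).
final class ZxidSketch {
    private static final long COUNTER_MASK = 0xffffffffL;

    // Combine an epoch and a counter into a single 64-bit id.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & COUNTER_MASK);
    }

    // The epoch changes each time a new leader is elected.
    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    // The counter increases with each write within one epoch.
    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    // True when the lower 32 bits cannot be incremented without overflowing
    // into the epoch bits.
    static boolean counterExhausted(long zxid) {
        return counterOf(zxid) == COUNTER_MASK;
    }

    public static void main(String[] args) {
        long last = makeZxid(5L, COUNTER_MASK);
        System.out.println("epoch=" + epochOf(last)
                + " counter=" + counterOf(last)
                + " exhausted=" + counterExhausted(last));
        // After a re-election the epoch is bumped and the counter restarts,
        // so ids keep increasing.
        long next = makeZxid(epochOf(last) + 1, 0L);
        System.out.println("next zxid is larger: " + (next > last));
    }
}
```

Under this layout, a new election is a natural way to get a fresh counter, because the new leader starts a new epoch. It also shows why a lower write rate only delays the rollover rather than avoiding it.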

Leader re-election is normal behavior for the ZK service. It happens
whenever the current leader is lost and a new quorum, with a (possibly new)
leader, needs to re-form, for example when the current leader process is
restarted. Your clients need to be able to handle this situation (typically
the client library does this for you).

That said, you should not be seeing session expiration as a result of this.
Client timeouts certainly, but not session expiration. It might happen for
other reasons, but the leader is the one responsible for expiring sessions.
If there is no leader (e.g. while one is being elected) there is no session
expiration. When the new leader is elected it resets the clock on session
expiration, for all sessions, from the time it is elected. For example, you
can shut down the entire ZK server ensemble, start it back up an hour later,
and the clients should all be able to rejoin. Hm, that said, I'm not sure
whether Curator is doing some special magic; what I describe is the behavior
of the stock client that we ship.
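
As a rough sketch of what "handling this situation" can look like on the application side, the distinction to preserve is between a lost connection (wait, the session may still be valid) and an expired session (ephemeral state is gone, rebuild). The enum and method names below are hypothetical stand-ins written for illustration; they are not the ZooKeeper or Curator API, and a real application would map its client library's own state events onto something like this.

```java
// Illustrative sketch only: State, Action and onStateChange are hypothetical
// names, not part of the ZooKeeper or Curator client libraries.
final class SessionEventSketch {
    // Simplified connection states as seen by an application.
    enum State { CONNECTED, DISCONNECTED, EXPIRED }

    // What the application decides to do in response.
    enum Action { RESUME, WAIT_FOR_RECONNECT, REBUILD_SESSION }

    static Action onStateChange(State state) {
        switch (state) {
            case CONNECTED:
                // Connection is (re)established and the session is intact.
                return Action.RESUME;
            case DISCONNECTED:
                // For example during a leader re-election: pause work that
                // needs the ensemble, but do not tear components down yet.
                return Action.WAIT_FOR_RECONNECT;
            case EXPIRED:
                // Only here is the session really gone, so leadership and
                // other session-bound state must be re-established.
                return Action.REBUILD_SESSION;
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> " + onStateChange(s));
        }
    }
}
```

The design choice is that only the expired case triggers a full restart of components; a plain disconnect is treated as temporary.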

Patrick


On Wed, Jun 22, 2016 at 6:18 AM, Figura, Mark <mfigura@empirix.com> wrote:

> Hi,
>
> We are using ZooKeeper 3.4.5 along with Curator to perform leader
> elections and also store some application data on a 3-node ensemble. Our
> application is not hard-realtime, but glitches in stream processing do get
> noticed and may raise support tickets.
>
> Yesterday, we had such a glitch and by looking through the logs, I found
> there was an XID rollover. When this happened, a new election within the
> ensemble was triggered and all client connections were closed. From our
> application's point of view (possibly filtered through Curator), we saw the
> session expire and then the connection was lost. This caused our
> application to shut down each component, re-perform leader elections, and
> eventually start back up.
>
> We do have an issue where our application is making many more writes than
> it should, but once this is fixed, we'll still run into an XID rollover
> sooner or later.
>
> Is there something our application can do to handle this situation better?
> Are there any plans for ZooKeeper to handle this situation without closing
> client connections?
>
> Thanks!
> Mark
>
