zookeeper-user mailing list archives

From Mike Richardson <m...@motum.be>
Subject Re: Recovering from zxid rollover
Date Mon, 29 May 2017 06:25:21 GMT


Mike Richardson
Senior Software Engineer

MoTuM N.V. | Dellingstraat 34 | B-2800 MECHELEN | Belgium
T +32(0)15 28 16 63
M +41 7943 69538
www.motum.be

On 26 May 2017 at 20:45, Patrick Hunt <phunt@apache.org> wrote:

> On Wed, May 24, 2017 at 8:08 AM, Mike Heffner <mike@librato.com> wrote:
>
> > On Tue, May 23, 2017 at 10:21 PM, Patrick Hunt <phunt@apache.org> wrote:
> >
> > > > On Tue, May 23, 2017 at 3:47 PM, Mike Heffner <mike@librato.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm curious what the best practices are for handling zxid rollover in a
> > > > ZK ensemble. We have a few five-node ZK ensembles (some 3.4.8 and some
> > > > 3.3.6), and they periodically roll over their zxid. We see the following
> > > > in the system logs on the leader node:
> > > >
> > > > 2017-05-22 12:54:14,117 [myid:15] - ERROR [ProcessThread(sid:15
> > > > cport:-1)::ZooKeeperCriticalThread@49] - Severe unrecoverable error, from
> > > > thread : ProcessThread(sid:15 cport:-1):
> > > > org.apache.zookeeper.server.RequestProcessor$RequestProcessorException:
> > > > zxid lower 32 bits have rolled over, forcing re-election, and therefore
> > > > new epoch start
> > > >
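> > > > (For context: a zxid packs the leader's election epoch into its upper 32
> > > > bits and a per-epoch transaction counter into its lower 32 bits, so the
> > > > "lower 32 bits" in that message are the counter:
> > > >
> > > >     long epoch   = zxid >>> 32;        // bumped at each leader election
> > > >     long counter = zxid & 0xffffffffL; // what actually rolled over
> > > >
> > > > i.e. the counter wraps after about 4.29 billion transactions within a
> > > > single epoch.)
> > > >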
> > > > From my best understanding of the code, this exception will end up
> > > > causing the leader to enter shutdown():
> > > >
> > > > https://github.com/apache/zookeeper/blob/09cd5db55446a4b390f82e3548b929f19e33430d/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L464-L464
> > > >
> > > > This stops the ZooKeeper instance from servicing requests, but the JVM
> > > > is still actually running. What we experience is that while this ZK
> > > > instance is still running, the remaining follower nodes can't re-elect a
> > > > leader (at least not within 15 mins) and quorum is offline. Our
> > > > remediation so far has been to restart the original leader node, at
> > > > which point the cluster recovers.
> > > >
> > > > The two questions I have are:
> > > >
> > > > 1. Should the remaining 4 nodes be able to re-elect a leader after zxid
> > > > rollover without intervention (restarting)?
> > > >
> > > >
> > > Hi Mike.
> > >
> > > That is the intent. Originally the epoch would roll over and cause the
> > > cluster to hang (similar to what you are reporting); the JIRA is here:
> > > https://issues.apache.org/jira/browse/ZOOKEEPER-1277
> > > However the patch, calling shutdown of the leader, was intended to force a
> > > re-election before the epoch could roll over.
> > >
> >
> > Should the leader JVM actually exit during this shutdown, thereby allowing
> > the init system to restart it?
> >
> >
> iirc it should not be necessary but it's been some time since I looked at
> it.
>
>
> >
> > >
> > >
> > > > 2. If the leader enters shutdown() state after a zxid rollover, is there
> > > > any scenario where it will return to started? If not, how are others
> > > > handling this scenario -- maybe a healthcheck that kills/restarts an
> > > > instance that is in shutdown state?
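> > > >
> > > > For what it's worth, a rough sketch of the kind of healthcheck I mean
> > > > (host/port are placeholders; it assumes the stock "srvr" four-letter
> > > > response, which only reports a "Zxid:" line while the server is actually
> > > > serving requests):
> > > >
> > > >     import java.io.*;
> > > >     import java.net.Socket;
> > > >
> > > >     public class ZkServingCheck {
> > > >         // Send a four-letter command and return the full reply.
> > > >         static String fourLetter(String host, int port, String cmd)
> > > >                 throws IOException {
> > > >             try (Socket s = new Socket(host, port)) {
> > > >                 s.getOutputStream().write(cmd.getBytes("US-ASCII"));
> > > >                 s.getOutputStream().flush();
> > > >                 BufferedReader in = new BufferedReader(
> > > >                         new InputStreamReader(s.getInputStream(), "US-ASCII"));
> > > >                 StringBuilder sb = new StringBuilder();
> > > >                 for (String line; (line = in.readLine()) != null; )
> > > >                     sb.append(line).append('\n');
> > > >                 return sb.toString();
> > > >             }
> > > >         }
> > > >
> > > >         public static void main(String[] args) throws IOException {
> > > >             String reply = fourLetter(args[0], Integer.parseInt(args[1]), "srvr");
> > > >             // A shut-down-but-still-alive server answers with a "not
> > > >             // currently serving requests" message instead of its stats.
> > > >             boolean serving = reply.contains("Zxid:");
> > > >             System.exit(serving ? 0 : 1); // non-zero => supervisor restarts it
> > > >         }
> > > >     }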
> > > >
> > > >
> > > I have run into very few people who have seen the zxid rollover, and
> > > testing under real conditions is not easily done. We have unit tests, but
> > > that code is just not exercised sufficiently in everyday use. Since you're
> > > not seeing what's intended, please create a JIRA and include any additional
> > > details you can (e.g. config, logs).
> > >
> >
> > Sure, I've opened one here:
> > https://issues.apache.org/jira/browse/ZOOKEEPER-2791
> >
> >
> > >
> > > What I heard people (well, really one user; I have personally only seen
> > > this at one site) were doing prior to 1277 was monitoring the zxid counter,
> > > and when it got close to rolling over (within 10%, say) they would force
> > > the current leader to restart by restarting the process. The intent of 1277
> > > was to effectively do this automatically.
> > >
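> > > A rough sketch of that kind of monitor (host/port and the 10% threshold
> > > are placeholders; it parses the "Zxid:" line of the standard "srvr"
> > > four-letter reply and checks how much of the 32-bit counter is used):
> > >
> > >     import java.io.*;
> > >     import java.net.Socket;
> > >
> > >     public class ZxidRolloverMonitor {
> > >         // Send a four-letter command and return the full reply.
> > >         static String fourLetter(String host, int port, String cmd)
> > >                 throws IOException {
> > >             try (Socket s = new Socket(host, port)) {
> > >                 s.getOutputStream().write(cmd.getBytes("US-ASCII"));
> > >                 s.getOutputStream().flush();
> > >                 BufferedReader in = new BufferedReader(
> > >                         new InputStreamReader(s.getInputStream(), "US-ASCII"));
> > >                 StringBuilder sb = new StringBuilder();
> > >                 for (String line; (line = in.readLine()) != null; )
> > >                     sb.append(line).append('\n');
> > >                 return sb.toString();
> > >             }
> > >         }
> > >
> > >         public static void main(String[] args) throws IOException {
> > >             for (String line : fourLetter(args[0],
> > >                     Integer.parseInt(args[1]), "srvr").split("\n")) {
> > >                 if (line.startsWith("Zxid:")) {
> > >                     long zxid = Long.decode(line.split("\\s+")[1]); // "0x..."
> > >                     double used = (zxid & 0xffffffffL) / (double) 0xffffffffL;
> > >                     System.out.printf("zxid counter at %.1f%% of range%n",
> > >                             used * 100);
> > >                     if (used > 0.9) // within 10% of rollover
> > >                         System.out.println("WARN: restart the leader soon");
> > >                 }
> > >             }
> > >         }
> > >     }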
> >
> > We are looking at doing something similar, maybe once a week finding the
> > current leader and restarting it. From testing, this quickly re-elects a
> > new leader and resets the zxid counter to zero, so it should avoid the
> > rollover that occurs after a few weeks of uptime.
> >
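> > Concretely, the weekly job would first locate the leader with something
> > like this (hostnames are placeholders; "Mode: leader" appears in the
> > standard "srvr" reply of the current leader):
> >
> >     import java.net.Socket;
> >     import java.util.Scanner;
> >
> >     public class FindLeader {
> >         // Send a four-letter command and return the full reply.
> >         static String fourLetter(String host, int port, String cmd)
> >                 throws Exception {
> >             try (Socket s = new Socket(host, port)) {
> >                 s.getOutputStream().write(cmd.getBytes("US-ASCII"));
> >                 s.getOutputStream().flush();
> >                 Scanner sc = new Scanner(s.getInputStream(), "US-ASCII");
> >                 return sc.useDelimiter("\\A").hasNext() ? sc.next() : "";
> >             }
> >         }
> >
> >         public static void main(String[] args) throws Exception {
> >             String[] hosts = {"zk1.example.com", "zk2.example.com",
> >                     "zk3.example.com", "zk4.example.com", "zk5.example.com"};
> >             for (String host : hosts) {
> >                 if (fourLetter(host, 2181, "srvr").contains("Mode: leader"))
> >                     System.out.println(host); // the one to restart this week
> >             }
> >         }
> >     }
> >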
> >
> Exactly. This is pretty much the same scenario that I've seen in the past,
> along with a similar workaround.
>
> You might want to take a look at the work Benedict Jin has done here:
> https://issues.apache.org/jira/browse/ZOOKEEPER-2789
> Given you are seeing this so frequently, it might be something you could
> collaborate on with the author of the patch? I have not looked at it in
> great detail, but it may allow you to run longer w/o seeing the issue. I
> have not thought through all the implications though... (including b/w
> compat).
>
> Patrick
>
>
> >
> > >
> > > Patrick
> > >
> > >
> > > >
> > > > Cheers,
> > > >
> > > > Mike
> > > >
> > > >
> > >
> >
> > Mike
> >
>
