From: Patrick Hunt
Date: Fri, 26 May 2017 11:45:01 -0700
Subject: Re: Recovering from zxid rollover
To: user@zookeeper.apache.org

On Wed, May 24, 2017 at 8:08 AM, Mike Heffner wrote:

> On Tue, May 23, 2017 at 10:21 PM, Patrick Hunt wrote:
>
> > On Tue, May 23, 2017 at 3:47 PM, Mike Heffner wrote:
> >
> > > Hi,
> > >
> > > I'm curious what the best practices are for handling zxid rollover
> > > in a ZK ensemble. We have a few five-node ZK ensembles (some 3.4.8
> > > and some 3.3.6) and they periodically roll over their zxid. We see
> > > the following in the system logs on the leader node:
> > >
> > > 2017-05-22 12:54:14,117 [myid:15] - ERROR [ProcessThread(sid:15
> > > cport:-1)::ZooKeeperCriticalThread@49] - Severe unrecoverable error,
> > > from thread : ProcessThread(sid:15 cport:-1):
> > > org.apache.zookeeper.server.RequestProcessor$RequestProcessorException:
> > > zxid lower 32 bits have rolled over, forcing re-election, and
> > > therefore new epoch start
> > >
> > > From my best understanding of the code, this exception will end up
> > > causing the leader to enter shutdown():
> > >
> > > https://github.com/apache/zookeeper/blob/09cd5db55446a4b390f82e3548b929f19e33430d/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L464-L464
> > >
> > > This stops the ZooKeeper instance from servicing requests, but the
> > > JVM is still actually running.
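For reference, the zxid quoted in that error is a single 64-bit number: the
leader epoch lives in the upper 32 bits and a per-epoch transaction counter
in the lower 32 bits, so the counter wraps after roughly 4.29 billion
transactions in one epoch. A minimal sketch of that layout (plain Java; the
class name is made up and this is not ZooKeeper API):

    // Minimal sketch (not ZooKeeper API): decompose a 64-bit zxid into the
    // leader epoch (upper 32 bits) and the per-epoch counter (lower 32 bits).
    public class ZxidLayout {
        public static void main(String[] args) {
            // e.g. "0xf00000001" -> epoch 15, counter 1
            String arg = args.length > 0 ? args[0] : "0xf00000001";
            long zxid = Long.parseUnsignedLong(arg.replace("0x", ""), 16);

            long epoch    = zxid >>> 32;           // upper 32 bits: leader epoch
            long counter  = zxid & 0xffffffffL;    // lower 32 bits: rolls over
            long headroom = (1L << 32) - counter;  // txns left before the error above

            System.out.printf("epoch=%d counter=%d headroom=%d%n",
                    epoch, counter, headroom);
        }
    }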
> > > What we experience is that while this ZK instance is still running,
> > > the remaining follower nodes can't re-elect a leader (at least within
> > > 15 mins) and quorum is offline. Our remediation so far has been to
> > > restart the original leader node, at which point the cluster recovers.
> > >
> > > The two questions I have are:
> > >
> > > 1. Should the remaining 4 nodes be able to re-elect a leader after
> > > zxid rollover without intervention (restarting)?
> >
> > Hi Mike.
> >
> > That is the intent. Originally the epoch would roll over and cause the
> > cluster to hang (similar to what you are reporting); the JIRA is here:
> > https://issues.apache.org/jira/browse/ZOOKEEPER-1277
> > However the patch, calling shutdown on the leader, was intended to
> > force a re-election before the epoch could roll over.
>
> Should the leader JVM actually exit during this shutdown, thereby
> allowing the init system to restart it?

IIRC it should not be necessary, but it's been some time since I looked at it.

> > > 2. If the leader enters shutdown() state after a zxid rollover, is
> > > there any scenario where it will return to started? If not, how are
> > > others handling this scenario -- maybe a healthcheck that
> > > kills/restarts an instance that is in shutdown state?
> >
> > I have run into very few people who have seen the zxid rollover, and
> > testing under real conditions is not easily done. We have unit tests,
> > but that code is just not exercised sufficiently in everyday use. Since
> > you're not seeing what's intended, please create a JIRA and include any
> > additional details you can (e.g. config, logs).
>
> Sure, I've opened one here:
> https://issues.apache.org/jira/browse/ZOOKEEPER-2791
>
> > What I heard people (well, really one user; I have personally only seen
> > this at one site) were doing prior to 1277 was monitoring the epoch
> > number, and when it got close to rolling over (within 10%, say) they
> > would force the current leader to restart by restarting the process.
> > The intent of 1277 was to effectively do this automatically.
>
> We are looking at doing something similar, maybe once a week finding the
> current leader and restarting it. From testing, this quickly re-elects a
> new leader and resets the zxid to zero, so it should avoid the rollover
> that occurs after a few weeks of uptime.

Exactly. This is pretty much the same scenario that I've seen in the past,
along with a similar workaround.

You might want to take a look at the work Benedict Jin has done here:
https://issues.apache.org/jira/browse/ZOOKEEPER-2789
Given you are seeing this so frequently, it might be something you could
collaborate on with the author of the patch? I have not looked at it in
great detail, but it may allow you to run longer w/o seeing the issue. I
have not thought through all the implications though... (including b/w
compat).

Patrick

> > Patrick
> >
> > > Cheers,
> > >
> > > Mike
>
> Mike
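For anyone who wants to automate the workaround discussed above (watch the
zxid counter and proactively restart the leader before it wraps), here is a
rough sketch that queries a server with the "srvr" four-letter word and
reports its mode plus how much of the 32-bit counter the current epoch has
used. It assumes the four-letter words are reachable on the client port; the
class name and any alerting threshold are hypothetical:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    // Rough sketch (hypothetical class name): send the "srvr" four-letter
    // word to a ZooKeeper server and report its mode and zxid counter usage.
    public class ZxidRolloverCheck {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "localhost";
            int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;

            try (Socket sock = new Socket(host, port)) {
                OutputStream out = sock.getOutputStream();
                out.write("srvr".getBytes(StandardCharsets.US_ASCII));
                out.flush();

                BufferedReader in = new BufferedReader(
                        new InputStreamReader(sock.getInputStream(), StandardCharsets.US_ASCII));
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.startsWith("Mode:")) {
                        // leader / follower / standalone
                        System.out.println(host + ":" + port + " " + line.trim());
                    } else if (line.startsWith("Zxid:")) {
                        long zxid = Long.parseUnsignedLong(
                                line.substring(line.indexOf("0x") + 2).trim(), 16);
                        long counter = zxid & 0xffffffffL;        // lower 32 bits
                        double used = counter / (double) (1L << 32);
                        System.out.printf("zxid=0x%x epoch=%d counter used=%.1f%%%n",
                                zxid, zxid >>> 32, used * 100);
                        // e.g. a cron job could restart the leader once 'used'
                        // passes some threshold like 90%
                    }
                }
            }
        }
    }

Run it against each member of the ensemble; the node reporting "Mode: leader"
is the one to restart once the counter gets close to wrapping.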