From: Debraj Manna
Date: Tue, 27 Aug 2019 17:43:56 +0530
Subject: Re: The current epoch, 7, is older than the last zxid, 8589935882
To: user@zookeeper.apache.org

No, I don't see the updatingEpoch file in /var/lib/zookeeper/version-2.

I started zookeeper by adding set -x in /usr/bin/zookeeper-server. I can see
zookeeper is getting started with 3.4.13 as shown below. The complete logs are
placed in the below gist:

https://gist.github.com/debraj-manna/509ec3d497016c4a249ee2b8dace05d9

nohup java -Dzookeeper.datadir.autocreate=false -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp '/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/lib/zookeeper/bin/../lib/jline-2.11.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.13.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/etc/zookeeper/conf::/etc/zookeeper/conf:/usr/lib/zookeeper/*:/usr/lib/zookeeper/lib/*' -Dzookeeper.log.threshold=INFO -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg
+ sleep 1
+ echo STARTED
STARTED

The content of zookeeper.log after the start is placed in the below gist:

https://gist.github.com/debraj-manna/9800c5bef32837c62bdfb324c0589ad6

Let me know if you need any more logs.

On Mon, Aug 26, 2019 at 9:21 PM Andor Molnar wrote:

> I confirmed that the fix is included in 3.4.13.
> That’s why I asked if you can see ‘updatingEpoch’ file in the data folder.
>
> I don’t think the issue is not related, but I want to make sure that
> you’re running the right version by verifying the beginning of ZK logs.
>
> Andor
>
>
> > On 2019. Aug 26., at 13:43, Debraj Manna wrote:
> >
> > Below is the content of currentEpoch.tmp
> >
> > support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> > 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> > 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> > 8support@platform2
> >
> > Starting zookeeper logs are rolled over as the issue was there for some
> > time. Will the current log with the node in this state help? Btw why do you
> > think this issue may not be related to zookeeper?
> >
> > On Mon, Aug 26, 2019 at 4:56 PM Andor Molnar wrote:
> >
> >> Hi Debraj,
> >>
> >> The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
> >> Can you see ‘updatingEpoch’ file in /var/lib/zookeeper/version-2?
> >> Also what is ‘currentEpoch.tmp’? I’m not sure if it relates to ZooKeeper.
> >>
> >> Would you please share full startup logs of the failing node?
> >>
> >> Regards,
> >> Andor
> >>
> >>
> >>> On 2019. Aug 23., at 18:53, Debraj Manna wrote:
> >>>
> >>> Can someone answer my below query?
> >>>
> >>> I am getting confused after going through ZOOKEEPER-1653 and
> >>> ZOOKEEPER-2354. The issues say it is fixed in 3.4.6 but exists in 3.5.x.
> >>> But I am seeing the issue in 3.4.13 also. Can someone let me know if the
> >>> issue is present in 3.4.13 also?
> >>>
> >>>
> >>> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna wrote:
> >>>
> >>>> With the other two zookeeper servers running I stopped the zookeeper in
> >>>> the broken node, then deleted all the contents inside
> >>>> /var/lib/zookeeper/version-2 and started the zookeeper back on the node.
> >>>> It is running fine now and got all the data from the other servers.
> >>>>
> >>>> I am getting confused after going through ZOOKEEPER-1653 and
> >>>> ZOOKEEPER-2354. The issues say it is fixed in 3.4.6 but exists in 3.5.x.
> >>>> But I am seeing the issue in 3.4.13 also. Can someone let me know if the
> >>>> issue is present in 3.4.13 also?
> >>>>
> >>>>
> >>>> On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna <subharaj.manna@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Thanks for replying.
> >>>>>
> >>>>> What is the recommended way to remove a node, delete all data from it
> >>>>> and make it start fresh?
> >>>>>
> >>>>> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>> Sorry for the late reply.
> >>>>>> If you have 3 servers you can nuke the broken one and make it start from
> >>>>>> scratch; it will join the cluster and then recover data from the other
> >>>>>> servers.
> >>>>>>
> >>>>>> Try it in a staging env, not in production.
> >>>>>>
> >>>>>> Enrico
> >>>>>>
> >>>>>> On Tue 20 Aug 2019, 20:30 Debraj Manna wrote:
> >>>>>>
> >>>>>>> The same has been asked in stackoverflow
> >>>>>>> <https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid>
> >>>>>>> also. But no response there also.
> >>>>>>>
> >>>>>>> Anyone any thoughts on this one?
> >>>>>>>
> >>>>>>> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <subharaj.manna@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Posted wrong Jira link.
> >>>>>>>> I meant https://issues.apache.org/jira/browse/ZOOKEEPER-2354. Can
> >>>>>>>> someone let me know what is the recommended way to recover the node?
> >>>>>>>>
> >>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> >>>>>>>> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> >>>>>>>> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> >>>>>>>> 8support@platform2
> >>>>>>>>
> >>>>>>>> On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <subharaj.manna@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi
> >>>>>>>>>
> >>>>>>>>> I am using a zookeeper ensemble of 3 nodes running 3.4.13. Sometimes
> >>>>>>>>> after a reboot of the machine zookeeper is not starting and I am
> >>>>>>>>> seeing the below errors in logs.
> >>>>>>>>>
> >>>>>>>>> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653.
> >>>>>>>>> Can someone let me know if this is fixed in 3.4.13 or not, as I can
> >>>>>>>>> see the issue still open? Also can someone suggest what is the
> >>>>>>>>> recommended way to recover the set-up?
> >>>>>>>>>
> >>>>>>>>> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
> >>>>>>>>> java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
> >>>>>>>>> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
> >>>>>>>>> java.lang.RuntimeException: Unable to run quorum server
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
> >>>>>>>>> Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
> >>>>>>>>>     ... 4 more----
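
For anyone reading the exception in this thread: a ZooKeeper zxid is a 64-bit number whose high 32 bits hold the leader epoch and whose low 32 bits hold a per-epoch transaction counter. The sketch below decodes the zxid from the IOException and compares it with the value found in the currentEpoch file. It is only an illustration of why the startup check complains, not ZooKeeper's own code; the class name ZxidDecode and its methods are hypothetical and exist only for this example.

```java
// Illustrative sketch only. ZxidDecode is a hypothetical helper written for
// this thread; it is not a class shipped with ZooKeeper.
final class ZxidDecode {

    // The high 32 bits of a zxid hold the leader epoch.
    static long epochOf(long zxid) {
        return zxid >> 32L;
    }

    // The low 32 bits hold the transaction counter within that epoch.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long lastZxid = 34359738370L; // value printed in the IOException
        long currentEpoch = 7L;       // value read from the currentEpoch file

        System.out.println("epoch of last zxid   = " + epochOf(lastZxid));
        System.out.println("counter of last zxid = " + counterOf(lastZxid));
        // The data on disk is from a newer epoch than the currentEpoch file records.
        System.out.println("zxid epoch > currentEpoch: " + (epochOf(lastZxid) > currentEpoch));
    }
}
```

Run as written, this reports epoch 8 and counter 2 for zxid 34359738370, so the transaction data on disk is from epoch 8 while currentEpoch still says 7. That matches the file contents quoted in the thread, where acceptedEpoch and currentEpoch.tmp both contain 8. One possible reading, which is an assumption and not something confirmed in the thread, is that the update of currentEpoch was interrupted by the reboot before the temporary file replaced it.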