zookeeper-user mailing list archives

From Debraj Manna <subharaj.ma...@gmail.com>
Subject Re: The current epoch, 7, is older than the last zxid, 8589935882
Date Tue, 27 Aug 2019 12:13:56 GMT
No, I don't see the updatingEpoch file in /var/lib/zookeeper/version-2.
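
For anyone checking the same thing, here is a minimal sketch of how the epoch files could be summarized in one go. The file names and the directory are the ones mentioned in this thread; the function name epoch_state is a hypothetical helper and not part of ZooKeeper.

```shell
#!/bin/sh
# Hypothetical helper: summarize the epoch files in a ZooKeeper data directory.
# File names are taken from this thread; the function name is illustrative only.
epoch_state() {
    dir=$1
    for f in acceptedEpoch currentEpoch currentEpoch.tmp; do
        if [ -r "$dir/$f" ]; then
            # Command substitution strips trailing newlines, so a file that
            # holds a bare number prints cleanly either way.
            echo "$f=$(cat "$dir/$f")"
        else
            echo "$f=<missing>"
        fi
    done
    # Only the presence of updatingEpoch matters here, not its content.
    if [ -e "$dir/updatingEpoch" ]; then
        echo "updatingEpoch=present"
    else
        echo "updatingEpoch=absent"
    fi
}

# Usage with the path from this thread (run as a user that can read the files):
# epoch_state /var/lib/zookeeper/version-2
```

The output is one name=value line per file, which is easier to paste into a reply than the raw cat output.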

I started ZooKeeper after adding set -x to /usr/bin/zookeeper-server, and I can
see that it is being started with 3.4.13, as shown below. The complete logs
are in the gist below:

https://gist.github.com/debraj-manna/509ec3d497016c4a249ee2b8dace05d9

nohup java -Dzookeeper.datadir.autocreate=false
-Dzookeeper.log.dir=/var/log/zookeeper
-Dzookeeper.root.logger=INFO,ROLLINGFILE -cp
'/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/lib/zookeeper/bin/../lib/jline-2.11.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.13.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/etc/zookeeper/conf::/etc/zookeeper/conf:/usr/lib/zookeeper/*:/usr/lib/zookeeper/lib/*'
-Dzookeeper.log.threshold=INFO -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.local.only=false
org.apache.zookeeper.server.quorum.QuorumPeerMain
/etc/zookeeper/conf/zoo.cfg
+ sleep 1
+ echo STARTED
STARTED

The content of zookeeper.log after the start is in the gist below:

https://gist.github.com/debraj-manna/9800c5bef32837c62bdfb324c0589ad6

Let me know if you need any more logs.

On Mon, Aug 26, 2019 at 9:21 PM Andor Molnar <andor@apache.org> wrote:

> I confirmed that the fix is included in 3.4.13. That’s why I asked if you
> can see ‘updatingEpoch’ file in the data folder.
>
> I’m not saying the issue is unrelated, but I want to make sure that
> you’re running the right version by verifying the beginning of the ZK logs.
>
> Andor
>
>
>
> > On 2019. Aug 26., at 13:43, Debraj Manna <subharaj.manna@gmail.com> wrote:
> >
> > Below is the content of currentEpoch.tmp
> >
> > support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> > 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> > 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> > 8support@platform2
> >
> > The ZooKeeper startup logs have been rolled over, as the issue has been there
> > for some time. Will the current log with the node in this state help? By the
> > way, why do you think this issue may not be related to ZooKeeper?
> >
> >
> >
> > On Mon, Aug 26, 2019 at 4:56 PM Andor Molnar <andor@apache.org> wrote:
> >
> >> Hi Debraj,
> >>
> >> The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
> >> Can you see an ‘updatingEpoch’ file in /var/lib/zookeeper/version-2?
> >> Also, what is ‘currentEpoch.tmp’? I’m not sure if it relates to ZooKeeper.
> >>
> >> Would you please share full startup logs of the failing node?
> >>
> >> Regards,
> >> Andor
> >>
> >>
> >>
> >>
> >>> On 2019. Aug 23., at 18:53, Debraj Manna <subharaj.manna@gmail.com> wrote:
> >>>
> >>> Can someone answer my query below?
> >>>
> >>> I am getting confused after going through ZOOKEEPER-1653
> >>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
> >>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354>. The issues say it
> >>> is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in 3.4.13
> >>> as well. Can someone let me know if the issue is present in 3.4.13 too?
> >>>
> >>>
> >>> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, <subharaj.manna@gmail.com> wrote:
> >>>
> >>>> With the other two ZooKeeper servers running, I stopped ZooKeeper on
> >>>> the broken node, then deleted all the contents of
> >>>> /var/lib/zookeeper/version-2 and started ZooKeeper again on the node.
> >>>> It is running fine now and has got all the data from the other servers.
> >>>>
> >>>> I am getting confused after going through ZOOKEEPER-1653
> >>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
> >>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354>. The issues say
> >>>> it is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
> >>>> 3.4.13 as well. Can someone let me know if the issue is present in 3.4.13
> >>>> too?
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna <subharaj.manna@gmail.com> wrote:
> >>>>
> >>>>> Thanks for replying.
> >>>>>
> >>>>> What is the recommended way to remove a node, delete all data from it,
> >>>>> and make it start fresh?
> >>>>>
> >>>>> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, <eolivelli@gmail.com> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>> Sorry for the late reply.
> >>>>>> If you have 3 servers you can nuke the broken one and make it start from
> >>>>>> scratch; it will join the cluster and then recover data from the other
> >>>>>> servers.
> >>>>>>
> >>>>>> Try it in a staging env, not in production
> >>>>>>
> >>>>>> Enrico
> >>>>>>
> >>>>>> On Tue, Aug 20, 2019, 20:30 Debraj Manna <subharaj.manna@gmail.com> wrote:
> >>>>>>
> >>>>>>> The same has been asked on Stack Overflow
> >>>>>>> <https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid>
> >>>>>>> as well, but there has been no response there either.
> >>>>>>>
> >>>>>>> Does anyone have any thoughts on this one?
> >>>>>>>
> >>>>>>> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <subharaj.manna@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Posted the wrong Jira link. I meant
> >>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2354. Can someone let me
> >>>>>>>> know what the recommended way to recover the node is?
> >>>>>>>>
> >>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> >>>>>>>> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> >>>>>>>> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> >>>>>>>> 8support@platform2
> >>>>>>>>
> >>>>>>>> On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <subharaj.manna@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi
> >>>>>>>>>
> >>>>>>>>> I am using a ZooKeeper ensemble of 3 nodes running 3.4.13. Sometimes,
> >>>>>>>>> after a machine reboot, ZooKeeper does not start and I see the
> >>>>>>>>> errors below in the logs.
> >>>>>>>>>
> >>>>>>>>> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653. Can
> >>>>>>>>> someone let me know whether this is fixed in 3.4.13, as I can see the
> >>>>>>>>> issue is still open? Also, can someone suggest the recommended way to
> >>>>>>>>> recover the setup?
> >>>>>>>>>
> >>>>>>>>> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
> >>>>>>>>> java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
> >>>>>>>>> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
> >>>>>>>>> java.lang.RuntimeException: Unable to run quorum server
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
> >>>>>>>>> Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
> >>>>>>>>>     ... 4 more
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>
> >>
>
>
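
For reference, a minimal sketch of how the zxid in the exception above splits into an epoch and a counter. ZooKeeper documents the zxid as a 64-bit number with the epoch in the high 32 bits and the counter in the low 32 bits. The function names zxid_epoch and zxid_counter are hypothetical helpers, not part of ZooKeeper, and the comparison at the end only mirrors the wording of the error message rather than the server's exact code.

```shell
#!/bin/sh
# Hypothetical helpers: split a zxid into epoch (high 32 bits) and
# counter (low 32 bits). Requires a shell with 64-bit arithmetic.
zxid_epoch()   { echo $(( $1 >> 32 )); }
zxid_counter() { echo $(( $1 & 4294967295 )); }

zxid=34359738370     # last zxid from the stack trace in this thread
current_epoch=7      # content of the currentEpoch file in this thread

echo "epoch=$(zxid_epoch "$zxid") counter=$(zxid_counter "$zxid")"
# prints: epoch=8 counter=2

# The exception text says the current epoch is older than the zxid's epoch.
if [ "$(zxid_epoch "$zxid")" -gt "$current_epoch" ]; then
    echo "zxid epoch is newer than currentEpoch ($current_epoch)"
fi
```

With the values from this thread, the zxid carries epoch 8 while currentEpoch holds 7, which matches the acceptedEpoch and currentEpoch.tmp contents shown earlier.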
