zookeeper-user mailing list archives

From Andor Molnar <an...@apache.org>
Subject Re: The current epoch, 7, is older than the last zxid, 8589935882
Date Thu, 29 Aug 2019 04:42:55 GMT
Thanks for the info, I’m still looking.
So, this is an Ubuntu packaged version of ZooKeeper.

Andor



> On 2019. Aug 27., at 14:13, Debraj Manna <subharaj.manna@gmail.com> wrote:
> 
> No, I don't see the updatingEpoch file in /var/lib/zookeeper/version-2.
> 
> I started ZooKeeper after adding set -x to /usr/bin/zookeeper-server, and I can
> see that ZooKeeper is being started with 3.4.13, as shown below. The complete
> logs are in the gist below:
> 
> https://gist.github.com/debraj-manna/509ec3d497016c4a249ee2b8dace05d9
> 
> nohup java -Dzookeeper.datadir.autocreate=false
> -Dzookeeper.log.dir=/var/log/zookeeper
> -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp
> '/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/lib/zookeeper/bin/../lib/jline-2.11.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.13.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/etc/zookeeper/conf::/etc/zookeeper/conf:/usr/lib/zookeeper/*:/usr/lib/zookeeper/lib/*'
> -Dzookeeper.log.threshold=INFO -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> org.apache.zookeeper.server.quorum.QuorumPeerMain
> /etc/zookeeper/conf/zoo.cfg
> + sleep 1
> + echo STARTED
> STARTED
> 
> The content of zookeeper.log after the start is in the gist below:
> 
> https://gist.github.com/debraj-manna/9800c5bef32837c62bdfb324c0589ad6
> 
> Let me know if you need any more logs.
> 
> On Mon, Aug 26, 2019 at 9:21 PM Andor Molnar <andor@apache.org> wrote:
> 
>> I confirmed that the fix is included in 3.4.13. That’s why I asked if you
>> can see ‘updatingEpoch’ file in the data folder.
>> 
>> I don’t think the issue is unrelated, but I want to make sure that
>> you’re running the right version by checking the beginning of the ZK logs.
>> 
>> Andor
>> 
>> 
>> 
>>> On 2019. Aug 26., at 13:43, Debraj Manna <subharaj.manna@gmail.com> wrote:
>>> 
>>> Below is the content of currentEpoch.tmp
>>> 
>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
>>> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
>>> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
>>> 8support@platform2
>>> 
>>> The ZooKeeper startup logs have been rolled over, as the issue has been there
>>> for some time. Will the current log with the node in this state help? Btw, why
>>> do you think this issue may not be related to ZooKeeper?
>>> 
>>> 
>>> 
>>> On Mon, Aug 26, 2019 at 4:56 PM Andor Molnar <andor@apache.org> wrote:
>>> 
>>>> Hi Debraj,
>>>> 
>>>> The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
>>>> Can you see an ‘updatingEpoch’ file in /var/lib/zookeeper/version-2?
>>>> Also, what is ‘currentEpoch.tmp’? I’m not sure if it relates to ZooKeeper.
>>>> 
>>>> Would you please share full startup logs of the failing node?
>>>> 
>>>> Regards,
>>>> Andor
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 2019. Aug 23., at 18:53, Debraj Manna <subharaj.manna@gmail.com> wrote:
>>>>> 
>>>>> Can someone answer my query below?
>>>>> 
>>>>> I am getting confused after going through ZOOKEEPER-1653
>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354>. The issues say it
>>>>> is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in 3.4.13
>>>>> also. Can someone let me know if the issue is present in 3.4.13 also?
>>>>> 
>>>>> 
>>>>> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, <subharaj.manna@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> With the other two ZooKeeper servers running, I stopped ZooKeeper on
>>>>>> the broken node, then deleted all the contents of /var/lib/zookeeper/version-2
>>>>>> and started ZooKeeper again on the node. It is running fine now and got
>>>>>> all the data from the other servers.
>>>>>> 
>>>>>> I am getting confused after going through ZOOKEEPER-1653
>>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
>>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354>. The issues say
>>>>>> it is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
>>>>>> 3.4.13 also. Can someone let me know if the issue is present in 3.4.13 also?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna <subharaj.manna@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Thanks for replying.
>>>>>>> 
>>>>>>> What is the recommended way to remove a node, delete all data from it,
>>>>>>> and make it start fresh?
>>>>>>> 
>>>>>>> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, <eolivelli@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> Sorry for the late reply.
>>>>>>>> If you have 3 servers, you can nuke the broken one and make it start from
>>>>>>>> scratch; it will join the cluster and then recover data from the other
>>>>>>>> servers.
>>>>>>>> 
>>>>>>>> Try it in a staging env, not in production
>>>>>>>> 
>>>>>>>> Enrico
>>>>>>>> 
>>>>>>>> On Tue, 20 Aug 2019, 20:30 Debraj Manna <subharaj.manna@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> The same question has also been asked on Stack Overflow
>>>>>>>>> <https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid>,
>>>>>>>>> but there has been no response there either.
>>>>>>>>> 
>>>>>>>>> Does anyone have any thoughts on this one?
>>>>>>>>> 
>>>>>>>>> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <subharaj.manna@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> I posted the wrong Jira link. I meant
>>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2354. Can someone let me
>>>>>>>>>> know what the recommended way to recover the node is?
>>>>>>>>>> 
>>>>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
>>>>>>>>>> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
>>>>>>>>>> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
>>>>>>>>>> 8support@platform2
>>>>>>>>>> 
>>>>>>>>>> On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <subharaj.manna@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi
>>>>>>>>>>> 
>>>>>>>>>>> I am using a ZooKeeper ensemble of 3 nodes running 3.4.13. Sometimes,
>>>>>>>>>>> after a machine reboot, ZooKeeper does not start and I see the errors
>>>>>>>>>>> below in the logs.
>>>>>>>>>>> 
>>>>>>>>>>> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653 . Can
>>>>>>>>>>> someone let me know whether this is fixed in 3.4.13, as I can see the
>>>>>>>>>>> issue is still open? Also, can someone suggest the recommended way to
>>>>>>>>>>> recover the setup?
>>>>>>>>>>> 
>>>>>>>>>>> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
>>>>>>>>>>> java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>>>>>>>>>>> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
>>>>>>>>>>> java.lang.RuntimeException: Unable to run quorum server
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>>>>>>>>>>> Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>>>>>>>>>>>     ... 4 more
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>> 
>>>> 
>> 
>> 
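For reference, the comparison behind the "The current epoch, 7, is older than the last zxid, 34359738370" error quoted above can be sketched as follows. A zxid is a 64-bit number whose upper 32 bits hold the epoch and whose lower 32 bits hold a counter, and startup fails when the epoch taken from the last zxid is greater than the value stored in currentEpoch. This is only a minimal Python sketch, not ZooKeeper's code: the function names split_zxid and epoch_is_stale are made up for illustration, and the updatingEpoch handling discussed in the thread is left out.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter).

    The epoch is the upper 32 bits, the counter the lower 32 bits.
    """
    return zxid >> 32, zxid & 0xFFFFFFFF


def epoch_is_stale(current_epoch: int, last_zxid: int) -> bool:
    """Return True when the stored currentEpoch is older than the zxid's epoch.

    Hypothetical helper: it mirrors only the final comparison that produces
    the error message, nothing else of the startup sequence.
    """
    epoch_of_zxid, _ = split_zxid(last_zxid)
    return epoch_of_zxid > current_epoch


if __name__ == "__main__":
    # Values taken from the log and the `cat currentEpoch` output in the thread.
    print(split_zxid(34359738370))         # (8, 2)
    print(epoch_is_stale(7, 34359738370))  # True
```

With the values from the thread, the last zxid belongs to epoch 8 while currentEpoch still holds 7, which matches the acceptedEpoch and currentEpoch.tmp contents of 8 shown in the `cat` output.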


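The recovery described in the thread (stop ZooKeeper on the broken node while the other two servers keep running, empty /var/lib/zookeeper/version-2, then start ZooKeeper again) could be scripted roughly as below. Note that this sketch moves the files into a timestamped backup directory instead of deleting them, which is a variation and not what was done in the thread. The function name, the backup naming scheme and the paths in the usage comment are assumptions for illustration; it also assumes ZooKeeper is already stopped on that node and that the script has permission to modify the directory.

```python
import shutil
import time
from pathlib import Path


def set_aside_version_dir(version_dir: Path, backup_root: Path) -> Path:
    """Move every entry of version_dir into a new timestamped backup directory.

    version_dir itself is kept (now empty), so a server started with
    -Dzookeeper.datadir.autocreate=false still finds the directory.
    Returns the path of the backup directory.
    """
    if not version_dir.is_dir():
        raise FileNotFoundError(f"{version_dir} is not a directory")

    stamp = time.strftime("%Y%m%dT%H%M%S")
    backup_dir = backup_root / f"{version_dir.name}.bak-{stamp}"
    # Fail rather than overwrite if a backup with this name already exists.
    backup_dir.mkdir(parents=True, exist_ok=False)

    for entry in sorted(version_dir.iterdir()):
        shutil.move(str(entry), str(backup_dir / entry.name))
    return backup_dir


# Example call (assumed paths, run only with ZooKeeper stopped on this node):
# set_aside_version_dir(Path("/var/lib/zookeeper/version-2"), Path("/var/lib/zookeeper"))
```

As Enrico suggests in the thread, something like this is best tried in a staging environment first.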