I confirmed that the fix is included in 3.4.13. That’s why I asked whether you can see an ‘updatingEpoch’
file in the data folder.
I don’t think the issue is unrelated, but I want to make sure that you’re running the
right version by checking the beginning of the ZK logs.
Andor
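
For readers following along, here is a minimal Python sketch of the check discussed above: it reads the epoch files in a ZooKeeper version-2 directory and reports whether an updatingEpoch marker file is present. The function name read_epoch_state and the returned dictionary layout are illustrative and not part of ZooKeeper. It assumes each epoch file holds a single decimal integer, which matches the cat output quoted below.

```python
from pathlib import Path

# File names taken from the thread; the helper itself is hypothetical.
EPOCH_FILES = ("acceptedEpoch", "currentEpoch", "currentEpoch.tmp")


def read_epoch_state(data_dir):
    """Collect the epoch values and the updatingEpoch marker from data_dir."""
    base = Path(data_dir)
    state = {}
    for name in EPOCH_FILES:
        path = base / name
        # A missing file is recorded as None rather than raising.
        state[name] = int(path.read_text().strip()) if path.is_file() else None
    # Only the presence of the marker file matters here, not its content.
    state["updatingEpoch"] = (base / "updatingEpoch").exists()
    return state
```

Calling read_epoch_state("/var/lib/zookeeper/version-2") on the affected node (likely with elevated permissions, since the thread uses sudo) would show all three values and the marker state in one place.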
> On 2019. Aug 26., at 13:43, Debraj Manna <subharaj.manna@gmail.com> wrote:
>
> Below is the content of currentEpoch.tmp
>
> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> 8support@platform2
>
> The ZooKeeper startup logs have been rolled over, as the issue has been there for some
> time. Will the current log with the node in this state help? By the way, why do you
> think this issue may not be related to ZooKeeper?
>
>
>
> On Mon, Aug 26, 2019 at 4:56 PM Andor Molnar <andor@apache.org> wrote:
>
>> Hi Debraj,
>>
>> The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
>> Can you see an ‘updatingEpoch’ file in /var/lib/zookeeper/version-2?
>> Also, what is ‘currentEpoch.tmp’? I’m not sure if it relates to ZooKeeper.
>>
>> Would you please share full startup logs of the failing node?
>>
>> Regards,
>> Andor
>>
>>
>>
>>
>>> On 2019. Aug 23., at 18:53, Debraj Manna <subharaj.manna@gmail.com> wrote:
>>>
>>> Can someone answer my query below?
>>>
>>> I am getting confused after going through ZOOKEEPER-1653
>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354>. The issues say it
>>> is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in 3.4.13
>>> also. Can someone let me know if the issue is present in 3.4.13 also?
>>>
>>>
>>> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, <subharaj.manna@gmail.com>
>>> wrote:
>>>
>>>> With the other two ZooKeeper servers running, I stopped ZooKeeper on
>>>> the broken node, deleted all the contents of /var/lib/zookeeper/version-2,
>>>> and started ZooKeeper again on the node. It is running fine now and got
>>>> all the data from the other servers.
>>>>
>>>> I am getting confused after going through ZOOKEEPER-1653
>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354>. The issues say
>>>> it is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
>>>> 3.4.13 also. Can someone let me know if the issue is present in 3.4.13 also?
>>>>
>>>>
>>>>
>>>> On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna <subharaj.manna@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for replying.
>>>>>
>>>>> What is the recommended way to remove a node and delete all data from it
>>>>> and make it start fresh?
>>>>>
>>>>> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, <eolivelli@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>> Sorry for the late reply.
>>>>>> If you have 3 servers, you can nuke the broken one and make it start from
>>>>>> scratch; it will join the cluster and then recover data from the other
>>>>>> servers.
>>>>>>
>>>>>> Try it in a staging env, not in production
>>>>>>
>>>>>> Enrico
>>>>>>
>>>>>> On Tue, Aug 20, 2019, 20:30 Debraj Manna <subharaj.manna@gmail.com> wrote:
>>>>>>
>>>>>>> The same has been asked on Stack Overflow
>>>>>>> <https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid>
>>>>>>> as well, but there has been no response there either.
>>>>>>>
>>>>>>> Does anyone have any thoughts on this one?
>>>>>>>
>>>>>>> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <subharaj.manna@gmail.com> wrote:
>>>>>>>
>>>>>>>> I posted the wrong Jira link. I meant
>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2354. Can someone let me
>>>>>>>> know what the recommended way to recover the node is?
>>>>>>>>
>>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
>>>>>>>> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
>>>>>>>> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
>>>>>>>> 8support@platform2
>>>>>>>>
>>>>>>>> On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <subharaj.manna@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> I am using a ZooKeeper ensemble of 3 nodes running 3.4.13. Sometimes,
>>>>>>>>> after a reboot of the machine, ZooKeeper does not start and I see the
>>>>>>>>> errors below in the logs.
>>>>>>>>>
>>>>>>>>> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653. Can
>>>>>>>>> someone let me know if this is fixed in 3.4.13 or not, as I can see the
>>>>>>>>> issue is still open? Also, can someone suggest the recommended way to
>>>>>>>>> recover the setup?
>>>>>>>>>
>>>>>>>>> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
>>>>>>>>> java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>>>>>>>>> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
>>>>>>>>> java.lang.RuntimeException: Unable to run quorum server
>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>>>>>>>>> Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>>>>>>>>>     ... 4 more----
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>
>>
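
A note on the error in the original report: a ZooKeeper zxid is a 64-bit number whose upper 32 bits hold the epoch and whose lower 32 bits hold a counter within that epoch. The Python sketch below splits the zxid from the log and compares its epoch with the value found in currentEpoch. The function names are illustrative, and the comparison is a simplified reading of the error message, not a copy of the QuorumPeer.loadDataBase code.

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter)."""
    epoch = zxid >> 32            # upper 32 bits
    counter = zxid & 0xFFFFFFFF   # lower 32 bits
    return epoch, counter


def epoch_is_stale(current_epoch, last_zxid):
    """True when the stored epoch is older than the epoch encoded in last_zxid."""
    zxid_epoch, _ = split_zxid(last_zxid)
    return current_epoch < zxid_epoch


# Values from the log and the currentEpoch file quoted in the thread.
epoch, counter = split_zxid(34359738370)
print(epoch, counter)                  # prints: 8 2
print(epoch_is_stale(7, 34359738370))  # prints: True
```

With the values from the thread, the zxid decodes to epoch 8, which matches acceptedEpoch and currentEpoch.tmp, while currentEpoch still holds 7.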