zookeeper-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2994) Tool required to recover log and snapshot entries with CRC errors
Date Wed, 18 Apr 2018 13:34:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442495#comment-16442495 ]

ASF GitHub Bot commented on ZOOKEEPER-2994:
-------------------------------------------

Github user nkalmar commented on the issue:

    https://github.com/apache/zookeeper/pull/487
  
    I used your updated documentation and managed to recover a corrupted log file:
    
    bin/zkTxnLogToolkit.sh -d ~/workspace/zookeeper/standalone/version-2/log.1
    ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
    4/9/18 3:13:19 PM CEST session 0x10000ebe13a0000 cxid 0x0 zxid 0x1 createSession 30000
    4/9/18 3:15:21 PM CEST session 0x10000ebe13a0000 cxid 0x0 zxid 0x2 closeSession null
    4/9/18 3:17:41 PM CEST session 0x10000ebe13a0001 cxid 0x0 zxid 0x3 createSession 30000
    4/9/18 3:18:13 PM CEST session 0x10000ebe13a0001 cxid 0x0 zxid 0x4 closeSession null
    EOF reached after 4 txns.
    
    Corrupted log.1 file
    
    bin/zkTxnLogToolkit.sh -d ~/workspace/zookeeper/standalone/version-2/log.1
    ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
    CRC ERROR - 4/10/18 5:12:11 AM CEST session 0x10000ebe13a0000 cxid 0x0 zxid 0x1 createSession 30000
    4/10/18 5:12:11 AM CEST session 0x10000ebe13a0000 cxid 0x0 zxid 0x1 createSession 30000
    4/9/18 3:15:21 PM CEST session 0x10000ebe13a0000 cxid 0x0 zxid 0x2 closeSession null
    CRC ERROR - 4/9/18 3:17:41 PM CEST session 0x10044aa44aaaaaa cxid 0x0 zxid 0x3 createSession 30000
    4/9/18 3:17:41 PM CEST session 0x10044aa44aaaaaa cxid 0x0 zxid 0x3 createSession 30000
    4/9/18 3:18:13 PM CEST session 0x10000ebe13a0001 cxid 0x0 zxid 0x4 closeSession null
    EOF reached after 4 txns.
    
    bin/zkTxnLogToolkit.sh -r ~/workspace/zookeeper/standalone/version-2/log.1
    ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
    CRC ERROR - 4/10/18 5:12:11 AM CEST session 0x10000ebe13a0000 cxid 0x0 zxid 0x1 createSession 30000
    Would you like to fix it (Yes/No/Abort) ? Y
    EOF reached after 4 txns.
    Recovery file /Users/norbertkalmar/workspace/zookeeper/standalone/version-2/log.1.fixed has been written with 1 fixed CRC error(s)
    
    bin/zkTxnLogToolkit.sh -d ~/workspace/zookeeper/standalone/version-2/log.1.fixed
    ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
    4/9/18 3:13:19 PM CEST session 0x10000ebe13a0000 cxid 0x0 zxid 0x1 createSession 30000
    4/9/18 3:15:21 PM CEST session 0x10000ebe13a0000 cxid 0x0 zxid 0x2 closeSession null
    4/9/18 3:17:41 PM CEST session 0x10044aa44aaaaaa cxid 0x0 zxid 0x3 createSession 30000
    4/9/18 3:18:13 PM CEST session 0x10000ebe13a0001 cxid 0x0 zxid 0x4 closeSession null
    EOF reached after 4 txns.
    
    
    LGTM!
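A minimal sketch of what the recovery (`-r`) step does conceptually: verify each entry's stored checksum against one recomputed over its bytes, and for entries the user agrees to fix, replace the stored value. It assumes each log entry carries a checksum computed over its serialized transaction bytes, and that the checksum is Adler32 (`java.util.zip.Adler32`), which is believed to match ZooKeeper's transaction log. The `Entry` class and the `checksum`, `isValid`, `countErrors`, and `recover` methods are hypothetical. This is not the TxnLogToolkit source and does not read the on-disk log format. The interactive Yes/No prompt is modelled as a `Predicate`, and Abort is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.zip.Adler32;
import java.util.zip.Checksum;

public final class CrcRecoverySketch {

    /** Hypothetical in-memory log entry: the checksum read from disk plus the bytes it should cover. */
    public static final class Entry {
        public final long storedCrc;
        public final byte[] payload;

        public Entry(long storedCrc, byte[] payload) {
            this.storedCrc = storedCrc;
            this.payload = payload;
        }
    }

    /** Adler32 over the serialized transaction bytes (assumed to match the txn log's checksum). */
    public static long checksum(byte[] payload) {
        Checksum crc = new Adler32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /** An entry is a "CRC ERROR" when the recomputed checksum differs from the stored one. */
    public static boolean isValid(Entry e) {
        return checksum(e.payload) == e.storedCrc;
    }

    /** Counts entries whose stored checksum does not match their bytes. */
    public static int countErrors(List<Entry> log) {
        int errors = 0;
        for (Entry e : log) {
            if (!isValid(e)) {
                errors++;
            }
        }
        return errors;
    }

    /**
     * For every invalid entry the caller agrees to fix (the Yes/No prompt), replaces the
     * stored checksum with a recomputed one. Payload bytes are kept unchanged.
     * Returns the number of entries fixed.
     */
    public static int recover(List<Entry> log, Predicate<Entry> shouldFix) {
        int fixed = 0;
        for (int i = 0; i < log.size(); i++) {
            Entry e = log.get(i);
            if (!isValid(e) && shouldFix.test(e)) {
                log.set(i, new Entry(checksum(e.payload), e.payload));
                fixed++;
            }
        }
        return fixed;
    }

    public static void main(String[] args) {
        byte[] ok = "closeSession null".getBytes(StandardCharsets.UTF_8);
        byte[] original = "createSession 30000".getBytes(StandardCharsets.UTF_8);
        byte[] damaged = original.clone();
        damaged[0] = 'X'; // simulate a damaged byte on disk; the stored CRC still covers the original

        List<Entry> log = new ArrayList<>();
        log.add(new Entry(checksum(ok), ok));
        log.add(new Entry(checksum(original), damaged));

        System.out.println("CRC errors before: " + countErrors(log));
        int fixed = recover(log, e -> true); // answer "Yes" to every prompt
        System.out.println("fixed " + fixed + " CRC error(s), errors after: " + countErrors(log));
    }
}
```

Running `main` prints `CRC errors before: 1` followed by `fixed 1 CRC error(s), errors after: 0`. Recomputing a checksum only makes an entry pass verification again. The damaged bytes are kept as they are, so a recovered transaction can still be wrong, and dumping the fixed file with `-d` afterwards remains worthwhile.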


> Tool required to recover log and snapshot entries with CRC errors
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2994
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2994
>             Project: ZooKeeper
>          Issue Type: New Feature
>            Reporter: Andor Molnar
>            Assignee: Andor Molnar
>            Priority: Major
>             Fix For: 3.5.4, 3.6.0, 3.4.13
>
>
> In the event that the ZooKeeper transaction log or snapshot becomes corrupted and fails CRC checks (preventing startup), we should have a mechanism to get the cluster running again.
> Previously we achieved this by loading the broken transaction log with a modified version of ZK with the CRC check disabled and forcing it to take a snapshot.
> It'd be very handy to have a tool which can do this for us. LogFormatter and SnapshotFormatter have already been designed to dump log and snapshot files; it'd be nice to extend their functionality and add the ability to perform such a recovery.
> Experience has shown that once you end up with a corrupt txn log, there is no way to recover except by manually modifying the CRC check. That's basically why the tool is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
