zookeeper-dev mailing list archives

From "Sergey Maslyakov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2332) Zookeeper failed to start for empty txn log
Date Wed, 19 Oct 2016 20:00:58 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589719#comment-15589719 ]

Sergey Maslyakov commented on ZOOKEEPER-2332:
---------------------------------------------

I've seen this problem happen on a system that ran out of disk space because another application
filled up the disk. The entry for the transaction log file was created on the file system,
but ZooKeeper was not able to write anything into it. After the system was rebooted and disk
space was freed, ZooKeeper failed to start.

I think this is a two-fold problem.
# On one hand, ZooKeeper should not be creating corrupted log or snapshot files.
# On the other hand, it should not explode with an unhandled exception if it does come across
an invalid log file.

Before opening a snapshot file, ZooKeeper does some quick and inexpensive validation and rejects
corrupted snapshots. It does not validate the log files, and it does not handle read/parse
errors when it comes across a corrupted log file.
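
To illustrate the kind of inexpensive check I have in mind for log files, here is a minimal standalone sketch. It is not ZooKeeper code and not the attached patch. The class and method names are made up for illustration, and it assumes that a transaction log starts with a 16-byte header (int magic "ZKLG", int version, long dbid), which is my understanding of the on-disk format. A zero-length file like the one left behind by the full disk would be rejected up front instead of surfacing later as an EOFException.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of ZooKeeper.
final class TxnLogHeaderCheck {
    // Assumption: the log header begins with the ASCII bytes "ZKLG" read as a big-endian int.
    static final int TXNLOG_MAGIC =
            ByteBuffer.wrap("ZKLG".getBytes(StandardCharsets.US_ASCII)).getInt();

    // Assumption: header layout is magic (4 bytes) + version (4 bytes) + dbid (8 bytes).
    static final int HEADER_BYTES = 16;

    // Returns true only if the file is long enough to hold a header
    // and starts with the expected magic number.
    static boolean hasValidHeader(Path logFile) throws IOException {
        if (!Files.isRegularFile(logFile) || Files.size(logFile) < HEADER_BYTES) {
            return false; // empty or truncated file: nothing usable to replay
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(logFile))) {
            return in.readInt() == TXNLOG_MAGIC;
        } catch (EOFException e) {
            return false; // file shrank between the size check and the read
        }
    }
}
```

A caller restoring the data tree could log a warning and skip any file for which hasValidHeader returns false. Whether skipping is safe for a file that is not the most recent log is a separate question that this sketch does not try to answer.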

The defect is reproducible on the heads of master, branch-3.5, and branch-3.4.

> Zookeeper failed to start for empty txn log
> -------------------------------------------
>
>                 Key: ZOOKEEPER-2332
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2332
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.6
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Critical
>             Fix For: 3.6.0
>
>         Attachments: ZOOKEEPER-2332-v001.diff
>
>
> We found that a ZooKeeper server running version 3.4.6 failed to start because there was an
> empty txn log in the log dir.
> I think we should skip the empty log file while restoring the datatree.
> Any suggestions?
> {code}
> 2015-11-27 19:16:16,887 [myid:] - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:576)
> at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:595)
> at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:561)
> at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:643)
> at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
> at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:272)
> at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:399)
> at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
> at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:113)
> at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
