zookeeper-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
Date Fri, 29 Dec 2017 19:12:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306470#comment-16306470 ]

ASF GitHub Bot commented on ZOOKEEPER-1621:
-------------------------------------------

GitHub user abhishekrai opened a pull request:

    https://github.com/apache/zookeeper/pull/439

    ZOOKEEPER-1621: Delete and skip txn log with incomplete header

    Based on the patch by Michi Mutsuzaki.
    
    When the ZooKeeper server encounters a txn log with an incomplete header,
    the old behavior was to crash due to the resulting EOFException.
    The new behavior is to catch the exception and skip the txn log.
    
    Additionally, the txn log is deleted so that it does not mislead
    future loads or PurgeTxnLog into believing that it is the only
    txn log before the following snapshot that they need to
    load/retain.
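
The skip-and-delete behavior described above can be sketched roughly as follows. This is a standalone illustration, not the code from the pull request: the class `SkipTruncatedLogs`, the methods `readHeader` and `usableLogs`, and the 16-byte header layout (two ints and a long) are assumptions made for the example.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipTruncatedLogs {
    // Assumed header layout for this sketch: int, int, long (16 bytes).
    static void readHeader(DataInputStream in) throws IOException {
        in.readInt();   // magic number (assumed field)
        in.readInt();   // format version (assumed field)
        in.readLong();  // database id (assumed field)
    }

    // Returns the logs whose header could be read in full.
    // Logs with a truncated header are deleted and left out of the result.
    static List<Path> usableLogs(List<Path> logs) throws IOException {
        List<Path> usable = new ArrayList<>();
        for (Path log : logs) {
            boolean complete;
            try (DataInputStream in = new DataInputStream(Files.newInputStream(log))) {
                readHeader(in);
                complete = true;
            } catch (EOFException e) {
                // Header ended early: treat the file as unusable instead of crashing.
                complete = false;
            }
            if (complete) {
                usable.add(log);
            } else {
                // The stream is closed by now, so the file can be removed safely.
                Files.delete(log);
            }
        }
        return usable;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("txnlogs");
        Path good = Files.write(dir.resolve("log.1"), new byte[16]); // full header
        Path bad = Files.write(dir.resolve("log.2"), new byte[3]);   // truncated header
        List<Path> usable = usableLogs(Arrays.asList(good, bad));
        System.out.println("usable logs: " + usable.size());
        System.out.println("truncated log still on disk: " + Files.exists(bad));
    }
}
```

Deleting the file rather than only skipping it mirrors the reasoning in the description: a leftover empty log could otherwise be mistaken for the one log that precedes the next snapshot.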

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/abhishekrai/zookeeper ZOOKEEPER-1621

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #439
    
----
commit 6b457a069ccdb01e1ee77537b02db80f3005f5b1
Author: Abhishek Rai <abhishekrai@...>
Date:   2017-12-29T17:38:52Z

    ZOOKEEPER-1621: Delete and skip txn log with incomplete header
    
    Based on the patch by Michi Mutsuzaki.
    
    When Zookeeper server encounters a txn log with incomplete header,
    the old behavior was to crash due to the resulting EOFException.
    The new behavior is catch the exception and skip the txn log.
    
    Additionally, the txn log is deleted to ensure that it does not
    influence future loads/PurgeTxnLog in believing that this is
    the only txn log before the following snapshot that they need to
    load/retain.

----


> ZooKeeper does not recover from crash when disk was full
> --------------------------------------------------------
>
>                 Key: ZOOKEEPER-1621
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>         Environment: Ubuntu 12.04, Amazon EC2 instance
>            Reporter: David Arthur
>            Assignee: Michi Mutsuzaki
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception:
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:282)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
>         at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
>         at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
>         at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
>         at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
>         at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>         at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
>         at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case?
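
The EOFException in the second trace above is raised by `DataInputStream.readInt`, which needs four bytes and throws when the stream ends sooner. The following minimal sketch reproduces that failure mode in memory; the class `PartialHeaderDemo` and the method `headerReadable` are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PartialHeaderDemo {
    // Returns true if the first int of the (assumed) header can be read,
    // false if the data ends before four bytes are available.
    static boolean headerReadable(byte[] fileBytes) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes));
        try {
            in.readInt();
            return true;
        } catch (EOFException e) {
            // Same exception type as in the startup trace above.
            return false;
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A file cut off after 3 bytes, as could happen when the disk fills up.
        System.out.println("3 bytes readable:  " + headerReadable(new byte[3]));
        // A file with at least one full int.
        System.out.println("16 bytes readable: " + headerReadable(new byte[16]));
    }
}
```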



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)