zookeeper-dev mailing list archives

From "Brian Nixon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3082) Fix server snapshot behavior when out of disk space
Date Wed, 01 Aug 2018 19:28:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565860#comment-16565860 ]

Brian Nixon commented on ZOOKEEPER-3082:
----------------------------------------

[~andorm] my (possibly incorrect) read on ZOOKEEPER-1621 is that the issue is related to this
one but not strictly a subset. Here we've removed the possibility of the snapshot side of
recovery being lost during a disk-full event. There, the issue seems to be in ensuring the
transaction log side of recovery is not corrupted by writing empty/incomplete log files. That
problem will remain even with the patch from this issue applied.

> Fix server snapshot behavior when out of disk space
> ---------------------------------------------------
>
>                 Key: ZOOKEEPER-3082
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3082
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.0, 3.4.12, 3.5.5
>            Reporter: Brian Nixon
>            Assignee: Brian Nixon
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.6.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When the ZK server tries to make a snapshot and the machine is out of disk space, the
> snapshot creation fails and throws an IOException. An empty snapshot file is created (probably
> because the server is able to create an entry in the directory but is not able to write to
> the file).
>  
> If snapshot creation fails, the server commits suicide. When it restarts, it will do
> so from the last known good snapshot. However, when it tries to make a snapshot again, the
> same thing happens. This results in lots of empty snapshot files being created. If eventually
> the DataDirCleanupManager garbage collects the good snapshot files then only the empty files
> remain. At this point, the server is well and truly screwed.
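
One way to picture the failure mode described above is the following minimal Java sketch. It is not the actual ZooKeeper patch: the class SnapshotWriteSketch, the SnapshotSerializer interface, and the saveSnapshot method are hypothetical names made up for illustration. The assumption is that an out-of-disk write surfaces as an IOException after the file entry has already been created, and that removing a zero-length file before rethrowing keeps empty snapshots from piling up across restarts.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch, not ZooKeeper code: all names here are illustrative.
final class SnapshotWriteSketch {

    /** Stand-in for whatever serializes the server state into the snapshot. */
    interface SnapshotSerializer {
        void writeTo(OutputStream out) throws IOException;
    }

    /**
     * Writes a snapshot to snapshotFile. If the write fails and only an
     * empty file was left behind, the empty file is deleted before the
     * exception is rethrown to the caller.
     */
    static void saveSnapshot(File snapshotFile, SnapshotSerializer serializer)
            throws IOException {
        try {
            // Opening the stream creates the directory entry; on a full
            // disk this step can succeed while the write itself fails.
            try (OutputStream out = new FileOutputStream(snapshotFile)) {
                serializer.writeTo(out);
            }
        } catch (IOException e) {
            // File.length() is 0 for an empty (or missing) file, and
            // File.delete() returns false if there was nothing to delete.
            if (snapshotFile.length() == 0 && snapshotFile.delete()) {
                System.err.println("Deleted empty snapshot file " + snapshotFile);
            }
            throw e; // the caller still sees the original failure
        }
    }
}
```

The sketch only removes files that are completely empty. A partially written snapshot is left in place, so telling a truncated snapshot from a complete one would need a separate validity check.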



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
