zookeeper-dev mailing list archives

From "maoling (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3231) Purge task may lost data when we have many invalid snapshots.
Date Tue, 08 Jan 2019 12:22:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737071#comment-16737071 ]

maoling commented on ZOOKEEPER-3231:


Good issue, yep!
The data-loss situation can only happen when all of the retained snapshots (the retain count)
are invalid (very unlucky, low probability) and, at that time, the zk server took any new snapshots.
The specific source code for the *restore* can be found in:

>  Purge task may lost data when we have many invalid snapshots.
> --------------------------------------------------------------
>                 Key: ZOOKEEPER-3231
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3231
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.4, 3.4.13
>            Reporter: Jiafu Jiang
>            Priority: Major
> I read the ZooKeeper source code and found that the purge task uses FileTxnSnapLog#findNRecentSnapshots
> to find snapshots, but the method does not check whether the snapshots are valid.
> Consider a worst case: a ZooKeeper server may have many invalid snapshots, and when a
> purge task begins, it will use the zxid in the last snapshot's name to purge old snapshots
> and transaction logs, so we may lose data.
> I think we should use FileSnap#findNValidSnapshots(int) instead of FileSnap#findNRecentSnapshots
> in FileTxnSnapLog#findNRecentSnapshots, but I am not sure.
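
To make the scenario in the report concrete, here is a minimal, self-contained sketch. It is not the ZooKeeper implementation: the class PurgeSketch, the Snap type, and the methods pick, purgeBelow, and validSnapshotSurvives are hypothetical names invented for illustration, and the purge rule is simplified to "everything with a zxid below the oldest retained snapshot is deleted". Under those assumptions it compares choosing the retained set by recency alone with choosing it by validity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PurgeSketch {
    // Hypothetical stand-in for a snapshot file: the zxid taken from its name,
    // plus a flag saying whether the file content is valid.
    static final class Snap {
        final long zxid;
        final boolean valid;
        Snap(long zxid, boolean valid) { this.zxid = zxid; this.valid = valid; }
    }

    // Returns up to n snapshots, newest first. With validOnly set, invalid
    // snapshots are skipped instead of being counted toward n.
    static List<Snap> pick(List<Snap> all, int n, boolean validOnly) {
        List<Snap> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingLong((Snap s) -> s.zxid).reversed());
        List<Snap> out = new ArrayList<>();
        for (Snap s : sorted) {
            if (out.size() == n) break;
            if (!validOnly || s.valid) out.add(s);
        }
        return out;
    }

    // Simplified purge rule (an assumption of this sketch): anything with a
    // zxid below the oldest retained snapshot is deleted. Returns 0 when
    // nothing is retained, so nothing is purged.
    static long purgeBelow(List<Snap> retained) {
        return retained.isEmpty() ? 0L : retained.get(retained.size() - 1).zxid;
    }

    // True if at least one valid snapshot is still on disk after the purge.
    static boolean validSnapshotSurvives(List<Snap> all, long purgeBelow) {
        for (Snap s : all) {
            if (s.valid && s.zxid >= purgeBelow) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // One old valid snapshot followed by three newer invalid ones.
        List<Snap> all = List.of(
                new Snap(100, true), new Snap(200, false),
                new Snap(300, false), new Snap(400, false));
        int retain = 3;

        long byRecency = purgeBelow(pick(all, retain, false));
        long byValidity = purgeBelow(pick(all, retain, true));

        System.out.println("recency:  purge below " + byRecency
                + ", valid snapshot survives = " + validSnapshotSurvives(all, byRecency));
        System.out.println("validity: purge below " + byValidity
                + ", valid snapshot survives = " + validSnapshotSurvives(all, byValidity));
    }
}
```

In this toy setup the recency-based choice retains only the three invalid snapshots, so the single valid one falls below the purge threshold. Counting only valid snapshots toward the retain count keeps it. Whether the real fix should take that shape is the question the reporter raises.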

This message was sent by Atlassian JIRA
