zookeeper-dev mailing list archives

From "maoling (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.
Date Wed, 26 Dec 2018 16:34:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729095#comment-16729095 ]

maoling commented on ZOOKEEPER-3220:
------------------------------------

[~jiangjiafu]

--->"*In my environment, the save method returned successfully, which means no exception had been thrown. But the data was not on disk! That's the problem I want to report!*"

1. Why did this happen? Was the disk full?
 The snapshot not calling *fsync* may be the answer.
 Do you see any logs about *FileTxnSnapLog#save* from that time?
2. Even in this situation, a snapshot of size 0 would not cause data inconsistency,
 because when the ZooKeeper server restarts, invalid snapshots are skipped; if no valid snapshot is left,
 the leader can do a *SNAP* to sync with the follower.
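Point 1 suggests the snapshot write may be missing an *fsync*. Below is a minimal sketch of the write-then-sync pattern in plain Java. `DurableWrite` and `writeAndSync` are hypothetical names used only for illustration; this is not ZooKeeper's actual snapshot code. `FileDescriptor.sync()` is the standard JDK call. It returns only after the file's modified data has been written to the device.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not ZooKeeper's FileTxnSnapLog/FileSnap code.
public final class DurableWrite {
    private DurableWrite() {}

    /** Writes data to file and forces it to the storage device before closing. */
    public static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            out.write(data);
            // FileOutputStream is unbuffered, so the bytes are already handed to the OS;
            // sync() blocks until the OS reports them written to the device (fsync on POSIX).
            out.getFD().sync();
        } // close() runs only after sync() has returned; close() alone gives no such guarantee
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("snapshot", ".tmp");
        writeAndSync(p, new byte[] {1, 2, 3});
        System.out.println(Files.size(p)); // prints 3
    }
}
```

If the snapshot stream is wrapped in a `BufferedOutputStream` (or similar), it has to be flushed before `sync()` is called. Otherwise bytes still sitting in the Java-side buffer are not covered by the sync.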

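Point 2 relies on restart recovery skipping unusable snapshots. Below is a simplified sketch of that selection step. It assumes ZooKeeper's `snapshot.<hex zxid>` file naming. The class name, method names, and the length-only validity test are illustrative assumptions; ZooKeeper's own validity check inspects more than the file size.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

// Hypothetical sketch; not ZooKeeper's FileSnap.findNValidSnapshots.
public final class SnapshotPicker {
    private SnapshotPicker() {}

    /** Parses the zxid from a name like "snapshot.1a2b"; returns -1 if the name does not match. */
    static long zxidOf(File f) {
        String name = f.getName();
        if (!name.startsWith("snapshot.")) return -1;
        try {
            return Long.parseLong(name.substring("snapshot.".length()), 16);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Returns the newest non-empty snapshot in dir, or an empty Optional if none is usable. */
    public static Optional<File> newestValidSnapshot(File dir) {
        File[] files = dir.listFiles();
        if (files == null) return Optional.empty();
        return Arrays.stream(files)
                .filter(f -> zxidOf(f) >= 0)
                .filter(f -> f.length() > 0) // a 0-byte snapshot, as in this report, is skipped
                .max(Comparator.comparingLong(SnapshotPicker::zxidOf));
    }
}
```

An empty `Optional` means the server has no usable local snapshot. In that case, per point 2 above, the leader would be expected to do a full *SNAP* sync.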
> The snapshot is not saved to disk and may cause data inconsistency.
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3220
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.12, 3.4.13
>            Reporter: Jiafu Jiang
>            Priority: Critical
>
> We know that the ZooKeeper server calls fsync to make sure that log data has been successfully saved to disk. But the ZooKeeper server does not call fsync to make sure that a snapshot has been successfully saved, which may cause problems, because closing a file descriptor does not guarantee that the data has been written to disk; see [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.
>  
> If the snapshot is not successfully saved to disk, it may lead to data inconsistency. Here is my example, which is a real problem I have encountered.
> 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the leader.
> 2. Both zk1 and zk2 had log records log1 ~ logX, where X was the zxid.
> 3. The machine hosting zk1 restarted, and during the reboot, log(X+1) ~ log Y were saved to the log files of both zk2 (leader) and zk3 (follower).
> 4. After zk1 restarted successfully, it found itself to be a follower and began to synchronize data with the leader. The leader sent a snapshot (records log 1 ~ log Y) to zk1, and zk1 saved the snapshot to local disk by calling the method ZooKeeperServer.takeSnapshot. Unfortunately, when the method returned, the snapshot data had not yet been saved to disk. In fact, the snapshot file was created, but its size was 0.
> 5. zk1 finished the synchronization and began to accept new requests from the leader. Say log records log(Y + 1) ~ log Z were accepted by zk1 and saved to its log file. Because of fsync, zk1 could make sure this log data was not lost.
> 6. zk1 restarted again. Since the snapshot's size was 0, it was not used, so zk1 recovered from its log files. But the records log(X+1) ~ log Y were lost!
>  
> Sorry for my poor English.
>  
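The arithmetic in steps 1-6 of the report can be checked with a small simulation. The simulation makes two simplifying assumptions:

- zxids are modelled as plain consecutive integers. Real zxids also encode the epoch in their high 32 bits.
- zk1 rebuilds its state only from its own log files, because the snapshot was 0 bytes.

`RecoveryGap` and the sample values X=5, Y=8, Z=10 are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model of the reported scenario; not ZooKeeper code.
public final class RecoveryGap {
    private RecoveryGap() {}

    /** Returns the zxids in 1..z that zk1 cannot rebuild after its second restart. */
    public static List<Long> missingAfterRestart(long x, long y, long z) {
        Set<Long> onDisk = new HashSet<>();
        // Step 2: log1..logX were already in zk1's own log files.
        for (long i = 1; i <= x; i++) onDisk.add(i);
        // Step 4: X+1..Y reached zk1 only inside the snapshot, which ended up 0 bytes, so nothing is added.
        // Step 5: Y+1..Z were written to zk1's log file and fsynced.
        for (long i = y + 1; i <= z; i++) onDisk.add(i);

        List<Long> missing = new ArrayList<>();
        for (long i = 1; i <= z; i++) {
            if (!onDisk.contains(i)) missing.add(i);
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(missingAfterRestart(5, 8, 10)); // prints [6, 7, 8]
    }
}
```

The gap is exactly X+1..Y, the range that existed on zk1 only in the empty snapshot.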



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
