zookeeper-dev mailing list archives

From "Fangmin Lv (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election
Date Fri, 14 Sep 2018 00:19:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614183#comment-16614183
] 

Fangmin Lv edited comment on ZOOKEEPER-2845 at 9/14/18 12:18 AM:
-----------------------------------------------------------------

[~revans2] sorry for getting back to this so late. I was on parental leave and totally missed this
thread (my girl was born on Jan 25, so I was busy dealing with the new challenges there :) )

I'm revisiting my open PRs today and came across this one.

Checked your fix, looks nice and simple!

There was one thing I thought might be a problem, but it is no longer a problem
with the ZOOKEEPER-2678 change you made last time. My concern was that [ZooKeeperServer.processTxn|https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1213]
doesn't add the txn to the commit log in ZKDatabase, which would leave a hole in the commit log if we
applied txns directly to the DataTree during DIFF sync, which in turn could cause data inconsistency
if that server became leader. But we're not doing this anymore with ZOOKEEPER-2678, so it's fine.
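
To make the "hole in the commit log" concern concrete, here is a minimal toy sketch. It is not ZooKeeper code: ToyServer, applyToTreeOnly, applyAndLog and diffFor are hypothetical names, and the committed log is modelled as a plain list of zxids. It only illustrates why a txn that reaches the data tree without being recorded in the in-memory commit log would be skipped when that server later serves a DIFF as leader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class CommitLogHoleSketch {
    /** Toy stand-in for a server's in-memory state; not ZooKeeper's real classes. */
    static class ToyServer {
        final TreeMap<Long, String> dataTree = new TreeMap<>(); // zxid -> applied change
        final List<Long> committedLog = new ArrayList<>();      // zxids available for DIFF sync

        // Applies the txn to the tree only (the path the comment worried about).
        void applyToTreeOnly(long zxid, String change) {
            dataTree.put(zxid, change);
        }

        // Applies the txn and also records it in the committed log.
        void applyAndLog(long zxid, String change) {
            dataTree.put(zxid, change);
            committedLog.add(zxid);
        }

        // Zxids this server would send in a DIFF to a peer whose last zxid is peerLastZxid.
        List<Long> diffFor(long peerLastZxid) {
            List<Long> out = new ArrayList<>();
            for (long zxid : committedLog) {
                if (zxid > peerLastZxid) {
                    out.add(zxid);
                }
            }
            return out;
        }
    }

    static List<Long> demo() {
        ToyServer s = new ToyServer();
        s.applyAndLog(1L, "create /a");
        s.applyToTreeOnly(2L, "create /b"); // the hole: in the tree, missing from the log
        s.applyAndLog(3L, "create /c");
        return s.diffFor(1L);               // zxid 2 is never sent to the follower
    }

    public static void main(String[] args) {
        System.out.println("DIFF sent to follower at zxid 1: " + demo());
    }
}
```

In this toy model the follower ends up with zxids 1 and 3 but never sees zxid 2, which is the kind of divergence described above.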

Our internal patch is a bit heavier and more complex, so we may switch to this simpler
solution as well. Thanks again for moving this forward!



> Data inconsistency issue due to retain database in leader election
> ------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2845
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.10, 3.5.3, 3.6.0
>            Reporter: Fangmin Lv
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 3.5.4, 3.6.0, 3.4.12
>
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time during leader
> election. In a ZooKeeper ensemble, it's possible that the snapshot is ahead of the txn file (due
> to a slow disk on the server, etc.), or that the txn file is ahead of the snapshot because no commit
> message has been received yet.
> If the snapshot is ahead of the txn file, the SyncRequestProcessor queue will be drained
> during shutdown, so the snapshot and txn file will be consistent before leader election happens,
> and this is not an issue.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a data inconsistency
> issue. Here is a simplified scenario that shows the issue:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, C is the
> leader, and all snapshots and txn files are up to T0:
> 1. A new request to create Node N reaches leader C, and it's converted to txn T1
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the followers,
> A and B restart, so T1 doesn't exist on A and B
> 3. A and B form a new quorum after the restart; let's say B is the leader
> 4. C changes to the LOOKING state because it doesn't have enough followers; it will sync with
> leader B using last zxid T0, which results in an empty DIFF sync
> 5. C restarts before taking a snapshot and replays the txns on disk, which include T1;
> it now has Node N, but A and B don't.
> I have also included a test case to reproduce this issue consistently.
> We have a totally different RetainDB version that avoids this issue by reaching consensus
> between the snapshot and txn files before leader election; we will submit it for review.
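
The five steps of the scenario can be replayed with a small toy model. This is only a sketch under simplifying assumptions: ToyServer and its methods are hypothetical, zxids are plain longs (0 for T0, 1 for T1), and epochs, snapshots and the real election and sync protocol are left out. It shows only how a retained in-memory database plus an unacknowledged txn on disk lets C diverge after a restart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class RetainDbScenarioSketch {
    /** Toy server: a durable txn log plus an in-memory DB of applied zxids. */
    static class ToyServer {
        final List<Long> txnLogOnDisk = new ArrayList<>();
        TreeSet<Long> memoryDb = new TreeSet<>();

        // Txn is written to disk but not yet committed (no quorum ack yet).
        void logToDisk(long zxid) {
            txnLogOnDisk.add(zxid);
        }

        // Txn is both durable and applied to the in-memory DB.
        void logAndCommit(long zxid) {
            txnLogOnDisk.add(zxid);
            memoryDb.add(zxid);
        }

        // A restart rebuilds memory by replaying everything found on disk.
        void restart() {
            memoryDb = new TreeSet<>(txnLogOnDisk);
        }

        // With a retained DB, the last zxid is taken from memory, not from disk.
        long lastZxidInMemory() {
            return memoryDb.last();
        }

        // DIFF sync: apply whatever the leader has beyond our in-memory last zxid.
        List<Long> syncFrom(ToyServer leader) {
            List<Long> diff = new ArrayList<>(leader.memoryDb.tailSet(lastZxidInMemory(), false));
            for (long zxid : diff) {
                logAndCommit(zxid);
            }
            return diff;
        }
    }

    static String run() {
        ToyServer a = new ToyServer();
        ToyServer b = new ToyServer();
        ToyServer c = new ToyServer();
        a.logAndCommit(0L);
        b.logAndCommit(0L);
        c.logAndCommit(0L);              // everyone is at T0, C is leader

        c.logToDisk(1L);                 // steps 1-2: T1 is on C's disk only
        a.restart();
        b.restart();                     // step 2: A and B restart without T1

        a.syncFrom(b);                   // step 3: B leads the new quorum
        List<Long> diff = c.syncFrom(b); // step 4: C keeps its DB (last zxid T0), empty DIFF

        c.restart();                     // step 5: replay from disk brings back T1

        return "diff=" + diff
                + " A=" + a.memoryDb.contains(1L)
                + " B=" + b.memoryDb.contains(1L)
                + " C=" + c.memoryDb.contains(1L);
    }

    public static void main(String[] args) {
        System.out.println(run()); // diff=[] A=false B=false C=true
    }
}
```

The key point in this sketch is that the empty DIFF in step 4 never removes T1 from C's disk, so the divergence only becomes visible after C's next restart.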



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
