zookeeper-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2846) Leader follower sync with on disk txns can possibly leads to data inconsistency
Date Tue, 18 Jul 2017 19:38:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092059#comment-16092059 ]

ASF GitHub Bot commented on ZOOKEEPER-2846:
-------------------------------------------

GitHub user lvfangmin opened a pull request:

    https://github.com/apache/zookeeper/pull/314

    [ZOOKEEPER-2846][Test] Leader follower sync with on disk txns can possibly leads to data inconsistency

    This is only the test case used to reproduce the issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lvfangmin/zookeeper ZOOKEEPER-2846-TEST

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/314.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #314
    
----
commit c2a1ec8f989b9f799f5880a92730b75ef86164b9
Author: Fangmin Lyu <allenlyu@fb.com>
Date:   2017-07-18T19:21:02Z

    add test case to check data inconsistency issue when using on-disk txn sync

----


> Leader follower sync with on disk txns can possibly leads to data inconsistency
> -------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2846
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2846
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.10, 3.5.3, 3.6.0
>            Reporter: Fangmin Lv
>            Priority: Critical
>
> On-disk txn sync can cause data inconsistency if the current leader did a snap sync
> shortly before it became leader: a later diff sync with its followers may then be served
> from on-disk txns that contain a gap. Here is the scenario:
> Let's say S0 - S3 are followers, and S4 is the leader at the beginning:
> 1. Stop S2 and send one more request.
> 2. Stop S3 and send more requests to the quorum, so that S3 does a snap sync with S4
> when it starts up.
> 3. Stop S4; S3 becomes the new leader.
> 4. Start S2, which does a diff sync with S3; there are now gaps in S2.
> The attached test case verifies the issue. Currently, there is no efficient way to check
> whether a gap in the txn files is a real gap or is due to an epoch change. We need to add
> that support, but until then it would be safer to disable the on-disk txn
> leader-follower sync.
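
The scenario above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not ZooKeeper code and not the test case from the pull request: the `Server` class, its methods, and the plain integer zxids are all hypothetical, S0 and S1 are omitted because they stay up throughout, and the snap sync is assumed to copy the leader's state without backfilling the follower's on-disk txn log.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy replica: 'applied' stands in for the data tree, 'log' for the on-disk txn log."""
    name: str
    applied: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def commit(self, zxid):
        # Normal broadcast path: the txn is both logged and applied.
        self.log.append(zxid)
        self.applied.add(zxid)

    def snap_sync_from(self, leader):
        # Assumption: a snap sync copies state but does not backfill the txn log.
        self.applied = set(leader.applied)

    def diff_sync_from_disk(self, leader):
        # Naive on-disk diff sync: replay every logged leader txn newer than ours.
        last = max(self.applied, default=0)
        for zxid in leader.log:
            if zxid > last:
                self.commit(zxid)


def reproduce():
    s2, s3, s4 = Server("S2"), Server("S3"), Server("S4")
    for zxid in (1, 2, 3):          # all servers start in sync, S4 is leader
        for server in (s2, s3, s4):
            server.commit(zxid)
    for server in (s3, s4):         # step 1: S2 is stopped, one more request
        server.commit(4)
    for zxid in (5, 6, 7):          # step 2: S3 is stopped, more requests
        s4.commit(zxid)
    s3.snap_sync_from(s4)           # S3 restarts and snap-syncs with S4
    # step 3: S4 is stopped and S3 becomes leader; its log lacks txns 5..7.
    s2.diff_sync_from_disk(s3)      # step 4: S2 restarts and diff-syncs with S3
    missing = sorted(s3.applied - s2.applied)
    return s3.log, missing


if __name__ == "__main__":
    leader_log, missing = reproduce()
    print("new leader's on-disk log:", leader_log)
    print("txns S2 never receives:  ", missing)
```

In this model the new leader's state covers txns 1 to 7 but its on-disk log stops at 4, so a diff sync served purely from that log cannot deliver 5 to 7 to S2.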



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
