zookeeper-dev mailing list archives

From "Fangmin Lv (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3104) Potential data inconsistency due to NEWLEADER packet being sent too early during SNAP sync
Date Sat, 11 Aug 2018 19:39:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16577297#comment-16577297 ]

Fangmin Lv commented on ZOOKEEPER-3104:

[~andorm] I think [~breed] is planning to port this to the other branches as well. Ben, it should
be trivial to port this from 3.6 to 3.5. Let me know if porting it to 3.4 needs more effort; if so,
I can send out a separate patch for 3.4 (mostly for testing, I think).

> Potential data inconsistency due to NEWLEADER packet being sent too early during SNAP sync
> ------------------------------------------------------------------------------------------
>                 Key: ZOOKEEPER-3104
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3104
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.4, 3.6.0, 3.4.13
>            Reporter: Fangmin Lv
>            Assignee: Fangmin Lv
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.6.0
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
> Currently, in SNAP sync, the leader starts queuing the proposals/commits and the NEWLEADER
> packet before sending the snapshot over the wire. So it's possible that the zxid associated
> with the snapshot is higher than the zxids of all the packets queued before NEWLEADER.
> When the follower receives the snapshot, it applies all the txns queued before NEWLEADER,
> which may not cover all the txns up to the zxid of the snapshot. After that, it writes the
> snapshot out to disk with the zxid associated with the snapshot. If the server crashes after
> writing it out, it loads the data from disk on restart and uses the zxid of the snapshot
> file to sync with the leader. This can cause data inconsistency, because only part of the
> historical data was replayed during the previous sync.
> The NEWLEADER packet means the learner now has the correct and almost up-to-date state of
> the leader, so it makes more sense to send the NEWLEADER packet after sending the snapshot,
> and this is what we did in the fix.
> Besides this, the leader switches the socket timeout to the smaller sync timeout after it
> receives the NEWLEADER ACK. In ensembles with high write traffic and a large snapshot, the
> leader might time out the follower before all the queued txns have been sent over after the
> snapshot is written out, which could leave the follower stuck in the syncing state forever.
> Moving the NEWLEADER packet after the snapshot avoids this issue as well.
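The crash scenario in the description can be illustrated with a deliberately simplified toy model. This is not ZooKeeper code: the class, the method and its parameters are hypothetical, zxids are treated as consecutive longs, and a server's state is just the set of zxids it has applied. The model assumes the snapshot data is only guaranteed to contain txns up to snapDataZxid while the file is labeled with the higher snapZxid, which is how the issue's "zxid of the snapshot may be higher than the packets queued before NEWLEADER" is read here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy model of the SNAP-sync ordering bug; not ZooKeeper code. */
final class SnapSyncModel {

    /**
     * Simulates a SNAP sync, a crash right after the follower persists its
     * snapshot at NEWLEADER, and a restart that resumes from the snapshot zxid.
     *
     * @param snapDataZxid    highest zxid assumed to be fully contained in the snapshot data
     * @param snapZxid        zxid the snapshot is labeled with (assumed >= snapDataZxid)
     * @param queuedUpToZxid  last zxid queued ahead of the NEWLEADER packet
     * @param lastZxid        last zxid committed by the leader overall
     * @return zxids the restarted follower never applied
     */
    static List<Long> missingAfterRestart(long snapDataZxid, long snapZxid,
                                          long queuedUpToZxid, long lastZxid) {
        Set<Long> applied = new TreeSet<>();
        // 1. Load the snapshot data received from the leader.
        for (long z = 1; z <= snapDataZxid; z++) applied.add(z);
        // 2. Apply the proposals/commits queued before NEWLEADER.
        for (long z = snapDataZxid + 1; z <= queuedUpToZxid; z++) applied.add(z);
        // 3. On NEWLEADER, persist the current state under the snapshot's zxid,
        //    then crash. The persisted state is exactly the 'applied' set.
        long persistedZxid = snapZxid;
        // 4. Restart: the follower reports persistedZxid, so the leader only
        //    sends txns with higher zxids.
        for (long z = persistedZxid + 1; z <= lastZxid; z++) applied.add(z);

        List<Long> missing = new ArrayList<>();
        for (long z = 1; z <= lastZxid; z++) {
            if (!applied.contains(z)) missing.add(z);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed numbers: snapshot data up to 100, labeled 105, leader ends at 110.
        // Old ordering: NEWLEADER was queued when only txns up to 102 were queued.
        System.out.println("old: " + missingAfterRestart(100, 105, 102, 110));
        // New ordering: NEWLEADER is queued after the snapshot is sent, so the
        // txns queued ahead of it reach at least the snapshot's zxid.
        System.out.println("new: " + missingAfterRestart(100, 105, 105, 110));
    }
}
```

Running main prints `old: [103, 104, 105]` and `new: []`: under the old ordering the txns between the last one queued before NEWLEADER and the snapshot's zxid are never replayed after the restart, while moving NEWLEADER after the snapshot closes that gap in this model.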
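The timeout argument in the last paragraph is essentially arithmetic, so here is a crude back-of-the-envelope check. It assumes the pre-ACK socket timeout is initLimit * tickTime and the smaller post-ACK one is syncLimit * tickTime (initLimit, syncLimit and tickTime are ZooKeeper configuration settings), and it treats draining the queued txns as a single silent period that must finish inside one timeout. The class name, the apply rate and the backlog size are all hypothetical values chosen for illustration.

```java
/**
 * Hypothetical back-of-the-envelope check, not ZooKeeper code: does draining
 * a backlog of queued txns finish before a socket timeout of
 * limitTicks * tickTime fires? Treats the drain as one silent period.
 */
final class SyncTimeoutCheck {

    /** Milliseconds needed to drain a backlog at a given apply rate. */
    static double drainMillis(long backlogTxns, double txnsPerMilli) {
        return backlogTxns / txnsPerMilli;
    }

    /** True if draining the backlog finishes before the timeout fires. */
    static boolean fitsBeforeTimeout(long backlogTxns, double txnsPerMilli,
                                     int tickTimeMs, int limitTicks) {
        long timeoutMs = (long) tickTimeMs * limitTicks;
        return drainMillis(backlogTxns, txnsPerMilli) <= timeoutMs;
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000, initLimit = 10, syncLimit = 5; // assumed config values
        double txnsPerMilli = 10.0;   // assumed apply rate: 10,000 txns/s
        long backlog = 150_000;       // assumed: txns committed while a large snapshot streamed

        // Old ordering: this backlog is drained after the NEWLEADER ACK,
        // i.e. under the smaller syncLimit-based timeout.
        System.out.println("old (syncLimit): "
                + fitsBeforeTimeout(backlog, txnsPerMilli, tickTimeMs, syncLimit));
        // New ordering: the same backlog is queued before NEWLEADER, so it is
        // drained while the larger initLimit-based timeout still applies.
        System.out.println("new (initLimit): "
                + fitsBeforeTimeout(backlog, txnsPerMilli, tickTimeMs, initLimit));
    }
}
```

With these assumed numbers the drain takes 15,000 ms, so main prints `old (syncLimit): false` (10,000 ms timeout) and `new (initLimit): true` (20,000 ms timeout). A real read timeout fires on a gap between packets rather than on total elapsed time, so this only illustrates the direction of the effect.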

This message was sent by Atlassian JIRA
