zookeeper-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3144) Potential ephemeral nodes inconsistent due to global session inconsistent with fuzzy snapshot
Date Fri, 14 Sep 2018 22:56:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615483#comment-16615483
] 

Hudson commented on ZOOKEEPER-3144:
-----------------------------------

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #191 (See [https://builds.apache.org/job/ZooKeeper-trunk/191/])
ZOOKEEPER-3144: Fix potential ephemeral nodes inconsistent due to global (hanm: rev b58791016424e662c816e2253de96f3771f5d301)
* (edit) src/java/test/org/apache/zookeeper/server/quorum/FuzzySnapshotRelatedTest.java
* (edit) src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java


> Potential ephemeral nodes inconsistent due to global session inconsistent with fuzzy snapshot
> ---------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3144
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3144
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.4, 3.6.0, 3.4.13
>            Reporter: Fangmin Lv
>            Assignee: Fangmin Lv
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.6.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Found this issue recently while investigating another production issue. The problem is
> that the current code updates lastProcessedZxid before it actually applies the change for
> the global sessions in the DataTree.
>  
> If a snapshot is being taken and there is a brief stall between setting lastProcessedZxid
> and updating the session in the DataTree (due to a thread context switch, GC, etc.), then
> the lastProcessedZxid recorded in the snapshot can be ahead of the snapshot's contents: it
> points to a zxid whose global session change (add or remove) is not included.
>  
> When this snapshot and its txns are reloaded, txns are replayed starting from
> lastProcessedZxid + 1, so the global session is never created, which can leave the data
> inconsistent.
>  
> When global sessions are inconsistent, ephemeral nodes may become inconsistent as well,
> since the leader deletes all ephemerals locally if no global session is associated with
> them. If another server then does a snapshot sync with the leader, that server will not
> have those ephemerals either, but the other servers will. The problematic session will
> also have global session renewal issues.
>  
> The same issue exists for the closeSession txn. We need to move the global session
> update logic before processTxn, so that lastProcessedZxid never gets ahead of the global
> session change.
>  
>  
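
The ordering problem described above can be sketched with a small, deterministic toy model. This is not ZooKeeper's code: the class FuzzySnapshotSketch, its State and Snapshot types, and the method sessionSurvivesRestore are all hypothetical names made up for illustration, and the "snapshot thread" is simulated by taking the snapshot at a fixed point between the two steps rather than with real concurrency. It only shows why applying the session change before advancing lastProcessedZxid lets a replay from lastProcessedZxid + 1 recover the session.

```java
import java.util.HashSet;
import java.util.Set;

public class FuzzySnapshotSketch {
    /** Toy stand-in for server state: global sessions plus the last applied zxid. */
    static final class State {
        final Set<Long> globalSessions = new HashSet<>();
        long lastProcessedZxid = 0;
    }

    /** What a snapshot captures: a copy of the sessions and the zxid seen at that moment. */
    static final class Snapshot {
        final Set<Long> globalSessions;
        final long lastProcessedZxid;

        Snapshot(State s) {
            this.globalSessions = new HashSet<>(s.globalSessions);
            this.lastProcessedZxid = s.lastProcessedZxid;
        }
    }

    /**
     * Applies one createSession txn (zxid 1) in two steps, takes a snapshot in the
     * gap between the steps, then restores from that snapshot and replays only
     * txns with zxid greater than the snapshot's lastProcessedZxid.
     * Returns true if the restored state still contains the session.
     */
    static boolean sessionSurvivesRestore(boolean updateSessionFirst) {
        final long zxid = 1L;
        final long sessionId = 0x1234L;
        State live = new State();
        Snapshot snap;

        if (updateSessionFirst) {
            // Ordering proposed in the issue: session change first, zxid second.
            live.globalSessions.add(sessionId);
            snap = new Snapshot(live); // simulated stall: snapshot lands here
            live.lastProcessedZxid = zxid;
        } else {
            // Ordering reported as buggy: zxid first, session change second.
            live.lastProcessedZxid = zxid;
            snap = new Snapshot(live); // simulated stall: snapshot lands here
            live.globalSessions.add(sessionId);
        }

        // Restore from the snapshot.
        State restored = new State();
        restored.globalSessions.addAll(snap.globalSessions);
        restored.lastProcessedZxid = snap.lastProcessedZxid;

        // Replay the txn log (a single txn here) from lastProcessedZxid + 1.
        if (zxid > restored.lastProcessedZxid) {
            restored.globalSessions.add(sessionId); // re-applying is idempotent
            restored.lastProcessedZxid = zxid;
        }
        return restored.globalSessions.contains(sessionId);
    }

    public static void main(String[] args) {
        System.out.println("zxid first, session survives:    " + sessionSurvivesRestore(false));
        System.out.println("session first, session survives: " + sessionSurvivesRestore(true));
    }
}
```

In this model the zxid-first ordering loses the session after a restore, because the snapshot claims zxid 1 is already applied and the replay skips it. With the session-first ordering the snapshot still reports zxid 0, so the txn is replayed and the session is present.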



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
