zookeeper-dev mailing list archives

From "Fangmin Lv (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ZOOKEEPER-3145) Potential watch missing issue due to stale pzxid when replaying CloseSession txn with fuzzy snapshot
Date Tue, 11 Sep 2018 23:48:00 GMT
Fangmin Lv created ZOOKEEPER-3145:
-------------------------------------

             Summary: Potential watch missing issue due to stale pzxid when replaying CloseSession
txn with fuzzy snapshot
                 Key: ZOOKEEPER-3145
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3145
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.4.13, 3.5.4, 3.6.0
            Reporter: Fangmin Lv
            Assignee: Fangmin Lv
             Fix For: 3.6.0


This is another issue I found recently. We haven't seen this problem in prod (or maybe we
just haven't noticed it).

 
Currently, CloseSession is not idempotent: executing the CloseSession txn twice won't produce
the same result.
 
The problem is that closeSession only checks which ephemeral nodes are associated with
that session based on the current state. Nodes deleted while the fuzzy snapshot was being
taken won't be deleted again when the txn is replayed.
 
This looks fine, since the nodes are already gone, but there is a problem with the pzxid of
the parent node. The snapshot is taken fuzzily, so it's possible that the parent had already
been serialized while the nodes were being deleted by the closeSession txn. The pzxid in the
snapshot will not be updated when the closeSession txn is replayed, because the replay doesn't
know which paths were deleted, so it can't patch the pzxid the way we did for deleteNode in
ZOOKEEPER-3125.
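
To make the scenario concrete, here is a minimal sketch using a toy in-memory tree. It is not
ZooKeeper's real DataTree; the class, the method names, and the zxid values (5 and 10) are
assumptions made up for illustration. It compares a live server, where the ephemeral child
still exists when the txn is applied, with a server restored from a fuzzy snapshot in which
the parent was serialized before the delete and the child was never serialized.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a data tree (hypothetical, not ZooKeeper's DataTree). */
class FuzzyReplayDemo {
    static final class Node {
        long pzxid;                 // zxid of the last change to this node's children
        final long ephemeralOwner;  // owning session id, or 0 for a persistent node

        Node(long pzxid, long ephemeralOwner) {
            this.pzxid = pzxid;
            this.ephemeralOwner = ephemeralOwner;
        }
    }

    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    /** closeSession as described above: the paths come from the current state only. */
    static void closeSession(Map<String, Node> tree, long sessionId, long zxid) {
        List<String> owned = new ArrayList<>();
        for (Map.Entry<String, Node> e : tree.entrySet()) {
            if (e.getValue().ephemeralOwner == sessionId) {
                owned.add(e.getKey());
            }
        }
        for (String path : owned) {
            tree.remove(path);
            Node parent = tree.get(parentOf(path));
            if (parent != null) {
                parent.pzxid = zxid;  // only happens if the child was still present
            }
        }
    }

    /** Live server: the ephemeral child exists when the txn (zxid 10) is applied. */
    static long livePzxid() {
        Map<String, Node> tree = new HashMap<>();
        tree.put("/parent", new Node(5L, 0L));
        tree.put("/parent/child", new Node(5L, 42L));
        closeSession(tree, 42L, 10L);
        return tree.get("/parent").pzxid;
    }

    /** Fuzzy snapshot: parent serialized with pzxid 5, child already deleted. */
    static long replayedPzxid() {
        Map<String, Node> tree = new HashMap<>();
        tree.put("/parent", new Node(5L, 0L));
        closeSession(tree, 42L, 10L);  // finds no ephemerals, so nothing is patched
        return tree.get("/parent").pzxid;
    }
}
```

In this model the live tree ends with the parent's pzxid at 10, while the tree restored from
the snapshot keeps the stale value 5 after the replay.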
 
The stale pzxid can lead to missed watch notifications when a client reconnects with
setWatches.
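
A rough sketch of why the staleness matters on reconnect, assuming the server decides whether
to fire a child watch by comparing the parent's pzxid with the last zxid the client has seen.
The method name and the zxid values are hypothetical and only model that comparison.

```java
/** Simplified model of the child-watch check done on reconnect (hypothetical names). */
class SetWatchesDemo {
    /**
     * Returns true if a NodeChildrenChanged event should be sent right away,
     * false if the watch is simply registered again without a notification.
     */
    static boolean shouldNotifyChildrenChanged(long parentPzxid, long clientLastZxid) {
        return parentPzxid > clientLastZxid;
    }

    /** Client last saw zxid 7; the children actually changed at zxid 10. */
    static boolean withCorrectPzxid() {
        return shouldNotifyChildrenChanged(10L, 7L);
    }

    /** Same client, but the server's parent still carries the stale pzxid 5. */
    static boolean withStalePzxid() {
        return shouldNotifyChildrenChanged(5L, 7L);
    }
}
```

With the correct pzxid the client is told its children changed; with the stale one the watch
is silently re-registered and the client never learns about the deletion.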
 
This JIRA is going to fix the issue by adding a CloseSessionTxn, which records all the nodes
deleted by that CloseSession txn, so that we know which nodes to update when replaying
the txn.
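
The proposed direction could look roughly like the sketch below. This is an illustration
under assumptions, not the actual patch: the record layout, the field name pathsToDelete, and
the replay helper are hypothetical. The point is that once the txn carries its own list of
paths, the replay can patch the parent's pzxid even when the child is already gone, and
applying the txn twice gives the same result.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of replaying a close-session txn that records its deleted paths. */
class CloseSessionTxnDemo {
    static final class Node {
        long pzxid;  // zxid of the last change to this node's children

        Node(long pzxid) {
            this.pzxid = pzxid;
        }
    }

    /** Hypothetical txn record: the paths deleted when the session was closed. */
    static final class CloseSessionTxn {
        final List<String> pathsToDelete;

        CloseSessionTxn(List<String> pathsToDelete) {
            this.pathsToDelete = pathsToDelete;
        }
    }

    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    /** Replay driven by the recorded paths instead of the current state. */
    static void replay(Map<String, Node> tree, CloseSessionTxn txn, long zxid) {
        for (String path : txn.pathsToDelete) {
            tree.remove(path);  // no-op if the node is already gone
            Node parent = tree.get(parentOf(path));
            // Patch the parent even if the child was missing; never move pzxid backwards.
            if (parent != null && parent.pzxid < zxid) {
                parent.pzxid = zxid;
            }
        }
    }

    /** Fuzzy snapshot state (parent at pzxid 5, child missing), txn applied twice. */
    static long replayTwice() {
        Map<String, Node> tree = new HashMap<>();
        tree.put("/parent", new Node(5L));
        CloseSessionTxn txn = new CloseSessionTxn(Collections.singletonList("/parent/child"));
        replay(tree, txn, 10L);
        replay(tree, txn, 10L);
        return tree.get("/parent").pzxid;
    }
}
```

The `parent.pzxid < zxid` guard is a design choice in this sketch: it keeps the replay from
lowering a pzxid that a later txn had already advanced before the parent was serialized.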



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
