zookeeper-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3145) Potential watch missing issue due to stale pzxid when replaying CloseSession txn with fuzzy snapshot
Date Wed, 12 Sep 2018 19:10:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612617#comment-16612617 ]

Hadoop QA commented on ZOOKEEPER-3145:

-1 overall.  GitHub Pull Request  Build

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 13 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2157//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2157//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2157//console

This message is automatically generated.

> Potential watch missing issue due to stale pzxid when replaying CloseSession txn with fuzzy snapshot
> ----------------------------------------------------------------------------------------------------
>                 Key: ZOOKEEPER-3145
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3145
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.4, 3.6.0, 3.4.13
>            Reporter: Fangmin Lv
>            Assignee: Fangmin Lv
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.6.0
>          Time Spent: 10m
>  Remaining Estimate: 0h
> This is another issue I found recently. We haven't seen this problem in prod (or maybe we just haven't noticed it).
> Currently, CloseSession is not idempotent: executing the CloseSession txn twice does not produce the same result.
> The problem is that closeSession only checks which ephemeral nodes are associated with that session based on the current state. Nodes deleted while the fuzzy snapshot was being taken won't be deleted again when the txn is replayed.
> This looks fine, since those nodes are already gone, but there is a problem with the pzxid of the parent node. The snapshot is taken fuzzily, so it's possible that the parent had already been serialized while the nodes were being deleted by the closeSession txn. The pzxid will not be updated when the closeSession txn is replayed on top of the snapshot, because the replay doesn't know which paths were deleted, so it won't patch the pzxid the way we did for deleteNode in ZOOKEEPER-3125.
> The inconsistent pzxid can lead to missed watch notifications when a client reconnects with setWatches, because the pzxid is stale.
> This JIRA fixes the issue by adding the CloseSessionTxn, which records all the nodes deleted in that CloseSession txn, so that we know which nodes to update when replaying the txn.
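
The scenario described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not ZooKeeper's actual DataTree or the patch in the pull request: the class name, its fields, and both closeSession variants are hypothetical, and the tree is reduced to a map from path to pzxid plus a map from session to ephemeral paths. It assumes a fuzzy snapshot in which the parent "/app" was serialized with pzxid 5, while the child "/app/lock" was deleted by a closeSession at zxid 9 before it could be serialized.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only; every name here is hypothetical, not ZooKeeper's API. */
public class CloseSessionReplaySketch {
    /** path -> pzxid (zxid of the last change to that node's children). */
    public final Map<String, Long> pzxid = new HashMap<>();
    /** session id -> ephemeral paths currently owned by that session. */
    public final Map<Long, Set<String>> ephemerals = new HashMap<>();

    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    /** Remove the node if present; patch a stale parent pzxid either way. */
    public void deleteNode(String path, long zxid) {
        pzxid.remove(path);
        for (Set<String> owned : ephemerals.values()) {
            owned.remove(path);
        }
        pzxid.computeIfPresent(parentOf(path), (p, old) -> Math.max(old, zxid));
    }

    /** State-based replay: only sees ephemerals that still exist. */
    public void closeSessionFromCurrentState(long session, long zxid) {
        Set<String> owned = ephemerals.remove(session);
        if (owned == null) {
            return; // nothing known for this session, so no parent is touched
        }
        for (String path : owned) {
            deleteNode(path, zxid);
        }
    }

    /** Replay driven by the paths recorded in the txn itself. */
    public void closeSessionWithRecordedPaths(long session, List<String> paths, long zxid) {
        ephemerals.remove(session);
        for (String path : paths) {
            deleteNode(path, zxid);
        }
    }

    /** Assumed fuzzy snapshot: parent kept pzxid 5, child already gone. */
    public static CloseSessionReplaySketch fuzzySnapshot() {
        CloseSessionReplaySketch tree = new CloseSessionReplaySketch();
        tree.pzxid.put("/app", 5L);
        return tree;
    }

    public static void main(String[] args) {
        CloseSessionReplaySketch stateBased = fuzzySnapshot();
        stateBased.closeSessionFromCurrentState(1L, 9L);
        System.out.println("state-based replay pzxid: " + stateBased.pzxid.get("/app"));

        CloseSessionReplaySketch recorded = fuzzySnapshot();
        recorded.closeSessionWithRecordedPaths(1L, List.of("/app/lock"), 9L);
        System.out.println("recorded-path replay pzxid: " + recorded.pzxid.get("/app"));
    }
}
```

In this model the state-based replay leaves the parent at pzxid 5, while the replay that carries the deleted paths moves it to 9, and applying the second variant again changes nothing further. That is the idempotence property the description asks for.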

This message was sent by Atlassian JIRA
