curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Randgalt <...@git.apache.org>
Subject [GitHub] curator pull request: [CURATOR-324] Test for event ordering in bot...
Date Thu, 19 May 2016 22:20:51 GMT
Github user Randgalt commented on the pull request:

    https://github.com/apache/curator/pull/148#issuecomment-220468433
  
    I did some further debugging and the problem has something to do with the same node being
deleted/updated/recreated in quick succession. I'm concerned that the node states are handle
by individual TreeNode instances. Look at this log (filtered for node "/root/0"):
    
    ```
    2133 processResult: @Node1 CuratorEventImpl{type=GET_DATA, resultCode=0, path='/root/0',
name='null', children=null, context=null, stat=5135,5153,1463694250431,1463694250434,2,0,0,0,3,0,5135,
data=[110, 101, 119], watchedEvent=null, aclList=null} [main-EventThread]
    2133 publishEvent: @Node1 /root/0 NODE_ADDED [main-EventThread]
    2134 processResult: @Node1 CuratorEventImpl{type=CHILDREN, resultCode=0, path='/root/0',
name='null', children=[], context=null, stat=5135,5153,1463694250431,1463694250434,2,0,0,0,3,0,5135
    2135 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0, path='/root',
name='null', children=[3, 0, 7, 9, 8], context=null, stat=2,2,1463694248754,1463694248754,0,1729,0,0,9,5,5154
    2136 process: @Node1 WatchedEvent state:SyncConnected type:NodeDataChanged path:/root/0
[main-EventThread]
    2136 process: @Node1 WatchedEvent state:SyncConnected type:NodeDeleted path:/root/0 [main-EventThread]
    2136 publishEvent: @Node1 /root/0 NODE_REMOVED [main-EventThread]
    2154 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0, path='/root',
name='null', children=[0, 6, 5, 4, 8], context=null, stat=2,2,1463694248754,1463694248754,0,1765,0,0,9,5,5270
    2155 processResult: @Node1 CuratorEventImpl{type=GET_DATA, resultCode=0, path='/root/0',
name='null', children=null, context=null, stat=5268,5268,1463694250454,1463694250454,0,0,0,0,9,0,5268
    2155 publishEvent: @Node1 /root/0 NODE_ADDED [main-EventThread]
    2167 processResult: @Node2 CuratorEventImpl{type=GET_DATA, resultCode=-101, path='/root/0',
name='null', children=null, context=null, stat=null, data=null, watchedEvent=null, aclList=null}
[main-EventThread]
    2167 wasDeleted @Node2 /root/0 PENDING [main-EventThread]
    2167 processResult: @Node2 CuratorEventImpl{type=CHILDREN, resultCode=-101, path='/root/0',
name='null', children=[], context=null, stat=null, data=null, watchedEvent=null, aclList=null}
[main-EventThread]
    2167 wasDeleted @Node2 /root/0 DEAD [main-EventThread]
    2183 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0, path='/root',
name='null', children=[3, 7, 5, 9, 8], context=null, stat=2,2,1463694248754,1463694248754,0,1805,0,0,9,5,5380
    2233 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0, path='/root',
name='null', children=[0, 7, 5, 4], context=null, stat=2,2,1463694248754,1463694248754,0,1842,0,0,9,4,5484
    2259 processResult: @Node3 CuratorEventImpl{type=GET_DATA, resultCode=-101, path='/root/0',
name='null', children=null, context=null, stat=null, data=null, watchedEvent=null, aclList=null}
[main-EventThread]
    2259 wasDeleted @Node3 /root/0 PENDING [main-EventThread]
    2259 processResult: @Node3 CuratorEventImpl{type=CHILDREN, resultCode=-101, path='/root/0',
name='null', children=[], context=null, stat=null, data=null, watchedEvent=null, aclList=null}
[main-EventThread]
    2260 wasDeleted @Node3 /root/0 DEAD [main-EventThread]
    2261 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0, path='/root',
name='null', children=[7, 6, 9], context=null, stat=2,2,1463694248754,1463694248754,0,1875,0,0,9,3,5604
    2273 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0, path='/root',
name='null', children=[1, 0, 5, 9], context=null, stat=2,2,1463694248754,1463694248754,0,1906,0,0,9,4,5706
    2282 processResult: @Node4 CuratorEventImpl{type=GET_DATA, resultCode=0, path='/root/0',
name='null', children=null, context=null, stat=5797,5811,1463694250589,1463694250591,1,0,0,0,3,0,5797
    2282 publishEvent:@Node4 /root/0 NODE_ADDED [main-EventThread]
    ```
    
    * The first column is time
    * The second column is the operation
    * The third is the TreeNode doing the operation (I made the name human readable) 
    * The last bit is the event - wasDeleted means that wasDeleted() did not publish NODE_REMOVED
because the node state was not LIVE
    
    The problem here is the two NODE_ADDED publishes at 2155 and 2282. I believe there is
a missing NODE_REMOVED event at 2260.
    
    Note, at time 2136 we get watcher process() NodeDataChanged immediately followed by a
NodeDeleted on @Node1's watcher. 
    @Node1 then publishes NODE_REMOVED (correctly) at 2136.
    
    At 2154, @root gets the result of calling getChildren. I assume that "/root/0" was recreated
and @root's watcher got notified. Note that
    "0" is in the children list. So, this will cause a new node to get created and its wasCreated()
called. At 2155, the old @Node1 gets the
    result of GET_DATA. This causes a false NODE_ADDED to get generated. @Node1 is officially
dead and should not be reporting this.
    
    I believe the sequence of events is: Just prior to 2136, 1 test code thread changes "/root/0"
and another deletes it and another recreates it almost immediately.
    @Node1's watcher is triggered for NodeDataChanged and begins an async call to getData(),
it then receives NodeDeleted BEFORE the async getData() occurs on the server.
    The node then gets recreated. The getData() finally executes on the server and the result
is returned such that @Node1 misses the node deletion.
    
    Anyway, I have a potential fix for now. In processResults() GET_DATA case, add:
    
    ```
    if ( nodeState.get() == NodeState.DEAD )
    {
        break;
    }
    ```
    
    This fixes the problem. Longer term, we should consider moving node management into the
main TreeCache instance and not the individual node objects.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message