curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Scott Blum <dragonsi...@gmail.com>
Subject Re: PLEASE REVIEW - Major re-work of Watcher wrappers
Date Sun, 07 Feb 2016 18:55:06 GMT
Are we using a new zookeeper?  Or did something change with our
implementation of KillSession.kill()?

Or maybe there's a timing issue with Curator's ConnectionStateManager State
change: LOST?  I don't understand how we could get a LOST event without the
ephemeral node attached to that session having disappeared?

On Sun, Feb 7, 2016 at 10:59 AM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> I don’t know if anything changed in ZooKeeper itself. I know that the
> connection states changed in Curator, but Curator now tests both the old
> mode and the new mode and they both fail here.
>
> -JZ
>
> On Feb 7, 2016, at 1:06 AM, Scott Blum <dragonsinth@gmail.com> wrote:
>
> I need to analyze this a bit deeper, but what I'm seeing on the 3.0 branch
> is that the ephemeral node /test/me created in testKilledSession() really
> isn't disappearing when it should.
>
> After the session loss and the reconnect, /test still shows 2 children
> [foo, me] and /test/me still returns a node.
>
> Any idea why the timing here would have changed?
>
> On Sat, Feb 6, 2016 at 1:41 PM, Scott Blum <dragonsinth@gmail.com> wrote:
>
>> https://issues.apache.org/jira/browse/CURATOR-302
>>
>> I need to trace through what's really going on under the hood rather than
>> band-aid the test.  Should be able to in next couple of days.
>>
>> On Sat, Feb 6, 2016 at 7:43 AM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com> wrote:
>>
>>> Any update on this? I think you should create a Jira for it.
>>>
>>> -Jordan
>>>
>>>
>>> On Feb 5, 2016, at 12:30 PM, Scott Blum <dragonsinth@gmail.com> wrote:
>>>
>>> BTW, this test passes on master... so it's some kind of 3.0 vs. master
>>> issue.  I think I'm going to just have to dump in a ton of log messages and
>>> see what differs.
>>>
>>> On Fri, Feb 5, 2016 at 12:25 PM, Jordan Zimmerman <
>>> jordan@jordanzimmerman.com> wrote:
>>>
>>>> OK - please create a new Issue in Jira for this.
>>>>
>>>> -Jordan
>>>>
>>>> On Feb 5, 2016, at 12:24 PM, Scott Blum <dragonsinth@gmail.com> wrote:
>>>>
>>>> BTW: this is broken on CURATOR-3.0 as well, so it appears to have been
>>>> broken for a while.  Maybe I'll have to git bisect...
>>>>
>>>> On Fri, Feb 5, 2016 at 12:22 PM, Scott Blum <dragonsinth@gmail.com>
>>>> wrote:
>>>>
>>>>> Okay, so I looked into this for a bit, and I hit kind of a wall.  I
>>>>> think there is a legit bug/race in TreeCache, and the following patch
>>>>> *should* remedy:
>>>>>
>>>>> diff --git
>>>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>>>> index df4403c..a4a022b 100644
>>>>> ---
>>>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>>>> +++
>>>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>>>> @@ -303,7 +303,6 @@ public class TreeCache implements Closeable
>>>>>          void wasDeleted() throws Exception
>>>>>          {
>>>>>              ChildData oldChildData = childData.getAndSet(null);
>>>>> -
>>>>>  client.watches().remove(this).ofType(WatcherType.Any).locally().inBackground().forPath(path);
>>>>>              ConcurrentMap<String, TreeNode> childMap =
>>>>> children.getAndSet(null);
>>>>>              if ( childMap != null )
>>>>>              {
>>>>> @@ -807,8 +806,16 @@ public class TreeCache implements Closeable
>>>>>          case RECONNECTED:
>>>>>              try
>>>>>              {
>>>>> +                outstandingOps.incrementAndGet();
>>>>>                  root.wasReconnected();
>>>>>
>>>>>  publishEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED);
>>>>> +                if ( outstandingOps.decrementAndGet() == 0 )
>>>>> +                {
>>>>> +                    if ( isInitialized.compareAndSet(false, true) )
>>>>> +                    {
>>>>> +                        publishEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>> +                    }
>>>>> +                }
>>>>>              }
>>>>>              catch ( Exception e )
>>>>>              {
>>>>>
>>>>> That should guarantee that the initialized event gets deferred until
>>>>> all outstanding refreshes finish.. but it's not.  Something seems to
have
>>>>> changed under the hood in how background events are getting sent to
>>>>> TreeCache, and I don't really understand it yet.  And running the debugger
>>>>> seems to affect the timing, like something racy is going on. :(
>>>>>
>>>>>
>>>>> On Fri, Feb 5, 2016 at 11:57 AM, Scott Blum <dragonsinth@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Ok, that is kind of weird.  I'll take a look.
>>>>>>
>>>>>> On Fri, Feb 5, 2016 at 4:58 AM, Jordan Zimmerman <
>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>
>>>>>>> No, sorry. The last few lines of the test currently are:
>>>>>>>
>>>>>>>
>>>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", "data".getBytes());
>>>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>>>>
>>>>>>> This fails. But, if I switch them it works:
>>>>>>>
>>>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>>>>
>>>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", "data".getBytes());
>>>>>>>
>>>>>>> On Feb 5, 2016, at 2:57 AM, Scott Blum <dragonsinth@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> So you end up with 2 initialized events?
>>>>>>>
>>>>>>> You mean this?
>>>>>>>
>>>>>>>          assertEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED);
>>>>>>> +        assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>>>>          assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me",
>>>>>>> "data".getBytes());
>>>>>>>          assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>>>>
>>>>>>> Seems weird if there are two, but I can help look.
>>>>>>>
>>>>>>> On Thu, Feb 4, 2016 at 10:48 PM, Jordan Zimmerman <
>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>
>>>>>>>> Hey Scott,
>>>>>>>>
>>>>>>>> In this branch, TestTreeCache.testKilledSession() is failing
at:
>>>>>>>>
>>>>>>>>         assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me",
>>>>>>>> "data".getBytes());
>>>>>>>>
>>>>>>>> However, if I change the two asserts to:
>>>>>>>>
>>>>>>>>         assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>>>>>         assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me",
>>>>>>>> "data".getBytes());
>>>>>>>>
>>>>>>>> it works. Does that make any sense?
>>>>>>>>
>>>>>>>> -Jordan
>>>>>>>>
>>>>>>>> > On Feb 4, 2016, at 9:23 PM, Jordan Zimmerman <
>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>> >
>>>>>>>> > Devs,
>>>>>>>> >
>>>>>>>> > In trying to fix the bad log message "Failed to find
watcher”
>>>>>>>> (which turns out to be a ZK client issue), I realize that
the
>>>>>>>> NamespaceWatcher and WatcherWrapper stuff could be improved.
I’m still
>>>>>>>> working on getting all tests to pass but I’d appreciate
more sets of eyes
>>>>>>>> on this change. Please review carefully if you can.
>>>>>>>> >
>>>>>>>> > https://github.com/apache/curator/pull/131
>>>>>>>> >
>>>>>>>> > -Jordan
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message