curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Scott Blum <dragonsi...@gmail.com>
Subject Re: PLEASE REVIEW - Major re-work of Watcher wrappers
Date Sat, 06 Feb 2016 18:41:58 GMT
https://issues.apache.org/jira/browse/CURATOR-302

I need to trace through what's really going on under the hood rather than
band-aid the test.  Should be able to in next couple of days.

On Sat, Feb 6, 2016 at 7:43 AM, Jordan Zimmerman <jordan@jordanzimmerman.com
> wrote:

> Any update on this? I think you should create a Jira for it.
>
> -Jordan
>
>
> On Feb 5, 2016, at 12:30 PM, Scott Blum <dragonsinth@gmail.com> wrote:
>
> BTW, this test passes on master... so it's some kind of 3.0 vs. master
> issue.  I think I'm going to just have to dump in a ton of log messages and
> see what differs.
>
> On Fri, Feb 5, 2016 at 12:25 PM, Jordan Zimmerman <
> jordan@jordanzimmerman.com> wrote:
>
>> OK - please create a new Issue in Jira for this.
>>
>> -Jordan
>>
>> On Feb 5, 2016, at 12:24 PM, Scott Blum <dragonsinth@gmail.com> wrote:
>>
>> BTW: this is broken on CURATOR-3.0 as well, so it appears to have been
>> broken for a while.  Maybe I'll have to git bisect...
>>
>> On Fri, Feb 5, 2016 at 12:22 PM, Scott Blum <dragonsinth@gmail.com>
>> wrote:
>>
>>> Okay, so I looked into this for a bit, and I hit kind of a wall.  I
>>> think there is a legit bug/race in TreeCache, and the following patch
>>> *should* remedy:
>>>
>>> diff --git
>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>> index df4403c..a4a022b 100644
>>> ---
>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>> +++
>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>> @@ -303,7 +303,6 @@ public class TreeCache implements Closeable
>>>          void wasDeleted() throws Exception
>>>          {
>>>              ChildData oldChildData = childData.getAndSet(null);
>>> -
>>>  client.watches().remove(this).ofType(WatcherType.Any).locally().inBackground().forPath(path);
>>>              ConcurrentMap<String, TreeNode> childMap =
>>> children.getAndSet(null);
>>>              if ( childMap != null )
>>>              {
>>> @@ -807,8 +806,16 @@ public class TreeCache implements Closeable
>>>          case RECONNECTED:
>>>              try
>>>              {
>>> +                outstandingOps.incrementAndGet();
>>>                  root.wasReconnected();
>>>
>>>  publishEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED);
>>> +                if ( outstandingOps.decrementAndGet() == 0 )
>>> +                {
>>> +                    if ( isInitialized.compareAndSet(false, true) )
>>> +                    {
>>> +                        publishEvent(TreeCacheEvent.Type.INITIALIZED);
>>> +                    }
>>> +                }
>>>              }
>>>              catch ( Exception e )
>>>              {
>>>
>>> That should guarantee that the initialized event gets deferred until all
>>> outstanding refreshes finish.. but it's not.  Something seems to have
>>> changed under the hood in how background events are getting sent to
>>> TreeCache, and I don't really understand it yet.  And running the debugger
>>> seems to affect the timing, like something racy is going on. :(
>>>
>>>
>>> On Fri, Feb 5, 2016 at 11:57 AM, Scott Blum <dragonsinth@gmail.com>
>>> wrote:
>>>
>>>> Ok, that is kind of weird.  I'll take a look.
>>>>
>>>> On Fri, Feb 5, 2016 at 4:58 AM, Jordan Zimmerman <
>>>> jordan@jordanzimmerman.com> wrote:
>>>>
>>>>> No, sorry. The last few lines of the test currently are:
>>>>>
>>>>>
>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", "data".getBytes());
>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>>
>>>>> This fails. But, if I switch them it works:
>>>>>
>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>>
>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", "data".getBytes());
>>>>>
>>>>> On Feb 5, 2016, at 2:57 AM, Scott Blum <dragonsinth@gmail.com>
wrote:
>>>>>
>>>>> So you end up with 2 initialized events?
>>>>>
>>>>> You mean this?
>>>>>
>>>>>          assertEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED);
>>>>> +        assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>>          assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me",
>>>>> "data".getBytes());
>>>>>          assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>>
>>>>> Seems weird if there are two, but I can help look.
>>>>>
>>>>> On Thu, Feb 4, 2016 at 10:48 PM, Jordan Zimmerman <
>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>
>>>>>> Hey Scott,
>>>>>>
>>>>>> In this branch, TestTreeCache.testKilledSession() is failing at:
>>>>>>
>>>>>>         assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me",
>>>>>> "data".getBytes());
>>>>>>
>>>>>> However, if I change the two asserts to:
>>>>>>
>>>>>>         assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>>>         assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me",
>>>>>> "data".getBytes());
>>>>>>
>>>>>> it works. Does that make any sense?
>>>>>>
>>>>>> -Jordan
>>>>>>
>>>>>> > On Feb 4, 2016, at 9:23 PM, Jordan Zimmerman <
>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>> >
>>>>>> > Devs,
>>>>>> >
>>>>>> > In trying to fix the bad log message "Failed to find watcher”
>>>>>> (which turns out to be a ZK client issue), I realize that the
>>>>>> NamespaceWatcher and WatcherWrapper stuff could be improved. I’m
still
>>>>>> working on getting all tests to pass but I’d appreciate more sets
of eyes
>>>>>> on this change. Please review carefully if you can.
>>>>>> >
>>>>>> > https://github.com/apache/curator/pull/131
>>>>>> >
>>>>>> > -Jordan
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message