curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: CURATOR-3.0 tests
Date Tue, 31 May 2016 23:12:50 GMT
The problem is in-flight watchers and async background calls. There’s no way to cancel these
and they can take time to occur - even after a recipe instance is closed.

-Jordan

> On May 31, 2016, at 5:11 PM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
> 
> Ok, running it again now.
> 
> Is the problem that the watcher clean up for the recipes is done
> asynchronously after they are closed?
> 
> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <jordan@jordanzimmerman.com
>> wrote:
> 
>> OK - please try now. I added a loop in the “no watchers” checker. If there
>> are remaining watchers, it will sleep a bit and try again.
>> 
>> -Jordan
>> 
>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <mckenzie.cam@gmail.com>
>> wrote:
>>> 
>>> Looks like these failures are intermittent. Running them directly in
>>> Eclipse they seem to be passing. I will run the whole thing again in the
>>> morning and see how it goes.
>>> 
>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>> mckenzie.cam@gmail.com>
>>> wrote:
>>> 
>>>> There are still 2 tests failing for me:
>>>> 
>>>> FAILURE! - in
>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>> 
>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>> java.lang.AssertionError: One or more child watchers are still
>> registered:
>>>> [/test]
>>>> at
>>>> 
>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>> at
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>> 
>>>> FAILURE! - in
>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>> 
>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>> java.lang.AssertionError: expected [true] but found [false]
>>>> at org.testng.Assert.fail(Assert.java:94)
>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>> at
>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>> 
>>>> Failed tests:
>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child
>>>> watchers are still registered: [/test]
>>>> Run 2: PASS
>>>> 
>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but
>>>> found [false]
>>>> 
>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>> mckenzie.cam@gmail.com
>>>>> wrote:
>>>> 
>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against that,
>> and
>>>>> if it's all good will merge into CURATOR-3.0
>>>>> cheers
>>>>> 
>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>> jordan@jordanzimmerman.com> wrote:
>>>>> 
>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged yet.
I
>>>>>> made/pushed my changes in CURATOR-332
>>>>>> 
>>>>>> -jordan
>>>>>> 
>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>> mckenzie.cam@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> I'm still seeing 6 failed tests that seem related to the same
stuff
>>>>>> after
>>>>>>> merging your fix:
>>>>>>> 
>>>>>>> Failed tests:
>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child
>> watchers
>>>>>>> are still registered: [/test]
>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child
>> watchers
>>>>>>> are still registered: [/test]
>>>>>>> 
>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>> Run 1:
>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>> Run 2:
>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>> 
>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child
>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child
>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>> 
>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
but
>>>>>>> found [false]
>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>> Run 1: PASS
>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more
data
>>>>>>> watchers are still registered: [/count]
>>>>>>> 
>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers
are
>>>>>> still
>>>>>>> registered: [/count]
>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers
are
>>>>>> still
>>>>>>> registered: [/count]
>>>>>>> 
>>>>>>> 
>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>> 
>>>>>>>> I see the problem. The fix is not simple though so I’ll
spend some
>>>>>> time on
>>>>>>>> it. The TL;DR is that exists watchers are still supposed
to get set
>>>>>> when
>>>>>>>> there is a KeeperException.NoNode and the code isn’t handling
it.
>> But,
>>>>>>>> while I was looking at the code I realized there are some
>> significant
>>>>>>>> additional problems. Curator, here, is trying to mirror what
>>>>>> ZooKeeper does
>>>>>>>> internally which is insanely complicated. In hindsight, the
whole ZK
>>>>>>>> watcher mechanism should’ve been decoupled from the mutator
APIs.
>>>>>> But, of
>>>>>>>> course, that’s easy for me to say now.
>>>>>>>> 
>>>>>>>> -Jordan
>>>>>>>> 
>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>> mckenzie.cam@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks Scott,
>>>>>>>>> Those tests are now passing for me.
>>>>>>>>> 
>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently
on the
>> 3.0
>>>>>>>>> branch. It appears that this is actually potentially
a bug in the
>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've
had a quick
>>>>>> look
>>>>>>>>> through, but I haven't dived in in any detail. It's the
>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time,
can you
>>>>>> have a
>>>>>>>>> look? If not, let me know and I'll do some more digging.
>>>>>>>>> 
>>>>>>>>> cheers
>>>>>>>>> 
>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Thanks Scott.
>>>>>>>>>> 
>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>> 
>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and
3.2 onto Nexus.
>>>>>>>>>> cheers
>>>>>>>>>> 
>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>> dragonsinth@gmail.com
>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Alright, I have a fix, but it wants to be applied
to both master
>>>>>> and
>>>>>>>> 3.0.
>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie
<
>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch,
they are failing
>>>>>> there.
>>>>>>>>>>>> cheers
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum
<
>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan
Zimmerman <
>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM,
Cameron McKenzie <
>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Tree cache tests are still failing.
I've tried a few times
>> but
>>>>>> no
>>>>>>>>>>>> love:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I will have a look into what's
going on in the morning. Given
>>>>>> that
>>>>>>>>>>>>> these
>>>>>>>>>>>>>>> may take some messing about to
fix up, do we just want to
>> vote
>>>>>> on
>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>> separately, as that is all ready
to go?
>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34
PM, Jordan Zimmerman <
>>>>>>>>>>>>>>> jordan@jordanzimmerman.com>
wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37
AM, Cameron McKenzie <
>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I have fixed up the test
case, all good now.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, May 25, 2016
at 1:45 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Looks like it was
introduced with the schema validation
>>>>>> stuff.
>>>>>>>>>>> It
>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>> an ACL check prior
to the backgrounding call. Because the
>>>>>> unit
>>>>>>>>>>>> test
>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>> bogus ACL provider
it just throws an exception
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>   final String adjustedPath
=
>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>>>>>>>>>>>>>>>> createMode.isSequential()));
>>>>>>>>>>>>>>>>>>   List<ACL>
aclList = acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>>>>>>>>>>>>>>>> data,
>>>>>>>>>>>>>>>>>> aclList);
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>   String returnPath
= null;
>>>>>>>>>>>>>>>>>>   if ( backgrounding.inBackground()
)
>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>       pathInBackground(adjustedPath,
data, givenPath);
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> So, I guess the answer
is to get the test to force a
>> failure
>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>> different way. With
the UnhandledErrorListener, the
>>>>>>>>>>> expectation is
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> this only gets called
on backgrounding operations?
>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016
at 1:39 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Same on the master
branch, but it passes there, so maybe
>>>>>>>>>>>> something
>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>> legitimately
broken the test. Will let you know if I get
>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Wed, May 25,
2016 at 1:36 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com>
wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Let me know
if you need help.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> It might
be a bad merge. Have you compared it to the
>>>>>> master
>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On
May 24, 2016, at 10:31 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>> There's
a test
>> TestFrameworkBackground:testErrorListener
>>>>>>>>>>> that
>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>> consistently
on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I can't
see how it has ever worked. It seems to try and
>>>>>>>>>>> provoke
>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>>>> via a
bad ACL provider.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> But this
ACL provider is called by the
>> CreateBuilderImpl
>>>>>>>>>>> prior
>>>>>>>>>>>> to
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> backgrounding
call, which means that the exception that
>>>>>> it
>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>> in the
main Thread of the unit test. So, it just throws
>>>>>> an
>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException
which is propogated up
>> the
>>>>>>>>>>> stack
>>>>>>>>>>>> at
>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>> point
the unit test fails.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> So, I
will look at fixing this, but I just don't
>>>>>> understand
>>>>>>>>>>> how
>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>> 


Mime
View raw message