curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: CURATOR-3.0 tests
Date Wed, 01 Jun 2016 00:44:50 GMT
I’ll try to address this as part of CURATOR-333

> On May 31, 2016, at 7:08 PM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
> 
> Maybe we need to look at some way of providing a hook for tests to wait
> reliably for asynch tasks to finish?
> 
> The latest round of tests ran OK. One test failed on an unrelated thing
> (ConnectionLoss), but this appears to be a transient thing as it's worked
> ok the next time around.
> 
> I will start getting a release together. Thanks for you help with the
> updated tests.
> cheers
> 
> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <jordan@jordanzimmerman.com
>> wrote:
> 
>> The problem is in-flight watchers and async background calls. There’s no
>> way to cancel these and they can take time to occur - even after a recipe
>> instance is closed.
>> 
>> -Jordan
>> 
>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
>> wrote:
>>> 
>>> Ok, running it again now.
>>> 
>>> Is the problem that the watcher clean up for the recipes is done
>>> asynchronously after they are closed?
>>> 
>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com
>>>> wrote:
>>> 
>>>> OK - please try now. I added a loop in the “no watchers” checker. If
>> there
>>>> are remaining watchers, it will sleep a bit and try again.
>>>> 
>>>> -Jordan
>>>> 
>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <mckenzie.cam@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Looks like these failures are intermittent. Running them directly in
>>>>> Eclipse they seem to be passing. I will run the whole thing again in
>> the
>>>>> morning and see how it goes.
>>>>> 
>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> There are still 2 tests failing for me:
>>>>>> 
>>>>>> FAILURE! - in
>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>> 
>>>> 
>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>> java.lang.AssertionError: One or more child watchers are still
>>>> registered:
>>>>>> [/test]
>>>>>> at
>>>>>> 
>>>> 
>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>> at
>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>> 
>>>>>> FAILURE! - in
>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>> 
>>>> 
>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>> at
>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>> 
>>>>>> Failed tests:
>>>>>> 
>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child
>>>>>> watchers are still registered: [/test]
>>>>>> Run 2: PASS
>>>>>> 
>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
but
>>>>>> found [false]
>>>>>> 
>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com
>>>>>>> wrote:
>>>>>> 
>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against
that,
>>>> and
>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>> cheers
>>>>>>> 
>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>> 
>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged
yet. I
>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>> 
>>>>>>>> -jordan
>>>>>>>> 
>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> I'm still seeing 6 failed tests that seem related to
the same stuff
>>>>>>>> after
>>>>>>>>> merging your fix:
>>>>>>>>> 
>>>>>>>>> Failed tests:
>>>>>>>>> 
>>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more
child
>>>> watchers
>>>>>>>>> are still registered: [/test]
>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more
child
>>>> watchers
>>>>>>>>> are still registered: [/test]
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>> Run 1:
>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>> Run 2:
>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or
more child
>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or
more child
>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>> 
>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
[true]
>> but
>>>>>>>>> found [false]
>>>>>>>>> 
>>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>> Run 1: PASS
>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One
or more
>> data
>>>>>>>>> watchers are still registered: [/count]
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
watchers are
>>>>>>>> still
>>>>>>>>> registered: [/count]
>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
watchers are
>>>>>>>> still
>>>>>>>>> registered: [/count]
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>> 
>>>>>>>>>> I see the problem. The fix is not simple though so
I’ll spend some
>>>>>>>> time on
>>>>>>>>>> it. The TL;DR is that exists watchers are still supposed
to get
>> set
>>>>>>>> when
>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
handling it.
>>>> But,
>>>>>>>>>> while I was looking at the code I realized there
are some
>>>> significant
>>>>>>>>>> additional problems. Curator, here, is trying to
mirror what
>>>>>>>> ZooKeeper does
>>>>>>>>>> internally which is insanely complicated. In hindsight,
the whole
>> ZK
>>>>>>>>>> watcher mechanism should’ve been decoupled from
the mutator APIs.
>>>>>>>> But, of
>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>> 
>>>>>>>>>> -Jordan
>>>>>>>>>> 
>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie
<
>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>> 
>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
consistently on the
>>>> 3.0
>>>>>>>>>>> branch. It appears that this is actually potentially
a bug in the
>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference.
I've had a
>> quick
>>>>>>>> look
>>>>>>>>>>> through, but I haven't dived in in any detail.
It's the
>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've
got time, can you
>>>>>>>> have a
>>>>>>>>>>> look? If not, let me know and I'll do some more
digging.
>>>>>>>>>>> 
>>>>>>>>>>> cheers
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie
<
>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>> 
>>>>>>>>>>>> Push the fix to master and merge it into
3.0.
>>>>>>>>>>>> 
>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11
and 3.2 onto Nexus.
>>>>>>>>>>>> cheers
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum
<
>>>> dragonsinth@gmail.com
>>>>>>>>> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Alright, I have a fix, but it wants to
be applied to both
>> master
>>>>>>>> and
>>>>>>>>>> 3.0.
>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron
McKenzie <
>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0
branch, they are failing
>>>>>>>> there.
>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM,
Scott Blum <
>>>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Sure, what SHA are they failing
at Cam?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36
AM, Jordan Zimmerman <
>>>>>>>>>>>>>>> jordan@jordanzimmerman.com>
wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35
AM, Cameron McKenzie <
>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Tree cache tests are
still failing. I've tried a few times
>>>> but
>>>>>>>> no
>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I will have a look into
what's going on in the morning.
>> Given
>>>>>>>> that
>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>> may take some messing
about to fix up, do we just want to
>>>> vote
>>>>>>>> on
>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>> separately, as that is
all ready to go?
>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, May 25, 2016
at 5:34 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com>
wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On May 25, 2016,
at 12:37 AM, Cameron McKenzie <
>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I have fixed
up the test case, all good now.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Wed, May 25,
2016 at 1:45 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Looks like
it was introduced with the schema validation
>>>>>>>> stuff.
>>>>>>>>>>>>> It
>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>> an ACL check
prior to the backgrounding call. Because
>> the
>>>>>>>> unit
>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>>>> bogus ACL
provider it just throws an exception
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>  final String
adjustedPath =
>>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>>>>>>>>>>>>>>>>>> createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>  List<ACL>
aclList = acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>> 
>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>>>>>>>>>>>>>>>>>> data,
>>>>>>>>>>>>>>>>>>>> aclList);
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>  String returnPath
= null;
>>>>>>>>>>>>>>>>>>>>  if ( backgrounding.inBackground()
)
>>>>>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>>>>>>      pathInBackground(adjustedPath,
data, givenPath);
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> So, I guess
the answer is to get the test to force a
>>>> failure
>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>> different
way. With the UnhandledErrorListener, the
>>>>>>>>>>>>> expectation is
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> this only
gets called on backgrounding operations?
>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Wed, May
25, 2016 at 1:39 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Same
on the master branch, but it passes there, so
>> maybe
>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>> legitimately
broken the test. Will let you know if I
>> get
>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Wed,
May 25, 2016 at 1:36 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com>
wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Let
me know if you need help.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> It
might be a bad merge. Have you compared it to the
>>>>>>>> master
>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>
On May 24, 2016, at 10:31 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>
wrote:
>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>
Guys,
>>>>>>>>>>>>>>>>>>>>>>>
There's a test
>>>> TestFrameworkBackground:testErrorListener
>>>>>>>>>>>>> that
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>>>>
consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>
I can't see how it has ever worked. It seems to try
>> and
>>>>>>>>>>>>> provoke
>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>>>>>>
via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>
But this ACL provider is called by the
>>>> CreateBuilderImpl
>>>>>>>>>>>>> prior
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>
backgrounding call, which means that the exception
>> that
>>>>>>>> it
>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>>
in the main Thread of the unit test. So, it just
>> throws
>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>
UnsupportedOperationException which is propogated up
>>>> the
>>>>>>>>>>>>> stack
>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>
point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>
So, I will look at fixing this, but I just don't
>>>>>>>> understand
>>>>>>>>>>>>> how
>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>>>>>>
worked?
>>>>>>>>>>>>>>>>>>>>>>>
cheers
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message