curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: CURATOR-3.0 tests
Date Tue, 31 May 2016 15:29:14 GMT
The main problem is that there are background tasks, etc. that fire at unpredictable times
and it causes the “no more watchers” test to fail. Let me try something and see if it
helps. 

-Jordan

> On May 31, 2016, at 1:33 AM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
> 
> Looks like these failures are intermittent. Running them directly in
> Eclipse they seem to be passing. I will run the whole thing again in the
> morning and see how it goes.
> 
> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
> wrote:
> 
>> There are still 2 tests failing for me:
>> 
>> FAILURE! - in
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>> Time elapsed: 17.488 sec  <<< FAILURE!
>> java.lang.AssertionError: One or more child watchers are still registered:
>> [/test]
>> at
>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>> at
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>> 
>> FAILURE! - in
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>> Time elapsed: 96.641 sec  <<< FAILURE!
>> java.lang.AssertionError: expected [true] but found [false]
>> at org.testng.Assert.fail(Assert.java:94)
>> at org.testng.Assert.failNotEquals(Assert.java:494)
>> at org.testng.Assert.assertTrue(Assert.java:42)
>> at org.testng.Assert.assertTrue(Assert.java:52)
>> at
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>> 
>> Failed tests:
>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>  Run 1: TestPathChildrenCache.testKilledSession:707 One or more child
>> watchers are still registered: [/test]
>>  Run 2: PASS
>> 
>>  TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but
>> found [false]
>> 
>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>> 
>> 
>> 
>> 
>> 
>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <mckenzie.cam@gmail.com
>>> wrote:
>> 
>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against that, and
>>> if it's all good will merge into CURATOR-3.0
>>> cheers
>>> 
>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>> jordan@jordanzimmerman.com> wrote:
>>> 
>>>> Actually - I don’t remember if branch CURATOR-332 is merged yet. I
>>>> made/pushed my changes in CURATOR-332
>>>> 
>>>> -jordan
>>>> 
>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
>>>> wrote:
>>>>> 
>>>>> I'm still seeing 6 failed tests that seem related to the same stuff
>>>> after
>>>>> merging your fix:
>>>>> 
>>>>> Failed tests:
>>>>> 
>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child watchers
>>>>> are still registered: [/test]
>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child watchers
>>>>> are still registered: [/test]
>>>>> 
>>>>> 
>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>> Run 1: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>> One or more child watchers are still registered: [/test]
>>>>> Run 2: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>> One or more child watchers are still registered: [/test]
>>>>> 
>>>>> 
>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child
>>>>> watchers are still registered: [/one/two/three]
>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child
>>>>> watchers are still registered: [/one/two/three]
>>>>> 
>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but
>>>>> found [false]
>>>>> 
>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>> Run 1: PASS
>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more data
>>>>> watchers are still registered: [/count]
>>>>> 
>>>>> 
>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are
>>>> still
>>>>> registered: [/count]
>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are
>>>> still
>>>>> registered: [/count]
>>>>> 
>>>>> 
>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>> 
>>>>> 
>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>> jordan@jordanzimmerman.com> wrote:
>>>>> 
>>>>>> I see the problem. The fix is not simple though so I’ll spend some
>>>> time on
>>>>>> it. The TL;DR is that exists watchers are still supposed to get set
>>>> when
>>>>>> there is a KeeperException.NoNode and the code isn’t handling it.
But,
>>>>>> while I was looking at the code I realized there are some significant
>>>>>> additional problems. Curator, here, is trying to mirror what
>>>> ZooKeeper does
>>>>>> internally which is insanely complicated. In hindsight, the whole
ZK
>>>>>> watcher mechanism should’ve been decoupled from the mutator APIs.
>>>> But, of
>>>>>> course, that’s easy for me to say now.
>>>>>> 
>>>>>> -Jordan
>>>>>> 
>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Thanks Scott,
>>>>>>> Those tests are now passing for me.
>>>>>>> 
>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on
the 3.0
>>>>>>> branch. It appears that this is actually potentially a bug in
the
>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a
quick
>>>> look
>>>>>>> through, but I haven't dived in in any detail. It's the
>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can
you
>>>> have a
>>>>>>> look? If not, let me know and I'll do some more digging.
>>>>>>> 
>>>>>>> cheers
>>>>>>> 
>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>> mckenzie.cam@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Thanks Scott.
>>>>>>>> 
>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>> 
>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto
Nexus.
>>>>>>>> cheers
>>>>>>>> 
>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <dragonsinth@gmail.com
>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Alright, I have a fix, but it wants to be applied to
both master
>>>> and
>>>>>> 3.0.
>>>>>>>>> Where should I push the fix?
>>>>>>>>> 
>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Thanks Scott,
>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they
are failing
>>>> there.
>>>>>>>>>> cheers
>>>>>>>>>> 
>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>> dragonsinth@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman
<
>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>> 
>>>>>>>>>>>> -Jordan
>>>>>>>>>>>> 
>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron
McKenzie <
>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Tree cache tests are still failing. I've
tried a few times but
>>>> no
>>>>>>>>>> love:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>> actual 6
>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I will have a look into what's going
on in the morning. Given
>>>> that
>>>>>>>>>>> these
>>>>>>>>>>>>> may take some messing about to fix up,
do we just want to vote
>>>> on
>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>> cheers
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan
Zimmerman <
>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM,
Cameron McKenzie <
>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I have fixed up the test case,
all good now.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45
PM, Cameron McKenzie <
>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Looks like it was introduced
with the schema validation
>>>> stuff.
>>>>>>>>> It
>>>>>>>>>>> now
>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>> an ACL check prior to the
backgrounding call. Because the
>>>> unit
>>>>>>>>>> test
>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>> bogus ACL provider it just
throws an exception
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>    final String adjustedPath
=
>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>>>>>>>>>>>>>> createMode.isSequential()));
>>>>>>>>>>>>>>>>    List<ACL> aclList
= acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>>>>>>>>>>>>>> data,
>>>>>>>>>>>>>>>> aclList);
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>    String returnPath = null;
>>>>>>>>>>>>>>>>    if ( backgrounding.inBackground()
)
>>>>>>>>>>>>>>>>    {
>>>>>>>>>>>>>>>>        pathInBackground(adjustedPath,
data, givenPath);
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> So, I guess the answer is
to get the test to force a failure
>>>>>>>>> in a
>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener,
the
>>>>>>>>> expectation is
>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> this only gets called on
backgrounding operations?
>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39
PM, Cameron McKenzie <
>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Same on the master branch,
but it passes there, so maybe
>>>>>>>>>> something
>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>> legitimately broken the
test. Will let you know if I get
>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, May 25, 2016
at 1:36 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com>
wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Let me know if you
need help.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> It might be a bad
merge. Have you compared it to the
>>>> master
>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On May 24,
2016, at 10:31 PM, Cameron McKenzie <
>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>> There's a test
TestFrameworkBackground:testErrorListener
>>>>>>>>> that
>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>> consistently
on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I can't see how
it has ever worked. It seems to try and
>>>>>>>>> provoke
>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>> via a bad ACL
provider.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> But this ACL
provider is called by the CreateBuilderImpl
>>>>>>>>> prior
>>>>>>>>>> to
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> backgrounding
call, which means that the exception that
>>>> it
>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>> in the main Thread
of the unit test. So, it just throws
>>>> an
>>>>>>>>>>>>>>>>>>> UnsupportedOperationException
which is propogated up the
>>>>>>>>> stack
>>>>>>>>>> at
>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>> point the unit
test fails.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> So, I will look
at fixing this, but I just don't
>>>> understand
>>>>>>>>> how
>>>>>>>>>>> it
>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>> 


Mime
View raw message