curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cameron McKenzie <mckenzie....@gmail.com>
Subject Re: CURATOR-3.0 tests
Date Wed, 01 Jun 2016 00:08:58 GMT
Maybe we need to look at some way of providing a hook for tests to wait
reliably for asynch tasks to finish?

The latest round of tests ran OK. One test failed on an unrelated thing
(ConnectionLoss), but this appears to be a transient thing as it's worked
ok the next time around.

I will start getting a release together. Thanks for you help with the
updated tests.
cheers

On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <jordan@jordanzimmerman.com
> wrote:

> The problem is in-flight watchers and async background calls. There’s no
> way to cancel these and they can take time to occur - even after a recipe
> instance is closed.
>
> -Jordan
>
> > On May 31, 2016, at 5:11 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
> wrote:
> >
> > Ok, running it again now.
> >
> > Is the problem that the watcher clean up for the recipes is done
> > asynchronously after they are closed?
> >
> > On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
> jordan@jordanzimmerman.com
> >> wrote:
> >
> >> OK - please try now. I added a loop in the “no watchers” checker. If
> there
> >> are remaining watchers, it will sleep a bit and try again.
> >>
> >> -Jordan
> >>
> >>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <mckenzie.cam@gmail.com>
> >> wrote:
> >>>
> >>> Looks like these failures are intermittent. Running them directly in
> >>> Eclipse they seem to be passing. I will run the whole thing again in
> the
> >>> morning and see how it goes.
> >>>
> >>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
> >> mckenzie.cam@gmail.com>
> >>> wrote:
> >>>
> >>>> There are still 2 tests failing for me:
> >>>>
> >>>> FAILURE! - in
> >>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
> >>>>
> >>
> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>> Time elapsed: 17.488 sec  <<< FAILURE!
> >>>> java.lang.AssertionError: One or more child watchers are still
> >> registered:
> >>>> [/test]
> >>>> at
> >>>>
> >>
> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
> >>>> at
> >>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
> >>>>
> >>>> FAILURE! - in
> >>>>
> >>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
> >>>>
> >>
> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
> >>>> Time elapsed: 96.641 sec  <<< FAILURE!
> >>>> java.lang.AssertionError: expected [true] but found [false]
> >>>> at org.testng.Assert.fail(Assert.java:94)
> >>>> at org.testng.Assert.failNotEquals(Assert.java:494)
> >>>> at org.testng.Assert.assertTrue(Assert.java:42)
> >>>> at org.testng.Assert.assertTrue(Assert.java:52)
> >>>> at
> >>>>
> >>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
> >>>>
> >>>> Failed tests:
> >>>>
> >>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child
> >>>> watchers are still registered: [/test]
> >>>> Run 2: PASS
> >>>>
> >>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but
> >>>> found [false]
> >>>>
> >>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
> >> mckenzie.cam@gmail.com
> >>>>> wrote:
> >>>>
> >>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against
that,
> >> and
> >>>>> if it's all good will merge into CURATOR-3.0
> >>>>> cheers
> >>>>>
> >>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
> >>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>
> >>>>>> Actually - I don’t remember if branch CURATOR-332 is merged
yet. I
> >>>>>> made/pushed my changes in CURATOR-332
> >>>>>>
> >>>>>> -jordan
> >>>>>>
> >>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
> >> mckenzie.cam@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> I'm still seeing 6 failed tests that seem related to the
same stuff
> >>>>>> after
> >>>>>>> merging your fix:
> >>>>>>>
> >>>>>>> Failed tests:
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more
child
> >> watchers
> >>>>>>> are still registered: [/test]
> >>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more
child
> >> watchers
> >>>>>>> are still registered: [/test]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>> Run 1:
> >> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>> One or more child watchers are still registered: [/test]
> >>>>>>> Run 2:
> >> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>> One or more child watchers are still registered: [/test]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more
child
> >>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more
child
> >>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>>
> >>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
[true]
> but
> >>>>>>> found [false]
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>> Run 1: PASS
> >>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One
or more
> data
> >>>>>>> watchers are still registered: [/count]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers
are
> >>>>>> still
> >>>>>>> registered: [/count]
> >>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers
are
> >>>>>> still
> >>>>>>> registered: [/count]
> >>>>>>>
> >>>>>>>
> >>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
> >>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>
> >>>>>>>> I see the problem. The fix is not simple though so I’ll
spend some
> >>>>>> time on
> >>>>>>>> it. The TL;DR is that exists watchers are still supposed
to get
> set
> >>>>>> when
> >>>>>>>> there is a KeeperException.NoNode and the code isn’t
handling it.
> >> But,
> >>>>>>>> while I was looking at the code I realized there are
some
> >> significant
> >>>>>>>> additional problems. Curator, here, is trying to mirror
what
> >>>>>> ZooKeeper does
> >>>>>>>> internally which is insanely complicated. In hindsight,
the whole
> ZK
> >>>>>>>> watcher mechanism should’ve been decoupled from the
mutator APIs.
> >>>>>> But, of
> >>>>>>>> course, that’s easy for me to say now.
> >>>>>>>>
> >>>>>>>> -Jordan
> >>>>>>>>
> >>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
> >>>>>> mckenzie.cam@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks Scott,
> >>>>>>>>> Those tests are now passing for me.
> >>>>>>>>>
> >>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently
on the
> >> 3.0
> >>>>>>>>> branch. It appears that this is actually potentially
a bug in the
> >>>>>>>>> NodeCache. It ends up leaking a Watcher reference.
I've had a
> quick
> >>>>>> look
> >>>>>>>>> through, but I haven't dived in in any detail. It's
the
> >>>>>>>>> WatcherRemovalManager stuff I think. If you've got
time, can you
> >>>>>> have a
> >>>>>>>>> look? If not, let me know and I'll do some more
digging.
> >>>>>>>>>
> >>>>>>>>> cheers
> >>>>>>>>>
> >>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie
<
> >>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks Scott.
> >>>>>>>>>>
> >>>>>>>>>> Push the fix to master and merge it into 3.0.
> >>>>>>>>>>
> >>>>>>>>>> Then I guess, I'll push new versions of 2.11
and 3.2 onto Nexus.
> >>>>>>>>>> cheers
> >>>>>>>>>>
> >>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum
<
> >> dragonsinth@gmail.com
> >>>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Alright, I have a fix, but it wants to be
applied to both
> master
> >>>>>> and
> >>>>>>>> 3.0.
> >>>>>>>>>>> Where should I push the fix?
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron
McKenzie <
> >>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>> If you just checkout the CURATOR-3.0
branch, they are failing
> >>>>>> there.
> >>>>>>>>>>>> cheers
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott
Blum <
> >>>>>> dragonsinth@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Sure, what SHA are they failing
at Cam?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM,
Jordan Zimmerman <
> >>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Scott can you take a look?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On May 25, 2016, at 4:35
AM, Cameron McKenzie <
> >>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Tree cache tests are still
failing. I've tried a few times
> >> but
> >>>>>> no
> >>>>>>>>>>>> love:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
> >>>>>>>>>>>>>> actual 6
> >>>>>>>>>>>>>>> expected -31:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I will have a look into
what's going on in the morning.
> Given
> >>>>>> that
> >>>>>>>>>>>>> these
> >>>>>>>>>>>>>>> may take some messing about
to fix up, do we just want to
> >> vote
> >>>>>> on
> >>>>>>>>>>>>> 2.11.0
> >>>>>>>>>>>>>>> separately, as that is all
ready to go?
> >>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, May 25, 2016 at
5:34 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>> jordan@jordanzimmerman.com>
wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Great news. Thanks.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> ====================
> >>>>>>>>>>>>>>>> Jordan Zimmerman
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On May 25, 2016,
at 12:37 AM, Cameron McKenzie <
> >>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I have fixed up
the test case, all good now.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, May 25,
2016 at 1:45 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Looks like it
was introduced with the schema validation
> >>>>>> stuff.
> >>>>>>>>>>> It
> >>>>>>>>>>>>> now
> >>>>>>>>>>>>>>>> does
> >>>>>>>>>>>>>>>>>> an ACL check
prior to the backgrounding call. Because
> the
> >>>>>> unit
> >>>>>>>>>>>> test
> >>>>>>>>>>>>>>>> uses a
> >>>>>>>>>>>>>>>>>> bogus ACL provider
it just throws an exception
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>   final String
adjustedPath =
> >>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
> >>>>>>>>>>>>>>>> createMode.isSequential()));
> >>>>>>>>>>>>>>>>>>   List<ACL>
aclList = acling.getAclList(adjustedPath);
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>
> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
> >>>>>>>>>>>>>>>> data,
> >>>>>>>>>>>>>>>>>> aclList);
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>   String returnPath
= null;
> >>>>>>>>>>>>>>>>>>   if ( backgrounding.inBackground()
)
> >>>>>>>>>>>>>>>>>>   {
> >>>>>>>>>>>>>>>>>>       pathInBackground(adjustedPath,
data, givenPath);
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> So, I guess
the answer is to get the test to force a
> >> failure
> >>>>>>>>>>> in a
> >>>>>>>>>>>>>>>>>> different way.
With the UnhandledErrorListener, the
> >>>>>>>>>>> expectation is
> >>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> this only gets
called on backgrounding operations?
> >>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Wed, May
25, 2016 at 1:39 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Same on
the master branch, but it passes there, so
> maybe
> >>>>>>>>>>>> something
> >>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>> legitimately
broken the test. Will let you know if I
> get
> >>>>>>>>>>> stuck.
> >>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed,
May 25, 2016 at 1:36 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com>
wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Let
me know if you need help.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> It might
be a bad merge. Have you compared it to the
> >>>>>> master
> >>>>>>>>>>>>> branch?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> -JZ
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
On May 24, 2016, at 10:31 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>
wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
Guys,
> >>>>>>>>>>>>>>>>>>>>>
There's a test
> >> TestFrameworkBackground:testErrorListener
> >>>>>>>>>>> that
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>> failing
> >>>>>>>>>>>>>>>>>>>>>
consistently on the CURATOR-3.0 branch.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
I can't see how it has ever worked. It seems to try
> and
> >>>>>>>>>>> provoke
> >>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>> error
> >>>>>>>>>>>>>>>>>>>>>
via a bad ACL provider.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
But this ACL provider is called by the
> >> CreateBuilderImpl
> >>>>>>>>>>> prior
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>
backgrounding call, which means that the exception
> that
> >>>>>> it
> >>>>>>>>>>>> throws
> >>>>>>>>>>>>>>>>>>>> happens
> >>>>>>>>>>>>>>>>>>>>>
in the main Thread of the unit test. So, it just
> throws
> >>>>>> an
> >>>>>>>>>>>>>>>>>>>>>
UnsupportedOperationException which is propogated up
> >> the
> >>>>>>>>>>> stack
> >>>>>>>>>>>> at
> >>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>
point the unit test fails.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
So, I will look at fixing this, but I just don't
> >>>>>> understand
> >>>>>>>>>>> how
> >>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>> ever
> >>>>>>>>>>>>>>>>>>>>>
worked?
> >>>>>>>>>>>>>>>>>>>>>
cheers
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message