curator-dev mailing list archives

From Cameron McKenzie <mckenzie....@gmail.com>
Subject Re: CURATOR-3.0 tests
Date Thu, 02 Jun 2016 05:04:59 GMT
Yeah, I'm still getting failures too. I will have more of a look if I get
time tonight.
cheers

On Thu, Jun 2, 2016 at 3:01 PM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:

> Hmm - I’m still getting failures - maybe I’m wrong. It’s late and I’m off
> to bed. I’ll look at this more tomorrow.
>
> -Jordan
>
> > On Jun 1, 2016, at 10:59 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
> wrote:
> >
> > The counter is just being used to check if semaphores are still being
> > acquired. Essentially it just runs in a loop acquiring semaphores (and
> > incrementing the counter when they are acquired).
> >
> > Then it shuts down the server, waits until the session is lost, then
> > restarts the server and then checks that semaphores are being acquired
> > correctly again (by checking that the counter is being incremented).
> >
> > This is just a simplified version of the test that is failing.
> >
> > When the test fails, all of the threads are attempting to get a lease on
> > the semaphore, but none of them get it, then the test times out while
> > waiting.
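
A rough, self-contained sketch of the test shape described above. It uses a plain java.util.concurrent.Semaphore as a stand-in for the Curator semaphore, so no ZooKeeper server is involved. The names AcquireProgressSketch, startWorker and counterAdvances are made up for illustration and do not come from the Curator test suite.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the simplified test; not the Curator test itself.
final class AcquireProgressSketch {
    // Starts a daemon worker that repeatedly takes and returns a lease,
    // incrementing the counter each time a lease is acquired.
    static Thread startWorker(Semaphore leases, AtomicInteger counter, AtomicBoolean running) {
        Thread worker = new Thread(() -> {
            while (running.get()) {
                try {
                    if (leases.tryAcquire(50, TimeUnit.MILLISECONDS)) {
                        try {
                            counter.incrementAndGet();
                        } finally {
                            leases.release();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    // Returns true if the counter increases within timeoutMs, which is the
    // "are leases still being acquired?" check described in the thread.
    static boolean counterAdvances(AtomicInteger counter, long timeoutMs) {
        int start = counter.get();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (deadline - System.nanoTime() > 0) {
            if (counter.get() > start) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return counter.get() > start;
    }
}
```

In the failure mode described, the equivalent of counterAdvances would return false after the restart, because every worker is blocked waiting for a lease.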
> >
> >
> >
> > On Thu, Jun 2, 2016 at 1:29 PM, Jordan Zimmerman <
> jordan@jordanzimmerman.com
> >> wrote:
> >
> >> I also had to add:
> >>
> >> while(!lost.get() && (counter.get() > 0))
> >> {
> >>    Thread.sleep(1000);
> >> }
> >> Which seems more correct to me.
> >>
> >>> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
> >> wrote:
> >>>
> >>> I have just pushed an interprocess_mutex_issue branch. The test case is
> >> in
> >>> TestInterprocessMutexNotReconnecting
> >>>
> >>> For me it's failing around 20% of the time.
> >>> cheers
> >>>
> >>> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <
> >> mckenzie.cam@gmail.com>
> >>> wrote:
> >>>
> >>>> Yep, just let me confirm that it's actually getting the same problem.
> >> I'm
> >>>> sure it was before, but I've just run it a bunch of times and
> >> everything's
> >>>> been fine.
> >>>>
> >>>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
> >>>> jordan@jordanzimmerman.com> wrote:
> >>>>
> >>>>> Can you push your unit test somewhere?
> >>>>>
> >>>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <
> mckenzie.cam@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2
> >> though.
> >>>>>> I've written a simplified unit test that just has a bunch of clients
> >>>>>> attempting to grab a lease on the semaphore. When I shutdown and
> >>>>> restart ZK
> >>>>>> about 25% of the time, none of the clients can reacquire the
> >> semaphore.
> >>>>>>
> >>>>>> Still trying to work out what's going on, but I'm probably not going
> >> to
> >>>>>> have a lot of time today to look at it.
> >>>>>> cheers
> >>>>>>
> >>>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
> >>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>
> >>>>>>> Odd - SemaphoreClient does seem wrong.
> >>>>>>>
> >>>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <
> >> mckenzie.cam@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> It looks like under some circumstances (which I haven't worked out
> >>>>> yet)
> >>>>>>>> that the InterprocessMutex acquire() is not working correctly when
> >>>>>>>> reconnecting to ZK. Still digging into why this is.
> >>>>>>>>
> >>>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm
> >>>>> missing
> >>>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but
> >>>>> throws
> >>>>>>> an
> >>>>>>>> exception if they return true. As far as I can work out, this
> means
> >>>>> that
> >>>>>>>> whenever the lock is acquired, an exception gets thrown indicating
> >>>>> that
> >>>>>>>> there are Multiple acquirers.
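
For reference, a minimal sketch of the compareAndSet() semantics in question. This is not the SemaphoreClient code; the class name SingleHolderGuard and its methods are hypothetical. compareAndSet() returns true when the swap succeeded, so a guard against multiple holders would normally throw on a false result.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard illustrating compareAndSet semantics; not Curator code.
final class SingleHolderGuard {
    private final AtomicBoolean held = new AtomicBoolean(false);

    // compareAndSet(false, true) returns true only for the caller that
    // actually flipped the flag, so a false result means someone else holds it.
    void enter() {
        if (!held.compareAndSet(false, true)) {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    // Symmetric check on release: a false result means the flag was not set.
    void exit() {
        if (!held.compareAndSet(true, false)) {
            throw new IllegalStateException("Released without being held");
        }
    }
}
```

With the condition inverted (throwing when compareAndSet() returns true), the first and only holder would trigger the exception, which matches the behaviour described above.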
> >>>>>>>>
> >>>>>>>> This test is failing fairly consistently. It seems to be the
> >> remaining
> >>>>>>> test
> >>>>>>>> that keeps failing in the Jenkins build also
> >>>>>>>> cheers
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
> >>>>> mckenzie.cam@gmail.com
> >>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Looks like I was incorrect. The NoWatcherException is being
> thrown
> >> on
> >>>>>>>>> success as well, and the problem is not in the cluster restart.
> >> Will
> >>>>>>> keep
> >>>>>>>>> digging.
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
> >>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing
> >>>>> (assertion
> >>>>>>> at
> >>>>>>>>>> line 294). Again, it seems like some sort of race condition with
> >> the
> >>>>>>>>>> watcher removal.
> >>>>>>>>>>
> >>>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When
> it
> >>>>> fails
> >>>>>>>>>> it seems that it's got something to do with watcher removal.
> When
> >>>>> the
> >>>>>>> test
> >>>>>>>>>> passes, this error is not logged.
> >>>>>>>>>>
> >>>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode = No such watcher for /foo/bar/lock/leases
> >>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
> >>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
> >>>>>>>>>> at org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
> >>>>>>>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
> >>>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
> >>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
> >>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
> >>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
> >>>>>>>>>>
> >>>>>>>>>> Is it possible it's something to do with the way that the
> cluster
> >> is
> >>>>>>>>>> restarted at line 282? The old cluster is not shutdown, a new
> one
> >> is
> >>>>>>> just
> >>>>>>>>>> created.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
> >>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I’ll try to address this as part of CURATOR-333
> >>>>>>>>>>>
> >>>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
> >>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Maybe we need to look at some way of providing a hook for
> tests
> >> to
> >>>>>>> wait
> >>>>>>>>>>>> reliably for asynch tasks to finish?
> >>>>>>>>>>>>
> >>>>>>>>>>>> The latest round of tests ran OK. One test failed on an
> >> unrelated
> >>>>>>> thing
> >>>>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as
> >> it's
> >>>>>>>>>>> worked
> >>>>>>>>>>>> ok the next time around.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I will start getting a release together. Thanks for your help
> >> with
> >>>>> the
> >>>>>>>>>>>> updated tests.
> >>>>>>>>>>>> cheers
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
> >>>>>>>>>>> jordan@jordanzimmerman.com
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> The problem is in-flight watchers and async background calls.
> >>>>>>> There’s
> >>>>>>>>>>> no
> >>>>>>>>>>>>> way to cancel these and they can take time to occur - even
> >> after
> >>>>> a
> >>>>>>>>>>> recipe
> >>>>>>>>>>>>> instance is closed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
> >>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Ok, running it again now.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is
> >> done
> >>>>>>>>>>>>>> asynchronously after they are closed?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
> >>>>>>>>>>>>> jordan@jordanzimmerman.com
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
> >>>>> checker.
> >>>>>>> If
> >>>>>>>>>>>>> there
> >>>>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
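
A minimal sketch of the kind of retry loop described above, assuming the checker can be modelled as a supplier of the paths that still have watchers registered. The names WatcherDrainSketch and awaitNoWatchers, and the attempt and sleep parameters, are hypothetical; the real change lives in the branch mentioned in this thread and may look different.

```java
import java.util.Collection;
import java.util.function.Supplier;

// Hypothetical polling helper; not the actual Curator test utility.
final class WatcherDrainSketch {
    // Polls remainingWatchers until it reports no watchers or the attempts
    // run out. Returns true if the watcher set drained in time.
    static boolean awaitNoWatchers(Supplier<? extends Collection<String>> remainingWatchers,
                                   int maxAttempts, long sleepMs) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (remainingWatchers.get().isEmpty()) {
                return true;
            }
            try {
                // Give in-flight watcher removals time to complete.
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // One last look after the final sleep.
        return remainingWatchers.get().isEmpty();
    }
}
```

Bounding the number of attempts keeps a genuine watcher leak failing the test instead of hanging it.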
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
> >>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
> >>>>> directly
> >>>>>>>>>>> in
> >>>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole
> thing
> >>>>> again
> >>>>>>>>>>> in
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> morning and see how it goes.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> There are still 2 tests failing for me:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> FAILURE! - in org.apache.curator.framework.recipes.cache.TestPathChildrenCache
> >>>>>>>>>>>>>>>>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache) Time elapsed: 17.488 sec  <<< FAILURE!
> >>>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>>>> at org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
> >>>>>>>>>>>>>>>>> at org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> FAILURE! - in org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
> >>>>>>>>>>>>>>>>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster) Time elapsed: 96.641 sec  <<< FAILURE!
> >>>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
> >>>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
> >>>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
> >>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
> >>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
> >>>>>>>>>>>>>>>>> at org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Failed tests:
> >>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>>>> Run 2: PASS
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but found [false]
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
> >>>>> against
> >>>>>>>>>>> that,
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
> >>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is
> >> merged
> >>>>>>>>>>> yet. I
> >>>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> -jordan
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to
> the
> >>>>> same
> >>>>>>>>>>> stuff
> >>>>>>>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>>> merging your fix:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Failed tests:
> >>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
> >>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but found [false]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>>>>>>>>>>>>>>> Run 1: PASS
> >>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more data watchers are still registered: [/count]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
> >>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so
> I’ll
> >>>>>>> spend
> >>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>> time on
> >>>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still
> >> supposed
> >>>>> to
> >>>>>>>>>>> get
> >>>>>>>>>>>>> set
> >>>>>>>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
> >>>>>>> handling
> >>>>>>>>>>> it.
> >>>>>>>>>>>>>>> But,
> >>>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are
> >> some
> >>>>>>>>>>>>>>> significant
> >>>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to
> mirror
> >>>>> what
> >>>>>>>>>>>>>>>>>>> ZooKeeper does
> >>>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In
> hindsight,
> >>>>> the
> >>>>>>>>>>> whole
> >>>>>>>>>>>>> ZK
> >>>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
> >>>>> mutator
> >>>>>>>>>>> APIs.
> >>>>>>>>>>>>>>>>>>> But, of
> >>>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
> >>>>> consistently
> >>>>>>>>>>> on the
> >>>>>>>>>>>>>>> 3.0
> >>>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually
> potentially a
> >>>>> bug
> >>>>>>>>>>> in the
> >>>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference.
> >> I've
> >>>>>>> had a
> >>>>>>>>>>>>> quick
> >>>>>>>>>>>>>>>>>>> look
> >>>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's
> >> the
> >>>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got
> >> time,
> >>>>>>> can
> >>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>>> have a
> >>>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more
> >> digging.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and
> 3.2
> >>>>> onto
> >>>>>>>>>>> Nexus.
> >>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
> >>>>>>>>>>>>>>> dragonsinth@gmail.com
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied
> to
> >>>>> both
> >>>>>>>>>>>>> master
> >>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>> 3.0.
> >>>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie
> <
> >>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they
> >> are
> >>>>>>>>>>> failing
> >>>>>>>>>>>>>>>>>>> there.
> >>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
> >>>>>>>>>>>>>>>>>>> dragonsinth@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan
> Zimmerman
> >> <
> >>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie
> <
> >>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've
> tried a
> >>>>> few
> >>>>>>>>>>> times
> >>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>>>>>>>>>> love:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151 actual 6 expected -31:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
> >>>>>>> morning.
> >>>>>>>>>>>>> Given
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we
> >> just
> >>>>>>> want
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>> vote
> >>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan
> >> Zimmerman
> >>>>> <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron
> >> McKenzie <
> >>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
> >>>>> McKenzie <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the
> schema
> >>>>>>>>>>> validation
> >>>>>>>>>>>>>>>>>>> stuff.
> >>>>>>>>>>>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>>>>>>>>>>>> now
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> does
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding
> call.
> >>>>>>>>>>> Because
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> unit
> >>>>>>>>>>>>>>>>>>>>>>>>> test
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an
> >> exception
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath = adjustPath(client.fixForNamespace(givenPath, createMode.isSequential()));
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath);
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, data, aclList);
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     pathInBackground(adjustedPath, data, givenPath);
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test
> to
> >>>>>>> force a
> >>>>>>>>>>>>>>> failure
> >>>>>>>>>>>>>>>>>>>>>>>> in a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the
> >> UnhandledErrorListener,
> >>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> expectation is
> >>>>>>>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding
> >>>>> operations?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
> >>>>> McKenzie
> >>>>>>> <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
> >>>>> there,
> >>>>>>> so
> >>>>>>>>>>>>> maybe
> >>>>>>>>>>>>>>>>>>>>>>>>> something
> >>>>>>>>>>>>>>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
> >>>>> know
> >>>>>>> if
> >>>>>>>>>>> I
> >>>>>>>>>>>>> get
> >>>>>>>>>>>>>>>>>>>>>>>> stuck.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
> >>>>>>> Zimmerman <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you
> compared
> >>>>> it to
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> master
> >>>>>>>>>>>>>>>>>>>>>>>>>> branch?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
> >>>>>>> McKenzie <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
> >>>>>>>>>>>>>>> TestFrameworkBackground:testErrorListener
> >>>>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It
> >>>>> seems to
> >>>>>>>>>>> try
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> provoke
> >>>>>>>>>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
> >>>>>>>>>>>>>>> CreateBuilderImpl
> >>>>>>>>>>>>>>>>>>>>>>>> prior
> >>>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
> >>>>>>>>>>> exception
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>>>>> throws
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> happens
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So,
> >> it
> >>>>>>> just
> >>>>>>>>>>>>> throws
> >>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
> >>>>>>>>>>> propagated up
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> stack
> >>>>>>>>>>>>>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
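
To illustrate the ordering issue described above, here is a small sketch in plain Java with no Curator types. The names BackgroundErrorSketch and runInBackground are hypothetical, with validate standing in for the ACL lookup and errorListener standing in for an UnhandledErrorListener. It only shows the general pattern: anything that throws before the work is handed to the executor surfaces on the caller's thread and never reaches the listener.

```java
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical illustration of "validate on caller thread, then background".
final class BackgroundErrorSketch {
    static void runInBackground(Executor executor, Runnable validate, Runnable task,
                                Consumer<Throwable> errorListener) {
        // Runs on the caller's thread: an exception here propagates straight
        // up the caller's stack and the listener is never invoked.
        validate.run();
        executor.execute(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                // Only failures inside the backgrounded task reach the listener.
                errorListener.accept(t);
            }
        });
    }
}
```

Under that assumption, a test that wants to exercise the listener needs a failure that happens inside the backgrounded task, not in the validation step.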
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I
> just
> >>>>>>> don't
> >>>>>>>>>>>>>>>>>>> understand
> >>>>>>>>>>>>>>>>>>>>>>>> how
> >>>>>>>>>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ever
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
