curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: CURATOR-3.0 tests
Date Thu, 02 Jun 2016 05:01:40 GMT
Hmm - I’m still getting failures - maybe I’m wrong. It’s late and I’m off to bed. I’ll look at this more tomorrow.

-Jordan

> On Jun 1, 2016, at 10:59 PM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
> 
> The counter is just being used to check if semaphores are still being
> acquired. Essentially it just runs in a loop acquiring semaphores (and
> incrementing the counter when they are acquired).
> 
> Then it shuts down the server, waits until it the session is lost, then
> restarts the server and then checks that semaphores are being acquired
> correctly again (by checking that the counter is being incremented).
> 
> This is just a simplified version of the test that is failing.
> 
> When the test fails, all of the threads are attempting to get a lease on
> the semaphore, but none of them get it, then the test times out while
> waiting.
> 
> 
> 
> On Thu, Jun 2, 2016 at 1:29 PM, Jordan Zimmerman <jordan@jordanzimmerman.com
>> wrote:
> 
>> I also had to add:
>> 
>> while(!lost.get() && (counter.get() > 0))
>> {
>>    Thread.sleep(1000);
>> }
>> Which seems more correct to me.
>> 
>>> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
>> wrote:
>>> 
>>> I have just pushed an interprocess_mutex_issue branch. The test case is
>> in
>>> TestInterprocessMutexNotReconnecting
>>> 
>>> For me it's failing around 20% of the time.
>>> cheers
>>> 
>>> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <
>> mckenzie.cam@gmail.com>
>>> wrote:
>>> 
>>>> Yep, just let me confirm that it's actually getting the same problem.
>> I'm
>>>> sure it was before, but I've just run it a bunch of times and
>> everything's
>>>> been fine.
>>>> 
>>>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
>>>> jordan@jordanzimmerman.com> wrote:
>>>> 
>>>>> Can you push your unit test somewhere?
>>>>> 
>>>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2
>> though.
>>>>>> I've written a simplified unit test that just has a bunch of clients
>>>>>> attempting to grab a lease on the semaphore. When I shutdown and
>>>>> restart ZK
>>>>>> about 25% of the time, none of the clients can reacquire the
>> semaphore.
>>>>>> 
>>>>>> Still trying to work out what's going on, but I'm probably not going
>> to
>>>>>> have a lot of time today to look at it.
>>>>>> cheers
>>>>>> 
>>>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>> 
>>>>>>> Odd - SemaphoreClient does seem wrong.
>>>>>>> 
>>>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <
>> mckenzie.cam@gmail.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> It looks like under some circumstances (which I haven't worked out
>>>>> yet)
>>>>>>>> that the InterprocessMutex acquire() is not working correctly when
>>>>>>>> reconnecting to ZK. Still digging into why this is.
>>>>>>>> 
>>>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm
>>>>> missing
>>>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but
>>>>> throws
>>>>>>> an
>>>>>>>> exception if they return true. As far as I can work out, this means
>>>>> that
>>>>>>>> whenever the lock is acquired, an exception gets thrown indicating
>>>>> that
>>>>>>>> there are Multiple acquirers.
>>>>>>>> 
>>>>>>>> This test is failing fairly consistently. It seems to be the
>> remaining
>>>>>>> test
>>>>>>>> that keeps failing in the Jenkins build also
>>>>>>>> cheers
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
>>>>> mckenzie.cam@gmail.com
>>>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Looks like I was incorrect. The NoWatcherException is being thrown
>> on
>>>>>>>>> success as well, and the problem is not in the cluster restart.
>> Will
>>>>>>> keep
>>>>>>>>> digging.
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failling
>>>>> (assertion
>>>>>>> at
>>>>>>>>>> line 294). Again, it seems like some sort of race condition with
>> the
>>>>>>>>>> watcher removal.
>>>>>>>>>> 
>>>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it
>>>>> fails
>>>>>>>>>> it seems that it's got something to do with watcher removal. When
>>>>> the
>>>>>>> test
>>>>>>>>>> passes, this error is not logged.
>>>>>>>>>> 
>>>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException:
>>>>>>> KeeperErrorCode
>>>>>>>>>> = No such watcher for /foo/bar/lock/leases
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>>>>>>> at
>> org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>> at
>>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>>>>>>> 
>>>>>>>>>> Is it possible it's something to do with the way that the cluster
>> is
>>>>>>>>>> restarted at line 282? The old cluster is not shutdown, a new one
>> is
>>>>>>> just
>>>>>>>>>> created.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>>>>>>> 
>>>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Maybe we need to look at some way of providing a hook for tests
>> to
>>>>>>> wait
>>>>>>>>>>>> reliably for asynch tasks to finish?
>>>>>>>>>>>> 
>>>>>>>>>>>> The latest round of tests ran OK. One test failed on an
>> unrelated
>>>>>>> thing
>>>>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as
>> it's
>>>>>>>>>>> worked
>>>>>>>>>>>> ok the next time around.
>>>>>>>>>>>> 
>>>>>>>>>>>> I will start getting a release together. Thanks for you help
>> with
>>>>> the
>>>>>>>>>>>> updated tests.
>>>>>>>>>>>> cheers
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> The problem is in-flight watchers and async background calls.
>>>>>>> There’s
>>>>>>>>>>> no
>>>>>>>>>>>>> way to cancel these and they can take time to occur - even
>> after
>>>>> a
>>>>>>>>>>> recipe
>>>>>>>>>>>>> instance is closed.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Ok, running it again now.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is
>> done
>>>>>>>>>>>>>> asynchronously after they are closed?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
>>>>> checker.
>>>>>>> If
>>>>>>>>>>>>> there
>>>>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
>>>>> directly
>>>>>>>>>>> in
>>>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing
>>>>> again
>>>>>>>>>>> in
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> morning and see how it goes.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>>>> 
>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are
>>>>> still
>>>>>>>>>>>>>>> registered:
>>>>>>>>>>>>>>>>> [/test]
>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or
>>>>> more
>>>>>>>>>>> child
>>>>>>>>>>>>>>>>> watchers are still registered: [/test]
>>>>>>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>>>> [true]
>>>>>>>>>>> but
>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
>>>>> against
>>>>>>>>>>> that,
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is
>> merged
>>>>>>>>>>> yet. I
>>>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the
>>>>> same
>>>>>>>>>>> stuff
>>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>>> merging your fix:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more
>>>>> child
>>>>>>>>>>>>>>> watchers
>>>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more
>>>>> child
>>>>>>>>>>>>>>> watchers
>>>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1:
>>>>>>>>>>>>>>> 
>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> Run 2:
>>>>>>>>>>>>>>> 
>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or
>>>>> more
>>>>>>>>>>> child
>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or
>>>>> more
>>>>>>>>>>> child
>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294
>> expected
>>>>>>>>>>> [true]
>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One
>> or
>>>>>>> more
>>>>>>>>>>>>> data
>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
>>>>>>>>>>> watchers are
>>>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
>>>>>>>>>>> watchers are
>>>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll
>>>>>>> spend
>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>> time on
>>>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still
>> supposed
>>>>> to
>>>>>>>>>>> get
>>>>>>>>>>>>> set
>>>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
>>>>>>> handling
>>>>>>>>>>> it.
>>>>>>>>>>>>>>> But,
>>>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are
>> some
>>>>>>>>>>>>>>> significant
>>>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror
>>>>> what
>>>>>>>>>>>>>>>>>>> ZooKeeper does
>>>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight,
>>>>> the
>>>>>>>>>>> whole
>>>>>>>>>>>>> ZK
>>>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
>>>>> mutator
>>>>>>>>>>> APIs.
>>>>>>>>>>>>>>>>>>> But, of
>>>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
>>>>> consistently
>>>>>>>>>>> on the
>>>>>>>>>>>>>>> 3.0
>>>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a
>>>>> bug
>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference.
>> I've
>>>>>>> had a
>>>>>>>>>>>>> quick
>>>>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's
>> the
>>>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got
>> time,
>>>>>>> can
>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>> have a
>>>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more
>> digging.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2
>>>>> onto
>>>>>>>>>>> Nexus.
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>>>>>>>>> dragonsinth@gmail.com
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to
>>>>> both
>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they
>> are
>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>>>>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman
>> <
>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a
>>>>> few
>>>>>>>>>>> times
>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>> 
>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
>>>>>>> morning.
>>>>>>>>>>>>> Given
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we
>> just
>>>>>>> want
>>>>>>>>>>> to
>>>>>>>>>>>>>>> vote
>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan
>> Zimmerman
>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron
>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
>>>>>>>>>>> validation
>>>>>>>>>>>>>>>>>>> stuff.
>>>>>>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
>>>>>>>>>>> Because
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> unit
>>>>>>>>>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an
>> exception
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath =
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList =
>>>>>>>>>>> acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  pathInBackground(adjustedPath, data,
>>>>>>>>>>> givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to
>>>>>>> force a
>>>>>>>>>>>>>>> failure
>>>>>>>>>>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the
>> UnhandledErrorListener,
>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> expectation is
>>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding
>>>>> operations?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
>>>>> McKenzie
>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
>>>>> there,
>>>>>>> so
>>>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
>>>>> know
>>>>>>> if
>>>>>>>>>>> I
>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>>>>>>> Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared
>>>>> it to
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>>>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
>>>>>>>>>>>>>>> TestFrameworkBackground:testErrorListener
>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It
>>>>> seems to
>>>>>>>>>>> try
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>> provoke
>>>>>>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>>>>>>>>>>>>>>> CreateBuilderImpl
>>>>>>>>>>>>>>>>>>>>>>>> prior
>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
>>>>>>>>>>> exception
>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So,
>> it
>>>>>>> just
>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
>>>>>>>>>>> propogated up
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just
>>>>>>> don't
>>>>>>>>>>>>>>>>>>> understand
>>>>>>>>>>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>> 


Mime
View raw message