geode-dev mailing list archives

From Jens Deppe <jde...@pivotal.io>
Subject Re: Debugging intermittent dunit failures
Date Fri, 15 Dec 2017 00:08:08 GMT
I'll do that.

On Thu, Dec 14, 2017 at 2:02 PM, Kirk Lund <klund@apache.org> wrote:

> Someone needs to try it out. If there are any build-engineer types working
> on geode, then my suggestion is for them to try this change and report the
> timing difference.
>
> On Tue, Dec 12, 2017 at 2:22 PM, Alexander Murmann <amurmann@pivotal.io>
> wrote:
>
> > Do we have a rough idea how forking every time would impact how long tests
> > run?
> >
> > On Tue, Dec 12, 2017 at 1:39 PM, Kirk Lund <klund@apache.org> wrote:
> >
> > > We should just change to fork every 1 instead of 30. Wasting time trying to
> > > debug statics is well... it's a waste of time. We should be focused on
> > > other things.
> > >
> > > On Mon, Dec 11, 2017 at 9:05 PM, Jinmei Liao <jiliao@pivotal.io> wrote:
> > >
> > > > It doesn't call as many static methods as
> > > > JUnit4DistributedTestCase.tearDownVM; see MemberStarterRule.after().
> > > >
> > > > On Mon, Dec 11, 2017 at 4:36 PM, Dan Smith <dsmith@pivotal.io> wrote:
> > > >
> > > > > I don't think we are trying to reuse the distributed system - it gets
> > > > > disconnected after each test. See JUnit4DistributedTestCase.tearDownVM.
> > > > >
> > > > > Are the new junit rules also cleaning things up?
> > > > >
> > > > > -Dan
> > > > >
> > > > > On Mon, Dec 11, 2017 at 4:16 PM, Kirk Lund <klund@apache.org> wrote:
> > > > >
> > > > > > Is there a reason we can't change DistributedTestCase and subclasses to
> > > > > > use TemporaryFolder for all artifacts?
> > > > > >
> > > > > > We could also disconnectAllFromDS in @AfterClass (or even @After) to get
> > > > > > things a bit more separate between dunit test classes.
> > > > > >
> > > > > > Running dunit tests in parallel is much more important than trying to
> > > > > > reuse the distributed system across multiple dunit tests. The latter just
> > > > > > isn't worth the headache and trouble that it causes when static vars or
> > > > > > constants or disk artifacts pollute later tests.
> > > > > >
> > > > > > On Mon, Dec 11, 2017 at 1:42 PM, Dan Smith <dsmith@pivotal.io> wrote:
> > > > > >
> > > > > > > One other thing you can do is look for the below line in the logs of
> > > > > > > your failure. These are the tests that ran in the same JVM as your
> > > > > > > tests. This won't help if your tests are getting messed up by disk
> > > > > > > artifacts or port issues, but if it is some JVM state left by a
> > > > > > > previous test it would be in this list.
> > > > > > >
> > > > > > > Previously run tests: [ClientServerMiscSelectorDUnitTest,
> > > > > > > ClientConflationDUnitTest, ReliableMessagingDUnitTest]
> > > > > > >
> > > > > > > On Mon, Dec 11, 2017 at 1:14 PM, Jens Deppe <jensdeppe@apache.org> wrote:
> > > > > > >
> > > > > > > > I've recently debugged various distributed tests which fail as a
> > > > > > > > result of prior tests not cleaning up enough. It's somewhat painful
> > > > > > > > and this is my usual debug process:
> > > > > > > >
> > > > > > > >
> > > > > > > >    - Examine the progress.txt file to determine which tests ran before
> > > > > > > >    my failing test.
> > > > > > > >    - I pick 20-25 of these tests and create a Suite (including my
> > > > > > > >    failing test) - as these tests may have run in parallel, it's not
> > > > > > > >    clear which tests would have run immediately prior to my test.
> > > > > > > >    - Run the whole suite to see if I can get my test to fail.
> > > > > > > >    - Bisect or manually iterate through the tests to see which one is
> > > > > > > >    causing the problem.
> > > > > > > >
> > > > > > > >
> > > > > > > > The last step is painful, so I've updated SuiteRunner to use a
> > > > > > > > 'candidate' test class and run this class after every other class in
> > > > > > > > the list of SuiteClasses. For example:
> > > > > > > >
> > > > > > > > @Suite.SuiteClasses(value = {
> > > > > > > >     org.apache.geode.cache.snapshot.SnapshotByteArrayDUnitTest.class,
> > > > > > > >     org.apache.geode.cache.query.dunit.QueryDataInconsistencyDUnitTest.class,
> > > > > > > >     org.apache.geode.cache.query.internal.index.MultiIndexCreationDUnitTest.class,
> > > > > > > > })
> > > > > > > > @SuiteRunner.Candidate(org.apache.geode.management.internal.configuration.ClusterConfigDistributionDUnitTest.class)
> > > > > > > > @RunWith(SuiteRunner.class)
> > > > > > > > public class DebugSuite {
> > > > > > > > }
> > > > > > > >
> > > > > > > >
> > > > > > > > The Candidate is optional, but this would run the following tests:
> > > > > > > >
> > > > > > > > - SnapshotByteArrayDUnitTest
> > > > > > > > - *ClusterConfigDistributionDUnitTest*
> > > > > > > > - QueryDataInconsistencyDUnitTest
> > > > > > > > - *ClusterConfigDistributionDUnitTest*
> > > > > > > > - MultiIndexCreationDUnitTest
> > > > > > > > - *ClusterConfigDistributionDUnitTest*
> > > > > > > >
> > > > > > > > --Jens
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Cheers
> > > >
> > > > Jinmei
> > > >
> > >
> >
>
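
The bisect step in the debug process quoted above can be sketched in plain
Java. This is only an illustration, not code from Geode: the class name
PolluterBisect, the method findPolluter, and the candidateFailsAfter predicate
are all hypothetical. The sketch assumes exactly one polluting test class and
a failure that reproduces deterministically. In practice the predicate would
run the given classes followed by the failing test and report whether that
test failed.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of Geode or JUnit.
final class PolluterBisect {

    /**
     * Narrows a list of previously run tests down to the single one that
     * makes the candidate test fail.
     *
     * @param priorTests          tests that ran before the failing test
     * @param candidateFailsAfter returns true if the candidate fails after
     *                            running exactly the given tests
     * @return the polluting test, or null if the full list does not
     *         reproduce the failure
     */
    static <T> T findPolluter(List<T> priorTests,
                              Predicate<List<T>> candidateFailsAfter) {
        // Nothing to bisect unless the whole list reproduces the failure.
        if (priorTests.isEmpty() || !candidateFailsAfter.test(priorTests)) {
            return null;
        }
        List<T> suspects = priorTests;
        while (suspects.size() > 1) {
            int mid = suspects.size() / 2;
            List<T> firstHalf = suspects.subList(0, mid);
            // Assumption: a single culprit, so it is in exactly one half.
            suspects = candidateFailsAfter.test(firstHalf)
                    ? firstHalf
                    : suspects.subList(mid, suspects.size());
        }
        return suspects.get(0);
    }
}
```

Under those assumptions, a list of 20-25 prior classes needs around six
suite runs rather than one run per class. If the failure depends on several
prior tests together, or is intermittent, this simple halving does not apply.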

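The run order that the Candidate annotation is described as producing in the
quoted message can be illustrated with a small stand-alone sketch. It does not
use or reproduce SuiteRunner itself; CandidateInterleaver and interleave are
hypothetical names, and test classes are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the ordering only; not the SuiteRunner code.
final class CandidateInterleaver {

    /**
     * Returns the run order: every suite class followed by the candidate.
     * A null candidate leaves the suite order unchanged, mirroring the
     * statement that the Candidate is optional.
     */
    static <T> List<T> interleave(List<T> suiteClasses, T candidate) {
        List<T> order = new ArrayList<>();
        for (T suiteClass : suiteClasses) {
            order.add(suiteClass);
            if (candidate != null) {
                order.add(candidate);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> order = interleave(
                Arrays.asList(
                        "SnapshotByteArrayDUnitTest",
                        "QueryDataInconsistencyDUnitTest",
                        "MultiIndexCreationDUnitTest"),
                "ClusterConfigDistributionDUnitTest");
        // Prints one class name per line, in run order.
        order.forEach(System.out::println);
    }
}
```

Running the candidate after each class means the first failing run points
directly at the class that ran just before it, at the cost of running the
candidate once per suite class.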