hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: [DISCUSS] options for precommit test reliability?
Date Mon, 09 Oct 2017 18:23:45 GMT
I find these options useful when running on contended or underpowered test
hosts:

    -Dsurefire.firstPartForkCount=1 \
    -Dsurefire.secondPartForkCount=1 \
    -Dsurefire.rerunFailingTestsCount=3

It balloons the test suite execution time, but produces more stable
results, and the rerun setting allows Surefire to help detect flaky tests.
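
For illustration, the options above could be folded into a full Maven
invocation along these lines. This is only a sketch: the `test` goal, the
`-pl hbase-server` module selection, and the function names are assumptions
made for the example and do not come from the message. It prints the command
by default so it can be inspected before anything runs.

```shell
#!/bin/sh
# Sketch only: wraps the three Surefire-related properties from the message.
# Assumed (not from the message): the `test` goal, `-pl hbase-server`,
# and the helper names below.

stable_test_flags() {
  # Fork counts of 1 for both test parts, plus up to 3 reruns of a failing test.
  printf '%s %s %s' \
    "-Dsurefire.firstPartForkCount=1" \
    "-Dsurefire.secondPartForkCount=1" \
    "-Dsurefire.rerunFailingTestsCount=3"
}

run_stable_tests() {
  cmd="mvn test -pl hbase-server $(stable_test_flags)"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: print the command instead of running it.
    echo "$cmd"
  else
    # Unquoted on purpose so the string splits into separate arguments.
    $cmd
  fi
}
```

Calling `run_stable_tests` prints the command; `DRY_RUN=0 run_stable_tests`
from the root of a source checkout would actually start the build.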



On Mon, Oct 9, 2017 at 7:48 AM, Mike Drob <mdrob@apache.org> wrote:

> Addressing your individual suggestions inline.
>
> Another one that you missed (more long term) is splitting up the server
> module into smaller modules. We have some work on this already (backup,
> mapreduce) but it's a long way to go...
>
>
> On Mon, Oct 9, 2017 at 9:38 AM, Sean Busbey <busbey@apache.org> wrote:
>
> > Hi folks!
> >
> > Lately our precommit runs have had a large amount of noise around unit
> > test failures due to timeout, especially for the hbase-server module.
> >
> > I'd really like to get us back to a place where a precommit -1 doesn't
> > just result in a reflexive "precommit is unreliable."
> >
> > When the hbase-server module is going to be run (which would include
> > changes to that module and changes to the top-level of the project), I
> > can think of a few ways to bring the noise down:
> >
> > * Do fewer parallel executions. We do 5 tests at once now and the
> > hbase-server module takes ~1.5 hours. We could tune down just the
> > hbase-server module to do fewer.
> >
>
> 1.5 hours is already past the threshold where I have to go do something
> else while I wait for the tests to finish. Putting this up to 3 hours
> wouldn't affect my productivity, I don't think.
>
>
> > * Do more test re-runs. We could have tests that fail retry more. I
> > think maybe we allow a single retry currently via surefire. We'd have
> > to do it outside of surefire to account for the large number of
> > time-out failures.
> >
>
> I like the idea of more retries, but I don't like going outside of
> surefire. I don't want us maintaining more custom hacks and shims in place
> for something that should be temporary - once we get the tests stabilized
> we shouldn't need it, right?
>
>
> > * Don't run the hbase-server module tests (or just run those tests
> > that expressly changed in the patch). Instead, we'd include a warning
> > to the committer that they need to test this particular module
> > independently. We could also add a committer-initiated jenkins job
> > that runs the tests for just hbase-server.
> >
>
> I'm optimistic about human nature, but I think this means that the tests
> just wouldn't get run.
>
> >
> > What do folks think?
> >
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk
