hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: [DISCUSS] options for precommit test reliability?
Date Wed, 11 Oct 2017 17:19:47 GMT
That's a lovely report, Busbey.

Let me see if I can get a rough answer to your question on minicluster
cores.

S


On Wed, Oct 11, 2017 at 6:43 AM, Sean Busbey <busbey@apache.org> wrote:

> Currently our precommit build has a history of ~233 builds.
>
> Looking across[1] those builds for ones with unit test logs, and
> treating the string "timeout" as an indicator that things failed
> because of a timeout rather than a known bad answer, we have 80 builds
> that had one or more test timeouts.
>
> breaking this down by host:
>
> | Host | % timeout | Success | Timeout Failure | General Failure |
> | ---- | ---------:| -------:| ---------------:| ---------------:|
> | H0   | 42%       | 10      | 15              | 11              |
> | H1   | 54%       | 6       | 14              | 6               |
> | H2   | 45%       | 18      | 35              | 24              |
> | H3   | 100%      | 0       | 1               | 0               |
> | H4   | 0%        | 1       | 0               | 2               |
> | H5   | 20%       | 1       | 1               | 3               |
> | H6   | 44%       | 4       | 4               | 1               |
> | H9   | 35%       | 2       | 7               | 11              |
> | H10  | 26%       | 4       | 8               | 19              |
> | H11  | 0%        | 0       | 0               | 2               |
> | H12  | 43%       | 1       | 3               | 3               |
> | H13  | 22%       | 1       | 2               | 6               |
> | H26  | 0%        | 0       | 0               | 1               |
>
>
> It's odd that we so strongly favor H2. But I don't see evidence that
> we have a bad host that we could just exclude.
>
> Scaling our concurrency by the number of CPU cores is something surefire
> can do. Let me see what the H* hosts look like to figure out some
> example mappings. Do we have a rough bound on how many cores a single
> test using MiniCluster should need? 3?
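>
> For concreteness, a sketch of the knob I mean (the 0.25C value below is
> just an illustration, not a tuned number): surefire's forkCount accepts
> a "C" suffix that multiplies by the detected core count.
>
>       <plugin>
>         <groupId>org.apache.maven.plugins</groupId>
>         <artifactId>maven-surefire-plugin</artifactId>
>         <configuration>
>           <!-- 0.25C = one forked JVM per 4 cores, so a 16-core -->
>           <!-- host would run 4 tests at once -->
>           <forkCount>0.25C</forkCount>
>           <!-- don't reuse forks: each test class gets a fresh JVM -->
>           <reuseForks>false</reuseForks>
>         </configuration>
>       </plugin>
>
> The same value can also be passed on the command line as
> -DforkCount=0.25C for experimenting before we touch the pom.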
>
> -busbey
>
> [1]: By "looking across" I mean using the python-jenkins library
>
> https://gist.github.com/busbey/ff5f7ae3a292164cc110fdb934935c8c
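>
> In spirit the script does something like the following (a bare sketch;
> the job name and the naive "timeout" substring check are illustrative,
> and the gist is what actually ran):
>
>       import jenkins
>
>       # anonymous read access is enough for public build metadata
>       server = jenkins.Jenkins('https://builds.apache.org')
>       job = 'PreCommit-HBASE-Build'  # illustrative job name
>
>       timeouts = 0
>       for build in server.get_job_info(job)['builds']:
>           console = server.get_build_console_output(job, build['number'])
>           if 'timeout' in console.lower():
>               timeouts += 1
>       print('builds with at least one timeout:', timeouts)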
>
>
>
> On Mon, Oct 9, 2017 at 4:40 PM, Stack <stack@duboce.net> wrote:
> > On Mon, Oct 9, 2017 at 7:38 AM, Sean Busbey <busbey@apache.org> wrote:
> >
> >> Hi folks!
> >>
> >> Lately our precommit runs have had a large amount of noise around unit
> >> test failures due to timeouts, especially for the hbase-server module.
> >>
> >>
> > I've not looked into why the timeouts happen. Anyone? Usually there is a
> > cause.
> >
> > ...
> >
> >
> >> I'd really like to get us back to a place where a precommit -1 doesn't
> >> just result in a reflexive "precommit is unreliable."
> >
> >
> > This is the default. The exception is when one of us works on stabilizing
> > the test suite. It takes a while and a bunch of effort, but stabilization
> > has been doable in the past. Once stable, it stays that way a while before
> > the rot sets in.
> >
> >
> >
> >> * Do fewer parallel executions. We do 5 tests at once now and the
> >> hbase-server module takes ~1.5 hours. We could tune down just the
> >> hbase-server module to do fewer.
> >>
> >
> >
> > Is it the loading that is the issue, or tests stamping on each other? If
> > the latter, I'd think we'd want to fix it. If the former, we'd want to
> > look at it too; I'd think our tests shouldn't be such that they fall over
> > if the context is other than 'perfect'.
> >
> > I've not looked at a machine while five concurrent hbase tests are
> > running. Is it even putting up a load? Over the extent of the full test
> > suite? Or is it that just a few tests, when run together, cause issues?
> > Could we stagger these, give them their own category, or have them burn
> > less brightly?
> >
> > If tests are failing because of contention for resources, we should fix
> > the tests. If given a machine, we should burn it up rather than pussy-foot
> > it, I'd say (can we size the concurrency off a query of the underlying OS
> > so we step by CPUs, say?).
> >
> > The tests could do with an edit. Generally, tests are written once and
> > then never touched again. Meantime the system evolves. An edit could look
> > for redundancy. An edit could look for cases where we start clusters
> > --time-consuming-- when we don't have to (use mocks or start standalone
> > instances instead). We also have some crazy tests that spin up lots of
> > clusters all inside a single JVM even though the context is the same as
> > that of a simple method evaluation.
> >
> > St.Ack
>
