zookeeper-dev mailing list archives

From Michael Han <h...@apache.org>
Subject Re: Trying to find pattern in Flaky Tests
Date Wed, 18 Jul 2018 16:48:18 GMT
Hi Andor,

>> I suspect it should succeed eventually if we were to increase the
timeout even more. But is that correct? Bug or infrastructure issue?

You could set up a dedicated git branch with all the patches you want to
apply (e.g. the one in ZOOKEEPER-2251), and I can set up a dedicated Jenkins
job that points to this branch and stress tests the entire unit test suite.
Some tests are only flaky when they run on Apache infrastructure and when
they run together.

It would be interesting to figure out what causes this test to fail. Since
the same test works reliably in 3.4, there must be some commits in 3.5 that
we could possibly blame...
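
As a first step toward telling a bug from an infrastructure issue, it may
help to quantify how the durations quoted below are distributed. The
following is only a minimal sketch: the numbers are copied from the JDK 9
figures in the quoted message (build 335, reported as N/A, is left out),
while the 600-second limit, the 60-second "fast" cutoff, and the names
used are assumptions made for illustration.

```python
# Durations (seconds) of testManyChildWatchersAutoReset on branch-3.5 / JDK 9,
# builds 351 down to 334, copied from the quoted message.
# Build 335 is reported as N/A and is left out.
JDK9_DURATIONS = [
    45.604, 600.337, 21.904, 583.063, 600.325, 600.383,
    600.362, 21.139, 24.031, 584.200, 600.327, 600.323,
    23.737, 600.406, 547.004, 600.393, 373.955,
]

TIMEOUT_S = 600.0  # assumed: the "10 min timeout" mentioned in the thread
FAST_S = 60.0      # assumed cutoff for a "normal" run (30-40 s locally)


def summarize(durations, timeout=TIMEOUT_S, fast=FAST_S):
    """Bucket runs into fast, slow-but-finished, and at-or-over the timeout."""
    at_limit = sum(1 for d in durations if d >= timeout)
    quick = sum(1 for d in durations if d < fast)
    slow = len(durations) - at_limit - quick
    return {"fast": quick, "slow": slow, "at_limit": at_limit}


if __name__ == "__main__":
    print(summarize(JDK9_DURATIONS))
```

A split into a fast group and a group at or near the limit, with little in
between, would hint at intermittent stalls rather than uniformly slow
machines, though on its own it does not settle whether the cause is a bug
or the infrastructure.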

>> I'm going to raise a ticket on that if somebody is willing to fix it.

I had a brief look before Jenkins went down. It looks like Python was
complaining about some SSL stuff, and I suspect that upgrading to a later
version of Python (3.x) might fix it. I'll try that later when Jenkins is
back.


On Wed, Jul 18, 2018 at 8:42 AM, Andor Molnar <andor@cloudera.com.invalid>
wrote:

> Hi,
>
> *branch-3.4*
>
> I've taken a quick look at our Jenkins builds and in terms of flaky tests,
> it looks like branch-3.4 is in pretty good shape. The build hasn't failed
> for 5-6 days on any JDK, which I think is pretty awesome.
>
> *branch-3.5*
>
> This branch is in very bad condition, which is quite unfortunate given
> we're in the middle of stabilising it. :)
> JDK8 is the worst: the last successful build was 11 days ago. JDK9 (50%
> failing) and JDK10 (30% failing) look better over the last 10 builds.
>
> Interestingly (apart from a few quite rare ones) it looks like there's only
> one really nasty test on this branch: testManyChildWatchersAutoReset
>
> There's a Jira about fixing it, and a fix that increases the test's timeout
> has been merged. It is also possible that a bug on the branch is causing
> the test to fail even with the 10 min timeout.
>
> I wasn't able to repro the failing test on my machines (Mac and CentOS7);
> it always finished in 30-40 seconds at most. On the Jenkins slaves it shows
> the following:
>
> *JDK 8:*
>
> Report creation timed out.
>
>
> *JDK 9:*
>
> testManyChildWatchersAutoReset, per build:
>
> Build   Duration (s)
> 351     45.604
> 350     600.337
> 349     21.904
> 348     583.063
> 347     600.325
> 346     600.383
> 345     600.362
> 344     21.139
> 343     24.031
> 342     584.200
> 341     600.327
> 340     600.323
> 339     23.737
> 338     600.406
> 337     547.004
> 336     600.393
> 335     N/A
> 334     373.955
>
> Each duration links to that build's test report (BUILD is the build
> number):
> <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/
> ZooKeeper_branch35_java9/BUILD/testReport/org.apache.zookeeper.test/
> DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> The N/A entry for build 335 links to:
> <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/
> ZooKeeper_branch35_java9/test_results_analyzer/>
>
>
> *JDK 10:*
>
>
> testManyChildWatchersAutoReset, per build:
>
> Build   Duration (s)
> 110     364.945
> 109     543.983
> 108     388.182
> 107     600.446
> 106     600.025
> 105     535.046
> 104     600.306
> 103     474.005
> 102     560.925
> 101     600.328
> 100     558.547
> 99      600.397
> 98      600.414
> 97      430.383
> 96      564.064
> 95      600.357
> 94      432.435
> 93      596.378
> 92      39.242
>
> Each duration links to that build's test report (BUILD is the build
> number):
> <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/
> ZooKeeper_branch35_java10/BUILD/testReport/org.apache.zookeeper.test/
> DisconnectedWatcherTest/testManyChildWatchersAutoReset>
>
>
> It takes ages to complete on Jenkins for some reason and it looks like it
> ends quite frequently close to the limit, so I suspect it should succeed
> eventually if we were to increase the timeout even more. But is that
> correct?
> Bug or infrastructure issue?
>
> *master / 3.6*
>
> Pretty much the same as 3.5. I haven't seen testManyChildWatchersAutoReset
> failing on this branch with JDK8, which is a bit confusing, but other than
> that I see the same pattern on JDK9 and JDK10. I'm unable to generate the
> above reports here because Test Result Analyzer keeps timing out for me,
> but I'll follow up when I have them.
>
> Btw. the Flaky Test report has been broken for 10 days. I'm going to raise
> a ticket on that if somebody is willing to fix it. (I'm planning to do so.)
> It would be nice to see the report working again, because if my
> observations are correct, we don't have too many annoying tests apart from
> the one mentioned.
>
> Thanks,
> Andor
>
