zookeeper-dev mailing list archives

From Michael Han <h...@apache.org>
Subject Re: Trying to find pattern in Flaky Tests
Date Thu, 19 Jul 2018 04:09:46 GMT
Thanks Pat for promptly fixing this!

I have no idea what's behind the "failed to get" symptoms. Perhaps we could
give it a few more days and see if the pattern recurs? If not, it might be a
transient infra issue...
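If the "failed to get" errors really are transient, one cheap mitigation would be to retry each REST call a few times before logging an error. The sketch below is hypothetical: it is not the code of the flaky-test script, and `fetch_with_retry` is an illustrative name. It wraps any zero-argument fetch function in exponential backoff:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(fetch: Callable[[], T], attempts: int = 3, delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying on any exception with exponential backoff.

    Hypothetical helper: `attempts` is the total number of tries, and the
    wait after failed try i (0-based) is delay * 2**i seconds.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_exc: Optional[Exception] = None
    for i in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # deliberately broad for a sketch
            last_exc = exc
            if i < attempts - 1:
                sleep(delay * 2 ** i)
    assert last_exc is not None
    raise last_exc  # every attempt failed: surface the last error
```

For example, `fetch_with_retry(lambda: urllib.request.urlopen(url, timeout=30).read())`. Builds that still fail after several spaced-out retries would make a one-off network blip a less likely explanation.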

On Wed, Jul 18, 2018 at 11:16 AM, Patrick Hunt <phunt@apache.org> wrote:

> Ok, I committed a change that seems to address the main failure:
> https://github.com/apache/zookeeper/commit/06b9507ab78a1a055b8f467846c15791600b72ee
>
> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html
>
> However, I do notice some oddness: for some jobs/runs the script fails to
> get the information from the REST interface, even though it works fine for
> most of them. Take a look, any ideas?
> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/456/console
>
> [ZooKeeper-Find-Flaky-Tests] $ /bin/bash /tmp/jenkins4452773653790031730.sh
> ERROR:__main__:failed to get: https://builds.apache.org/job/ZooKeeper-trunk/108/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
> ERROR:__main__:failed to get: https://builds.apache.org/job/ZooKeeper-trunk/104/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
> ERROR:__main__:failed to get: https://builds.apache.org/job/ZooKeeper-trunk/100/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
>
>
> Notice that it doesn't complain about job 107 (etc...)
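For reference, the query string in those URLs is Jenkins' `tree` filter, URL-encoded; decoded, it reads `suites[name,cases[className,name,status]]`. A small sketch (a hypothetical helper, not the script's actual code) that rebuilds the failing URLs, which can be handy for retrying one of them by hand with curl or a browser:

```python
from urllib.parse import quote, unquote

# Jenkins "tree" filter: per suite its name, per test case its class, name and status.
TREE = "suites[name,cases[className,name,status]]"


def report_json_url(job: str, build: int, base: str = "https://builds.apache.org") -> str:
    """Hypothetical helper: JSON test-report URL for one build of a Jenkins job."""
    # safe="" makes quote() percent-encode "[", "]" and "," as %5B, %5D, %2C.
    return f"{base}/job/{job}/{build}/testReport/api/json?tree={quote(TREE, safe='')}"


if __name__ == "__main__":
    url = report_json_url("ZooKeeper-trunk", 108)
    print(url)
    print(unquote(url.split("tree=", 1)[1]))  # back to the readable filter
```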
>
> Any ideas on this? Have you seen this before? Perhaps we should open an
> INFRA jira?
>
> Patrick
>
> On Wed, Jul 18, 2018 at 10:52 AM Patrick Hunt <phunt@apache.org> wrote:
>
> > FYI, created this:
> > https://issues.apache.org/jira/browse/INFRA-16785
> > for the security warnings; not sure if that's causing the issue. It's
> > likely the recent Jenkins upgrade; I'm looking into it a bit...
> >
> > Patrick
> >
> >
> > On Wed, Jul 18, 2018 at 9:48 AM Michael Han <hanm@apache.org> wrote:
> >
> >> Hi Andor,
> >>
> >> >> I suspect it should succeed eventually if we were to increase the
> >> >> timeout even more. But is that correct? Bug or infrastructure issue?
> >>
> >> You could set up a dedicated git branch with all the patches you want to
> >> apply (e.g. the one in ZOOKEEPER-2251), and I can set up a dedicated
> >> Jenkins job that points to this branch and stress-tests the entire unit
> >> test suite. Some tests are only flaky when they run on Apache
> >> infrastructure and when they run together.
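A rough local approximation of such a stress test is to run the same test command many times and tally the outcomes. In this sketch the command is a placeholder assumption (pass whatever the build uses to run a single test class), and `stress` is a hypothetical helper:

```python
import subprocess
import sys
from typing import Dict, Sequence


def stress(cmd: Sequence[str], runs: int = 10, timeout_s: float = 900.0) -> Dict[str, int]:
    """Run `cmd` `runs` times; count passes (exit 0), failures and timeouts."""
    counts = {"passed": 0, "failed": 0, "timed_out": 0}
    for _ in range(runs):
        try:
            # subprocess.run kills the child if it exceeds timeout_s.
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            counts["timed_out"] += 1
            continue
        counts["passed" if result.returncode == 0 else "failed"] += 1
    return counts


if __name__ == "__main__":
    # Placeholder command; replace with the real single-test invocation.
    print(stress(sys.argv[1:] or [sys.executable, "-c", "pass"], runs=3))
```

Keeping timeouts separate from ordinary failures matters here, since the open question is whether the test is merely slow or actually stuck.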
> >>
> >> It would be interesting to figure out what causes this test to fail.
> >> Since the same test works reliably in 3.4, there must be some commits in
> >> 3.5 that we could possibly blame...
> >>
> >> >> I'm going to raise a ticket on that in case somebody is willing to fix it.
> >>
> >> I only had a brief look before Jenkins went down. It looks like Python
> >> was complaining about some SSL stuff, and I suspect that if we upgrade to
> >> a later version of Python (3.x) it might work. I'll try that later when
> >> Jenkins is back.
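To compare the Jenkins agent with a setup where the script works, it may help to log the interpreter and TLS stack the job actually runs with. This is a sketch under the assumption that the SSL complaints come from an old Python/OpenSSL combination; `tls_environment` is a hypothetical helper, while `sys.version_info`, `ssl.OPENSSL_VERSION` and `ssl.HAS_SNI` are standard library attributes. It avoids f-strings and annotations so it also runs on Python 2.7:

```python
import ssl
import sys


def tls_environment():
    """Summarize the interpreter and TLS library this script runs with."""
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "openssl": ssl.OPENSSL_VERSION,  # version string of the linked OpenSSL
        # SNI support is missing from very old 2.7 releases, which can break
        # HTTPS to hosts that serve several certificates.
        "sni": getattr(ssl, "HAS_SNI", False),
    }


if __name__ == "__main__":
    for key, value in sorted(tls_environment().items()):
        print("{}: {}".format(key, value))
```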
> >>
> >>
> >> On Wed, Jul 18, 2018 at 8:42 AM, Andor Molnar <andor@cloudera.com.invalid> wrote:
> >>
> >> > Hi,
> >> >
> >> > *branch-3.4*
> >> >
> >> > I've taken a quick look at our Jenkins builds, and in terms of flaky
> >> > tests, branch-3.4 looks to be in pretty good shape. The build hasn't
> >> > failed for 5-6 days on any JDK, which I think is pretty awesome.
> >> >
> >> > *branch-3.5*
> >> >
> >> > This branch is in very bad condition, which is quite unfortunate given
> >> > that we're in the middle of stabilising it. :)
> >> > It's especially bad on JDK8, where the last successful build was 11 days
> >> > ago. JDK9 (50% failing) and JDK10 (30% failing) look better over the
> >> > last 10 builds.
> >> >
> >> > Interestingly (apart from a few quite rare ones), it looks like there's
> >> > only one really nasty test on this branch: testManyChildWatchersAutoReset
> >> >
> >> > There's a Jira about fixing it, and a fix that increases the test's
> >> > timeout has been merged, but it's also possible that a bug on the
> >> > branch causes the test to fail even with a 10-minute timeout.
> >> >
> >> > I wasn't able to reproduce the failure on my machines (Mac and
> >> > CentOS 7); the test always finished in 30-40 seconds at most. On the
> >> > Jenkins slaves it shows the following:
> >> >
> >> > *JDK 8:*
> >> >
> >> > Report creation timed out.
> >> >
> >> >
> >> > *JDK 9:*
> >> >
> >> > testManyChildWatchersAutoReset duration by build (Test Result Analyzer):
> >> >
> >> > Build   Duration (s)
> >> >   351         45.604
> >> >   350        600.337
> >> >   349         21.904
> >> >   348        583.063
> >> >   347        600.325
> >> >   346        600.383
> >> >   345        600.362
> >> >   344         21.139
> >> >   343         24.031
> >> >   342        584.200
> >> >   341        600.327
> >> >   340        600.323
> >> >   339         23.737
> >> >   338        600.406
> >> >   337        547.004
> >> >   336        600.393
> >> >   335            N/A
> >> >   334        373.955
> >> >
> >> > Per-build reports:
> >> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/<build>/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset
> >> > (build 335 has no report; its entry linked to
> >> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/test_results_analyzer/)
> >> >
> >> >
> >> > *JDK 10:*
> >> >
> >> > testManyChildWatchersAutoReset duration by build (Test Result Analyzer):
> >> >
> >> > Build   Duration (s)
> >> >   110        364.945
> >> >   109        543.983
> >> >   108        388.182
> >> >   107        600.446
> >> >   106        600.025
> >> >   105        535.046
> >> >   104        600.306
> >> >   103        474.005
> >> >   102        560.925
> >> >   101        600.328
> >> >   100        558.547
> >> >    99        600.397
> >> >    98        600.414
> >> >    97        430.383
> >> >    96        564.064
> >> >    95        600.357
> >> >    94        432.435
> >> >    93        596.378
> >> >    92         39.242
> >> >
> >> > Per-build reports:
> >> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/<build>/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset
> >> >
> >> >
> >> > It takes ages to complete on Jenkins for some reason, and it quite
> >> > frequently ends close to the limit, so I suspect it should succeed
> >> > eventually if we were to increase the timeout even more. But is that
> >> > correct? Bug or infrastructure issue?
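One way to check the "it ends close to the limit" impression is to bucket the durations from the JDK 9 and JDK 10 results above against the 10-minute (600 s) timeout. A minimal sketch; the 90% "near the limit" threshold and the one-minute "fast" cutoff are arbitrary choices:

```python
TIMEOUT_S = 600.0  # 10-minute test timeout discussed in the thread

# testManyChildWatchersAutoReset durations (s), newest build first, copied from
# the results above; None marks JDK 9 build 335, which has no report.
JDK9 = [45.604, 600.337, 21.904, 583.063, 600.325, 600.383, 600.362, 21.139, 24.031,
        584.200, 600.327, 600.323, 23.737, 600.406, 547.004, 600.393, None, 373.955]
JDK10 = [364.945, 543.983, 388.182, 600.446, 600.025, 535.046, 600.306, 474.005,
         560.925, 600.328, 558.547, 600.397, 600.414, 430.383, 564.064, 600.357,
         432.435, 596.378, 39.242]


def summarize(durations, timeout=TIMEOUT_S, near=0.9, fast=60.0):
    """Bucket runs: at/over the timeout, within `near` of it, under `fast` seconds."""
    runs = [d for d in durations if d is not None]
    return {
        "runs": len(runs),
        "hit_timeout": sum(d >= timeout for d in runs),
        "near_timeout": sum(near * timeout <= d < timeout for d in runs),
        "under_1_min": sum(d < fast for d in runs),
    }


if __name__ == "__main__":
    print("JDK 9: ", summarize(JDK9))   # {'runs': 17, 'hit_timeout': 8, 'near_timeout': 3, 'under_1_min': 5}
    print("JDK 10:", summarize(JDK10))  # {'runs': 19, 'hit_timeout': 7, 'near_timeout': 5, 'under_1_min': 1}
```

On these numbers, JDK 9 looks bimodal (5 runs under a minute, 8 at the limit), while all but one JDK 10 run took over 6 minutes. That alone can't settle "bug or infrastructure", but runs that either finish quickly or sit at the limit are at least as consistent with an occasional hang as with uniformly slow agents.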
> >> >
> >> > *master / 3.6*
> >> >
> >> > Pretty much the same as 3.5. I haven't seen testManyChildWatchersAutoReset
> >> > failing on this branch with JDK8, which is a bit confusing, but other than
> >> > that I see the same pattern on JDK9 and JDK10. I'm unable to generate the
> >> > above reports here because the Test Result Analyzer keeps timing out for
> >> > me, but I'll follow up when I have them.
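Since the Test Result Analyzer keeps timing out, an alternative could be to fetch one test case's result per build from the Jenkins JSON API and build the table by hand. This sketch assumes each case's JSON view lives at `<build>/testReport/<package>/<Class>/<method>/api/json` (mirroring the report links above) and exposes `status` and `duration` fields; `case_json_url` and `parse_case` are hypothetical helpers:

```python
import json
from typing import Optional, Tuple


def case_json_url(job: str, build: int, package: str, cls: str, method: str,
                  base: str = "https://builds.apache.org") -> str:
    """JSON view of one test case's result in one build (assumed URL layout)."""
    return f"{base}/job/{job}/{build}/testReport/{package}/{cls}/{method}/api/json"


def parse_case(payload: str) -> Tuple[Optional[str], Optional[float]]:
    """Return (status, duration in seconds) from a case-result JSON document."""
    data = json.loads(payload)
    duration = data.get("duration")
    return data.get("status"), (float(duration) if duration is not None else None)


if __name__ == "__main__":
    for build in range(334, 352):  # the JDK 9 builds listed above
        print(case_json_url("ZooKeeper_branch35_java9", build, "org.apache.zookeeper.test",
                            "DisconnectedWatcherTest", "testManyChildWatchersAutoReset"))
```

Each URL can then be fetched (for example with `urllib.request.urlopen`) and its body passed to `parse_case`.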
> >> >
> >> > Btw, the Flaky Test report has been broken for 10 days. I'm going to raise
> >> > a ticket on that in case somebody is willing to fix it. (I'm planning to
> >> > do so.)
> >> > It would be nice to see the report working again, because if my
> >> > observations are correct, we don't have too many annoying tests apart
> >> > from the one mentioned.
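Conceptually, a flaky-test report of this kind only needs the per-build JSON selected by the `tree` filter in the failing URLs earlier in the thread: count, per test, how many builds ran it and how many of those runs failed, and flag tests that both pass and fail. A sketch of that counting step (an illustration, not the actual report script), assuming Jenkins' `FAILED` and `REGRESSION` case statuses mark failures:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

FAILING = {"FAILED", "REGRESSION"}  # Jenkins case statuses treated as failures


def count_results(reports: Iterable[Dict]) -> Tuple[Counter, Counter]:
    """Per test ("Class.method"), count runs and failures across build reports."""
    runs, fails = Counter(), Counter()
    for report in reports:  # one testReport JSON document per build
        for suite in report.get("suites", []):
            for case in suite.get("cases", []):
                key = f'{case["className"]}.{case["name"]}'
                runs[key] += 1
                if case.get("status") in FAILING:
                    fails[key] += 1
    return runs, fails


def flaky_tests(reports: Iterable[Dict]) -> List[Tuple[str, float]]:
    """Tests that failed in some builds but not all, highest failure rate first."""
    runs, fails = count_results(reports)
    rates = [(name, fails[name] / runs[name]) for name in runs if 0 < fails[name] < runs[name]]
    return sorted(rates, key=lambda item: (-item[1], item[0]))
```

Tests that fail in every build are deliberately excluded: those are plain breakages rather than flakiness.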
> >> >
> >> > Thanks,
> >> > Andor
> >> >
> >>
> >
>
