zookeeper-dev mailing list archives

From Enrico Olivelli <eolive...@gmail.com>
Subject Re: Decrease number of threads in Jenkins builds to reduce flakyness
Date Mon, 15 Oct 2018 11:55:08 GMT
On Mon, Oct 15, 2018 at 12:46 Andor Molnar
<andor@apache.org> wrote:
>
> Thank you guys. This is great help.
>
> I remember your efforts, Bogdan; as far as I recall you observed thread starvation in
> multiple runs on Apache Jenkins. Correct me if I’m wrong.
>
> I’ve created an umbrella Jira to capture all flaky test fixing efforts here:
> https://issues.apache.org/jira/browse/ZOOKEEPER-3170
>
> All previous flaky-related tickets have been converted to sub-tasks. Some of them might
> not be up-to-date, so please consider reviewing them and closing them if possible. Additionally,
> feel free to create new sub-tasks to capture your actual work.
>
> I’ve already modified the trunk and branch-3.5 builds to run on 4 threads for initial
> testing. It resulted in slightly more stable tests:

+1

I have assigned the umbrella issue to you, Andor, as you are driving
this important task. Is that ok?

thank you

Enrico


>
> Trunk (java 8) - failing 1/4 (since #229) - build time increased by 40-45%
> Trunk (java 9) - failing 0/2 (since #993) - ~40%
> Trunk (java 10) - failing 1/2 (since #280) -
> branch-3.5 (java 8) - failing 0/4 (since #1153) - ~35-45%
>
> However, the sample is not big enough yet and the results are inaccurate, so I need more
> builds. I also need to fix a bug in SSL to get the java9/10 builds working on 3.5.
>
> Please let me know if I should revert the changes. The precommit build is still running
> on 8 threads, but I’d like to change that one too.
>
> Regards,
> Andor
>
>
>
> > On 2018. Oct 15., at 9:31, Bogdan Kanivets <bkanivets@gmail.com> wrote:
> >
> > Fangmin,
> >
> > Those are good ideas.
> >
> > FYI, I've started running tests continuously on an AWS m1.xlarge instance:
> > https://github.com/lavacat/zookeeper-tests-lab
> >
> > So far, I've done ~12 runs of trunk, with the same common offenders as in the flaky
> > dash: testManyChildWatchersAutoReset, testPurgeWhenLogRollingInProgress.
> > I'll do some more runs, then try to come up with a report.
> >
> > I'm using AWS and not the Apache Jenkins env because of better
> > control/observability.
> >
> >
> >
> >
> > On Sun, Oct 14, 2018 at 4:58 PM Fangmin Lv <lvfangmin@gmail.com> wrote:
> >
> >> Internally, we also did some work to reduce the flakiness. Here are the main
> >> things we've done:
> >>
> >> * using a retry rule to rerun a test in case the zk client lost its connection,
> >> which can happen if the quorum tests are running in an unstable environment
> >> and a leader election happened (see the sketch below)
> >> * using random ports instead of sequential ones to avoid port races when
> >> running tests concurrently
> >> * changing tests to avoid using the same test path when creating/deleting
> >> nodes
> >>
> >> These greatly reduced the flakiness internally; we should try them if we're
> >> seeing similar issues in Jenkins.
> >>
> >> Fangmin
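
A minimal sketch of the retry rule Fangmin describes, assuming JUnit 4 (which the ZooKeeper tests use); the class name and the retry-only-on-connection-loss policy are illustrative, not code from the ZooKeeper repo:

    import org.apache.zookeeper.KeeperException;
    import org.junit.rules.TestRule;
    import org.junit.runner.Description;
    import org.junit.runners.model.Statement;

    // Hypothetical rule: rerun a test a few times, but only when it fails
    // with a client connection loss (e.g. caused by a leader election).
    public class RetryOnConnectionLossRule implements TestRule {
        private final int maxAttempts;

        public RetryOnConnectionLossRule(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        @Override
        public Statement apply(final Statement base, Description description) {
            return new Statement() {
                @Override
                public void evaluate() throws Throwable {
                    Throwable last = null;
                    for (int i = 0; i < maxAttempts; i++) {
                        try {
                            base.evaluate();   // run the test body
                            return;            // test passed, no retry needed
                        } catch (KeeperException.ConnectionLossException e) {
                            last = e;          // connection lost: try again
                        }
                    }
                    throw last;                // all attempts failed
                }
            };
        }
    }

A test would opt in with "@Rule public RetryOnConnectionLossRule retry = new RetryOnConnectionLossRule(3);"; any other failure still surfaces on the first attempt.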
> >>
> >> On Sat, Oct 13, 2018 at 10:48 AM Bogdan Kanivets <bkanivets@gmail.com>
> >> wrote:
> >>
> >>> I looked into flakiness a couple of months ago (with special attention to
> >>> testManyChildWatchersAutoReset). In my opinion the problem is a) and c);
> >>> unfortunately I don't have data to back this claim.
> >>>
> >>> I don't remember seeing many 'port binding' exceptions, unless the 'port
> >>> assignment' issue manifested as some other exception.
> >>>
> >>> Before decreasing the number of threads, I think more data should be
> >>> collected/visualized:
> >>>
> >>> 1) The flaky dashboard is great, but we should add another report that maps
> >>> 'error causes' to builds/tests
> >>> 2) The flaky dash can be extended to save more history (for example like this:
> >>> https://www.chromium.org/developers/testing/flakiness-dashboard)
> >>> 3) PreCommit builds should be included in the dashboard
> >>> 4) We should have a common, clean benchmark. For example: take an
> >>> AWS t3.xlarge instance with a fixed Linux distro, JVM, and zk commit sha, and run
> >>> the tests (current 8 threads) for 8 hours with a 1 min cooldown.
> >>>
> >>> Due to a recent employment change I got sidetracked, but I really want to
> >>> get to the bottom of this.
> >>> I'm going to set up 4) and report the results to this mailing list. I'm also
> >>> willing to work on the other items.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Sat, Oct 13, 2018 at 4:59 AM Enrico Olivelli <eolivelli@gmail.com>
> >>> wrote:
> >>>
> >>>> On Fri, Oct 12, 2018 at 23:17 Benjamin Reed <breed@apache.org> wrote:
> >>>>
> >>>>> i think the unique port assignment (d) is more problematic than it
> >>>>> appears. there is a race between finding a free port and actually
> >>>>> grabbing it. i think that contributes to the flakiness.
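
A minimal sketch of the race ben describes, using a hypothetical find-free-port helper (not ZooKeeper's PortAssignment utility): the port is only guaranteed free while the probe socket holds it.

    import java.io.IOException;
    import java.net.ServerSocket;

    public final class FreePortProbe {
        // Hypothetical check-then-grab helper: inherently racy when tests run
        // concurrently, because the port is released before it is actually used.
        static int findFreePort() throws IOException {
            try (ServerSocket probe = new ServerSocket(0)) {
                return probe.getLocalPort(); // free at this instant...
            } // ...but released here; another test thread or process can bind
              // the same port before our server does
        }
    }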
> >>>>>
> >>>>
> >>>> This is very hard to solve for our test cases, because we need to build the
> >>>> configs before starting the groups of servers.
> >>>> For single-server tests it would be easier: you just have to start the
> >>>> server on port zero, get the port, and then create the client configs
> >>>> (see the sketch below).
> >>>> I don't know how much it would be worth.
> >>>>
> >>>> Enrico
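
A minimal sketch of that single-server approach; ServerCnxnFactory.createFactory, startup and getLocalPort are existing ZooKeeper server APIs, while the surrounding setup (temp dirs, tick time, connection limit) is simplified for illustration:

    import java.io.File;
    import org.apache.zookeeper.server.ServerCnxnFactory;
    import org.apache.zookeeper.server.ZooKeeperServer;

    public final class EphemeralPortServer {
        // Bind the server to port 0 so the OS picks (and atomically grabs) a
        // free ephemeral port, then read the real port back for the clients.
        static String start(File snapDir, File logDir) throws Exception {
            ZooKeeperServer zks = new ZooKeeperServer(snapDir, logDir, 3000);
            ServerCnxnFactory factory = ServerCnxnFactory.createFactory(0, 60);
            factory.startup(zks);
            return "127.0.0.1:" + factory.getLocalPort();
        }
    }

Binding to port 0 removes the find-then-grab window entirely; as Enrico notes, the catch is that quorum configs need all ports written down before the servers start, so this only helps single-server tests.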
> >>>>
> >>>>
> >>>>> ben
> >>>>> On Fri, Oct 12, 2018 at 8:50 AM Andor Molnar <andor@apache.org> wrote:
> >>>>>>
> >>>>>> That is a completely valid point. I started to investigate flakies for
> >>>>>> exactly the same reason, if you remember the thread that I started a
> >>>>>> while ago. It was later abandoned unfortunately, because I ran into a
> >>>>>> few issues:
> >>>>>>
> >>>>>> - We nailed down that in order to release 3.5 stable, we have to make
> >>>>>> sure it’s not worse than 3.4 by comparing the builds: but these builds
> >>>>>> are not comparable, because the 3.4 tests run single-threaded while the
> >>>>>> 3.5 tests run multithreaded, showing problems which might also exist on
> >>>>>> 3.4,
> >>>>>>
> >>>>>> - Neither of them runs the C++ tests for some reason, but that’s not
> >>>>>> really an issue here,
> >>>>>>
> >>>>>> - It looks like the tests on 3.5 are just as solid as on 3.4, because
> >>>>>> running them in a dedicated, single-threaded environment shows almost
> >>>>>> all tests succeeding,
> >>>>>>
> >>>>>> - I think the root cause of the failing unit tests could be one (or
> >>>>>> more) of the following:
> >>>>>>        a) Environmental: the Jenkins slave gets overloaded with other
> >>>>>> builds and multithreaded test running makes things even worse: JDK
> >>>>>> threads are starving and ZK instances (both clients and servers) are
> >>>>>> unable to operate,
> >>>>>>        b) Conceptual: ZK unit tests were not designed to run on
> >>>>>> multiple threads: I investigated the unique port assignment feature,
> >>>>>> which is looking good, but there could be other gaps which make them
> >>>>>> unreliable when running simultaneously,
> >>>>>>        c) Bad testing: testing ZK in the wrong way, making bad
> >>>>>> assumptions (e.g. not syncing clients), etc.,
> >>>>>>        d) Bug in the server.
> >>>>>>
> >>>>>> I feel that finding case d) with these tests is super hard, because a
> >>>>>> test report doesn’t give any information on what could have gone wrong
> >>>>>> with ZooKeeper. More or less, guessing is your only option.
> >>>>>>
> >>>>>> Finding c) is a little bit easier; I’m trying to submit patches for
> >>>>>> those and hopefully making some progress.
> >>>>>>
> >>>>>> The huge pain in the arse though is a) and b): people desperately keep
> >>>>>> commenting “please retest this” on github to get a green build, while
> >>>>>> testing is going in a direction that hides real problems: I mean people
> >>>>>> have started not to care about a failing build, because “it must be some
> >>>>>> flaky unrelated to my patch”. Which is bad, but the shame is that it’s
> >>>>>> true in 90% of cases.
> >>>>>>
> >>>>>> I’m just trying to find some ways - besides fixing the c) and d)
> >>>>>> flakies - to get more reliable and more informative Jenkins builds. I
> >>>>>> don’t want to make a huge turnaround, but I think if we can get a
> >>>>>> significantly more reliable build for the price of a slightly longer
> >>>>>> build time by running on 4 threads instead of 8, I say let’s do it.
> >>>>>>
> >>>>>> As always, any help from the community is more than welcome and
> >>>>>> appreciated.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Andor
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On 2018. Oct 12., at 16:52, Patrick Hunt <phunt@apache.org> wrote:
> >>>>>>>
> >>>>>>> iirc the number of threads was increased to improve performance.
> >>>>>>> Reducing is fine, but do we understand why it's failing? Perhaps it's
> >>>>>>> finding real issues as a result of the artificial concurrency/load.
> >>>>>>>
> >>>>>>> Patrick
> >>>>>>>
> >>>>>>> On Fri, Oct 12, 2018 at 7:12 AM Andor Molnar <andor@cloudera.com.invalid>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Thanks for the feedback.
> >>>>>>>> I'm running a few tests now: branch-3.5 on 2 threads and trunk on 4
> >>>>>>>> threads, to see what the impact on the build time is.
> >>>>>>>>
> >>>>>>>> The Github PR job is hard to configure, because its settings are hard
> >>>>>>>> coded into a shell script in the codebase. I have to open a PR for that.
> >>>>>>>>
> >>>>>>>> Andor
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Oct 12, 2018 at 2:46 PM, Norbert Kalmar <nkalmar@cloudera.com.invalid> wrote:
> >>>>>>>>
> >>>>>>>>> +1, running the tests locally with 1 thread always passes (well, I
> >>>>>>>>> ran it about 5 times, but still).
> >>>>>>>>> On the other hand, running them on 8 threads yields similarly flaky
> >>>>>>>>> results as the Apache runs. (It is much faster, although that doesn't
> >>>>>>>>> help if we sometimes have to run 6-8-10 times to get a green run...)
> >>>>>>>>>
> >>>>>>>>> Norbert
> >>>>>>>>>
> >>>>>>>>> On Fri, Oct 12, 2018 at 2:05 PM Enrico Olivelli <eolivelli@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> +1
> >>>>>>>>>>
> >>>>>>>>>> Enrico
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Oct 12, 2018 at 13:52 Andor Molnar <andor@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> What do you think of changing the number of threads running unit
> >>>>>>>>>>> tests in Jenkins from the current 8 to 4 or even 2?
> >>>>>>>>>>>
> >>>>>>>>>>> Running the unit tests inside the Cloudera environment on a single
> >>>>>>>>>>> thread shows the builds to be much more stable. That would probably
> >>>>>>>>>>> be too slow, but maybe running at least fewer threads would improve
> >>>>>>>>>>> the situation.
> >>>>>>>>>>>
> >>>>>>>>>>> It's getting very annoying that I cannot get a green build on
> >>>>>>>>>>> GitHub with only a few retests.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Andor
> >>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> -- Enrico Olivelli
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>> --
> >>>>
> >>>>
> >>>> -- Enrico Olivelli
> >>>>
> >>>
> >>
>
