zookeeper-dev mailing list archives

From Benjamin Reed <br...@apache.org>
Subject Re: Decrease number of threads in Jenkins builds to reduce flakyness
Date Fri, 12 Oct 2018 21:17:07 GMT
i think the unique port assignment (b) is more problematic than it
appears. there is a race between finding a free port and actually
grabbing it. i think that contributes to the flakiness.
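
A minimal Java sketch of the kind of race described above. It is an illustration only, not code from the ZooKeeper test suite: the class and method names (PortRaceSketch, findFreePort, canBind, bindEphemeral) are hypothetical, and it assumes the common "probe with port 0, close, reuse the number later" pattern. Between the probe closing and the later bind, any other process or test thread can take the port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical illustration; none of these names come from ZooKeeper.
final class PortRaceSketch {
    private static final InetAddress LOOPBACK = InetAddress.getLoopbackAddress();

    // Racy pattern: let the OS pick a free port (port 0), then release it
    // and hand back only the number.
    static int findFreePort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0, 50, LOOPBACK)) {
            return probe.getLocalPort();
        } // the port is released here; nothing reserves it for the caller
    }

    // Returns true if the port can be bound on loopback right now.
    static boolean canBind(int port) {
        try (ServerSocket s = new ServerSocket(port, 50, LOOPBACK)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Alternative: bind to port 0 and keep the socket open, so there is no
    // window between choosing the port and owning it.
    static ServerSocket bindEphemeral() throws IOException {
        return new ServerSocket(0, 50, LOOPBACK);
    }

    public static void main(String[] args) throws IOException {
        int port = findFreePort();
        // Simulate a concurrent test grabbing the port inside the window.
        try (ServerSocket intruder = new ServerSocket(port, 50, LOOPBACK)) {
            System.out.println("port " + port + " still available to us: " + canBind(port));
        }
        try (ServerSocket held = bindEphemeral()) {
            System.out.println("holding port " + held.getLocalPort() + " with no gap");
        }
    }
}
```

Holding the bound socket only helps when the component under test can accept an already-bound socket. Whether that is practical for the ZooKeeper test servers is not something this thread establishes.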

ben
On Fri, Oct 12, 2018 at 8:50 AM Andor Molnar <andor@apache.org> wrote:
>
> That is a completely valid point. I started to investigate flakies for exactly the same
> reason, if you remember the thread that I started a while ago. It was later abandoned unfortunately,
> because I’ve run into a few issues:
>
> - We nailed down that in order to release 3.5 stable, we have to make sure it’s not
> worse than 3.4 by comparing the builds. But these builds are not comparable, because the 3.4 tests
> run single-threaded while the 3.5 tests run multithreaded, exposing problems which might also exist on
> 3.4,
>
> - Neither of them runs the C++ tests for some reason, but that’s not really an issue
> here,
>
> - It looks like the tests on 3.5 are just as solid as on 3.4, because running them in a dedicated,
> single-threaded environment shows almost all tests succeeding,
>
> - I think the root cause of failing unit tests could be one (or more) of the following:
>         a) Environmental: the Jenkins slave gets overloaded with other builds, and running
>            the tests multithreaded makes things even worse: JDK threads are starved and ZK
>            instances (both clients and servers) are unable to operate.
>         b) Conceptual: ZK unit tests were not designed to run on multiple threads.
>            I investigated the unique port assignment feature, which looks good, but there could be
>            other gaps which make the tests unreliable when running simultaneously.
>         c) Bad testing: testing ZK in the wrong way, making bad assumptions (e.g. not
>            syncing clients), etc.
>         d) Bug in the server.
>
> I feel that finding case d) with these tests is super hard, because a test report doesn’t
> give any information on what could go wrong with ZooKeeper. More or less guessing is your
> only option.
>
> Finding c) is a little bit easier. I’m trying to submit patches for those and am hopefully
> making some progress.
>
> The huge pain in the arse though are a) and b): people desperately keep commenting “please
> retest this” on GitHub to get a green build, while testing is going in a direction that hides
> real problems. I mean people have stopped caring about a failing build, because “it must
> be some flaky unrelated to my patch”. Which is bad, but the shame is that it’s true in 90%
> of cases.
>
> I’m just trying to find some ways, besides fixing c) and d) flakies, to get more
> reliable and more informative Jenkins builds. I don’t want to make a huge turnaround, but
> I think if we can get a significantly more reliable build for the price of a slightly longer
> build time by running on 4 threads instead of 8, I say let’s do it.
>
> As always, any help from the community is more than welcome and appreciated.
>
> Thanks,
> Andor
>
>
>
>
> > On 2018. Oct 12., at 16:52, Patrick Hunt <phunt@apache.org> wrote:
> >
> > iirc the number of threads was increased to improve performance. Reducing
> > is fine, but do we understand why it's failing? Perhaps it's finding real
> > issues as a result of the artificial concurrency/load.
> >
> > Patrick
> >
> > On Fri, Oct 12, 2018 at 7:12 AM Andor Molnar <andor@cloudera.com.invalid>
> > wrote:
> >
> >> Thanks for the feedback.
> >> I'm running a few tests now: branch-3.5 on 2 threads and trunk on 4 threads
> >> to see what the impact on the build time is.
> >>
> >> The GitHub PR job is hard to configure, because its settings are hard-coded
> >> into a shell script in the codebase. I have to open a PR for that.
> >>
> >> Andor
> >>
> >>
> >>
> >> On Fri, Oct 12, 2018 at 2:46 PM, Norbert Kalmar <
> >> nkalmar@cloudera.com.invalid> wrote:
> >>
> >>> +1, running the tests locally with 1 thread always passes (well, I ran it
> >>> about 5 times, but still).
> >>> On the other hand, running it on 8 threads yields similarly flaky results
> >>> as the Apache runs. (Although it is much faster, but if we have to run 6-8-10
> >>> times sometimes to get a green run...)
> >>>
> >>> Norbert
> >>>
> >>> On Fri, Oct 12, 2018 at 2:05 PM Enrico Olivelli <eolivelli@gmail.com>
> >>> wrote:
> >>>
> >>>> +1
> >>>>
> >>>> Enrico
> >>>>
> >>>> On Fri, 12 Oct 2018, 13:52 Andor Molnar <andor@apache.org> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> What do you think of changing the number of threads running unit tests in
> >>>>> Jenkins from the current 8 to 4 or even 2?
> >>>>>
> >>>>> Running unit tests inside the Cloudera environment on a single thread shows the
> >>>>> builds to be much more stable. That would probably be too slow, but maybe running
> >>>>> on at least fewer threads would improve the situation.
> >>>>>
> >>>>> It's getting very annoying that I cannot get a green build on GitHub with
> >>>>> only a few retests.
> >>>>>
> >>>>> Regards,
> >>>>> Andor
> >>>>>
> >>>> --
> >>>>
> >>>>
> >>>> -- Enrico Olivelli
> >>>>
> >>>
> >>
>
