zookeeper-dev mailing list archives

From Enrico Olivelli <eolive...@gmail.com>
Subject Re: Decrease number of threads in Jenkins builds to reduce flakyness
Date Sat, 13 Oct 2018 11:59:10 GMT
On Fri, Oct 12, 2018, 23:17 Benjamin Reed <breed@apache.org> wrote:

> I think the unique port assignment (b) is more problematic than it
> appears. There is a race between finding a free port and actually
> grabbing it. I think that contributes to the flakiness.
>

This is very hard to solve for our test cases, because we need to build
configs before starting the groups of servers.
For tests with a single server it will be easier: you just have to start the
server on port zero, get the assigned port and then create the client configs.
I don't know whether it would be worth it.

Enrico
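The race Ben describes and the port-zero approach Enrico suggests can be illustrated with a minimal sketch using only Python's standard `socket` module (Python is chosen only for brevity; ZooKeeper's own tests are Java). The helper names below are hypothetical and are not part of ZooKeeper's test utilities, and a plain listening socket stands in for a ZooKeeper server. `racy_free_port()` shows the probe-then-release pattern, in which another process can take the port before the server binds it; `start_listener_on_port_zero()` lets the OS pick the port at bind time and keeps holding it, so the client config is built from a port the server already owns.

```python
import socket


def racy_free_port() -> int:
    """Anti-pattern (hypothetical helper): probe for a free port, then release it.

    The port is freed when the probe socket closes, so another test
    process may bind it before our server does.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))
        return probe.getsockname()[1]


def start_listener_on_port_zero() -> socket.socket:
    """Bind to port 0 so the OS assigns a free port, and keep it bound.

    Hypothetical stand-in for starting a single server on port zero.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()
    return server


def client_connect_string(server: socket.socket) -> str:
    """Build a hypothetical 'host:port' client config from the bound socket."""
    host, port = server.getsockname()
    return f"{host}:{port}"


if __name__ == "__main__":
    server = start_listener_on_port_zero()
    try:
        connect_string = client_connect_string(server)
        host, port = connect_string.rsplit(":", 1)
        # The client is configured only after the server already owns the
        # port, so no other process can grab it in between.
        with socket.create_connection((host, int(port)), timeout=5):
            print("connected to", connect_string)
    finally:
        server.close()
```

As Enrico notes, this only helps single-server tests: an ensemble config has to list every server's ports before any server starts, so the window between choosing and binding those ports remains.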


> ben
> On Fri, Oct 12, 2018 at 8:50 AM Andor Molnar <andor@apache.org> wrote:
> >
> > That is a completely valid point. I started investigating flaky tests for
> > exactly the same reason, if you remember the thread I started a while
> > ago. Unfortunately it was later abandoned, because I ran into a few
> > issues:
> >
> > - We agreed that, in order to release 3.5 as stable, we have to make
> > sure it’s not worse than 3.4 by comparing the builds. But these builds are
> > not comparable: the 3.4 tests run single-threaded, while the 3.5 tests run
> > multithreaded and show problems which might also exist on 3.4.
> >
> > - Neither of them runs the C++ tests for some reason, but that’s not
> > really an issue here.
> >
> > - The tests on 3.5 look just as solid as on 3.4: running them in a
> > dedicated, single-threaded environment shows almost all tests
> > succeeding.
> >
> > - I think the root cause of the failing unit tests could be one (or more)
> > of the following:
> >         a) Environmental: the Jenkins slave gets overloaded with other
> > builds, and running tests multithreaded makes things even worse: JDK
> > threads starve and ZK instances (both clients and servers) are unable to
> > operate.
> >         b) Conceptual: ZK unit tests were not designed to run on
> > multiple threads. I investigated the unique port assignment feature, which
> > looks good, but there could be other gaps which make the tests
> > unreliable when running simultaneously.
> >         c) Bad testing: testing ZK in the wrong way, making bad
> > assumptions (e.g. not syncing clients), etc.
> >         d) Bug in the server.
> >
> > I feel that finding case d) with these tests is super hard, because a
> > test report doesn’t give any information about what went wrong inside
> > ZooKeeper. Guessing is more or less your only option.
> >
> > Finding c) is a little bit easier; I’m trying to submit patches for them
> > and hopefully making some progress.
> >
> > The huge pain in the arse, though, is a) and b): people desperately keep
> > commenting “please retest this” on GitHub to get a green build, while
> > testing drifts in a direction that hides real problems. I mean people
> > have started not to care about a failing build, because “it must be some
> > flaky test unrelated to my patch”. Which is bad, but the shame is that
> > it’s true in 90% of cases.
> >
> > I’m just trying to find some ways, besides fixing the c) and d) flakies,
> > to get more reliable and more informative Jenkins builds. I don’t want to
> > make a huge turnaround, but if we can get a significantly more reliable
> > build at the price of a slightly longer build time by running on 4 threads
> > instead of 8, I say let’s do it.
> >
> > As always, any help from the community is more than welcome and
> > appreciated.
> >
> > Thanks,
> > Andor
> >
> >
> >
> >
> > > On Oct 12, 2018, at 16:52, Patrick Hunt <phunt@apache.org> wrote:
> > >
> > > IIRC the number of threads was increased to improve performance.
> > > Reducing it is fine, but do we understand why it's failing? Perhaps it's
> > > finding real issues as a result of the artificial concurrency/load.
> > >
> > > Patrick
> > >
> > > On Fri, Oct 12, 2018 at 7:12 AM Andor Molnar <andor@cloudera.com.invalid> wrote:
> > >
> > >> Thanks for the feedback.
> > >> I'm running a few tests now, branch-3.5 on 2 threads and trunk on 4
> > >> threads, to see what the impact on the build time is.
> > >>
> > >> The GitHub PR job is hard to configure, because its settings are
> > >> hard-coded into a shell script in the codebase. I have to open a PR for
> > >> that.
> > >>
> > >> Andor
> > >>
> > >>
> > >>
> > >> On Fri, Oct 12, 2018 at 2:46 PM, Norbert Kalmar <nkalmar@cloudera.com.invalid> wrote:
> > >>
> > >>> +1, running the tests locally with 1 thread always passes (well, I ran
> > >>> it about 5 times, but still).
> > >>> On the other hand, running them on 8 threads yields results as flaky as
> > >>> the Apache runs. (It is much faster, but if we sometimes have to run it
> > >>> 6, 8 or 10 times to get a green run...)
> > >>>
> > >>> Norbert
> > >>>
> > >>> On Fri, Oct 12, 2018 at 2:05 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
> > >>>
> > >>>> +1
> > >>>>
> > >>>> Enrico
> > >>>>
> > >>>> On Fri, Oct 12, 2018, 13:52 Andor Molnar <andor@apache.org> wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> What do you think of changing the number of threads running unit
> > >>>>> tests in Jenkins from the current 8 to 4, or even 2?
> > >>>>>
> > >>>>> Running the unit tests inside the Cloudera environment on a single
> > >>>>> thread makes the builds much more stable. That would probably be too
> > >>>>> slow, but maybe running at least fewer threads would improve the
> > >>>>> situation.
> > >>>>>
> > >>>>> It's getting very annoying that I cannot get a green build on GitHub
> > >>>>> with only a few retests.
> > >>>>>
> > >>>>> Regards,
> > >>>>> Andor
> > >>>>>
> > >>>> --
> > >>>>
> > >>>>
> > >>>> -- Enrico Olivelli
> > >>>>
> > >>>
> > >>
> >
>
-- 


-- Enrico Olivelli
