zookeeper-dev mailing list archives

From Andor Molnár <an...@apache.org>
Subject Re: Decrease number of threads in Jenkins builds to reduce flakyness
Date Mon, 22 Oct 2018 16:35:15 GMT
Thanks Bogdan, so far so good.

testNodeDataChanged is an old beast; I have a possible fix for it from @afine:

https://github.com/apache/zookeeper/pull/300

It would be great if we could review it and get rid of this flaky test.


Andor




On 10/20/18 06:41, Bogdan Kanivets wrote:
> I think the argument for keeping concurrency is that it may surface some
> unknown problems with the code.
>
> Maybe a middle ground: move the largest offenders into a separate JUnit tag
> and run them after the rest of the tests with threads=1. Hopefully this will
> make life better for PRs.
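
A rough sketch of that tagging idea, assuming the suite is on JUnit 4; the
FlakyUnderConcurrency marker and the test class shown here are illustrative
only and do not exist in the ZooKeeper code base:

    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    // Hypothetical marker interface for tests known to be flaky under concurrency.
    interface FlakyUnderConcurrency {}

    public class NodeDataChangedIT {

        // Tagging a known offender so a separate, single-threaded run can target it
        // (e.g. with JUnit's Categories runner or the build's category filter).
        @Category(FlakyUnderConcurrency.class)
        @Test
        public void testNodeDataChanged() throws Exception {
            // The real assertions live in the actual suite; this only shows the tagging.
        }
    }
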
>
> On the note of largest offenders, I've done 44 runs on aws r3.large with
> various thread settings (1, 2, 4, 8).
> Failure counts:
>       1 testNextConfigAlreadyActive
>       1 testNonExistingOpCode
>       1 testRaceConditionBetweenLeaderAndAckRequestProcessor
>       1 testWatcherDisconnectOnClose
>       2 testDoubleElection
>       5 testCurrentServersAreObserversInNextConfig
>       5 testNormalFollowerRunWithDiff
>       7 startSingleServerTest
>      18 testNodeDataChanged
>
> Haven't seen testPurgeWhenLogRollingInProgress
> or testManyChildWatchersAutoReset failing yet.
>
>
>
> On Thu, Oct 18, 2018 at 10:03 PM Michael Han <hanm@apache.org> wrote:
>
>> It's a good idea to reduce the concurrency to eliminate flakiness. Looks
>> like single-threaded unit tests on trunk are pretty stable:
>> https://builds.apache.org/job/zookeeper-trunk-single-thread/ (some failures
>> are due to C tests). The build time is longer, but not too bad for the
>> pre-commit build; for the nightly build, build time should not be a concern
>> at all.
>>
>>
>> On Mon, Oct 15, 2018 at 5:50 AM Andor Molnar <andor@cloudera.com.invalid>
>> wrote:
>>
>>> +1
>>>
>>>
>>>
>>> On Mon, Oct 15, 2018 at 1:55 PM, Enrico Olivelli <eolivelli@gmail.com>
>>> wrote:
>>>
>>>> On Mon, Oct 15, 2018 at 12:46 Andor Molnar <andor@apache.org> wrote:
>>>>> Thank you, guys. This is a great help.
>>>>>
>>>>> I remember your efforts, Bogdan; as far as I recall, you observed thread
>>>>> starvation in multiple runs on Apache Jenkins. Correct me if I'm wrong.
>>>>> I've created an umbrella Jira to capture all flaky-test-fixing efforts here:
>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-3170
>>>>> All previous flaky-related tickets have been converted to sub-tasks. Some of
>>>>> them might not be up-to-date; please consider reviewing them and closing
>>>>> them if possible. Additionally, feel free to create new sub-tasks to capture
>>>>> your actual work.
>>>>> I've already modified the trunk and branch-3.5 builds to run on 4 threads
>>>>> initially for testing. It has resulted in slightly more stable tests:
>>>>
>>>> +1
>>>>
>>>> I have assigned the umbrella issue to you, Andor, as you are driving this
>>>> important task. Is that OK?
>>>>
>>>> thank you
>>>>
>>>> Enrico
>>>>
>>>>
>>>>> Trunk (java 8) - failing 1/4 (since #229) - build time increased by 40-45%
>>>>> Trunk (java 9) - failing 0/2 (since #993) - ~40%
>>>>> Trunk (java 10) - failing 1/2 (since #280) -
>>>>> branch-3.5 (java 8) - failing 0/4 (since #1153) - ~35-45%
>>>>>
>>>>> However, the sample is not big enough and the results are inaccurate, so I
>>>>> need more builds. I also need to fix a bug in SSL to get the java9/10 builds
>>>>> working on 3.5.
>>>>> Please let me know if I should revert the changes. The precommit build is
>>>>> still running on 8 threads, but I'd like to change that one too.
>>>>> Regards,
>>>>> Andor
>>>>>
>>>>>
>>>>>
>>>>>> On 2018. Oct 15., at 9:31, Bogdan Kanivets <bkanivets@gmail.com> wrote:
>>>>>> Fangmin,
>>>>>>
>>>>>> Those are good ideas.
>>>>>>
>>>>>> FYI, I've started running tests continuously on an AWS m1.xlarge.
>>>>>> https://github.com/lavacat/zookeeper-tests-lab
>>>>>>
>>>>>> So far, I've done ~12 runs of trunk. Same common offenders as in the Flaky
>>>>>> dash: testManyChildWatchersAutoReset, testPurgeWhenLogRollingInProgress.
>>>>>> I'll do some more runs, then try to come up with a report.
>>>>>>
>>>>>> I'm using AWS and not the Apache Jenkins environment because of better
>>>>>> control/observability.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Oct 14, 2018 at 4:58 PM Fangmin Lv <lvfangmin@gmail.com> wrote:
>>>>>>> Internally, we also did some work to reduce the flakiness; here are the
>>>>>>> main things we've done:
>>>>>>>
>>>>>>> * using a retry rule to retry in case the zk client lost its connection;
>>>>>>> this can happen if the quorum tests are running in an unstable environment
>>>>>>> and a leader election happened
>>>>>>> * using random ports instead of sequential ones to avoid port races when
>>>>>>> running tests concurrently
>>>>>>> * changing tests to avoid using the same test path when creating/deleting
>>>>>>> nodes
>>>>>>>
>>>>>>> These greatly reduced the flakiness internally; we should try them if we're
>>>>>>> seeing similar issues on Jenkins.
>>>>>>>
>>>>>>> Fangmin
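
For reference, a minimal sketch of what the first bullet above (a retry rule)
might look like with JUnit 4's TestRule; the actual internal rule Fangmin
mentions may differ, and the class name here is illustrative:

    import org.junit.rules.TestRule;
    import org.junit.runner.Description;
    import org.junit.runners.model.Statement;

    // Illustrative retry rule: re-runs a failed test a fixed number of times,
    // which masks transient client disconnects on a loaded machine.
    public class RetryRule implements TestRule {

        private final int maxAttempts;

        public RetryRule(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        @Override
        public Statement apply(final Statement base, final Description description) {
            return new Statement() {
                @Override
                public void evaluate() throws Throwable {
                    Throwable last = null;
                    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                        try {
                            base.evaluate();
                            return;
                        } catch (Throwable t) {
                            last = t;
                            System.err.println(description.getDisplayName()
                                    + " failed, attempt " + attempt + "/" + maxAttempts);
                        }
                    }
                    throw last;
                }
            };
        }
    }

A test class would pick it up through a public field such as
"@Rule public RetryRule retry = new RetryRule(3);".
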
>>>>>>>
>>>>>>> On Sat, Oct 13, 2018 at 10:48 AM Bogdan Kanivets <bkanivets@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I looked into flakiness a couple of months ago (with special attention to
>>>>>>>> testManyChildWatchersAutoReset). In my opinion the problem is a) and c).
>>>>>>>> Unfortunately I don't have data to back this claim.
>>>>>>>>
>>>>>>>> I don't remember seeing many 'port binding' exceptions, unless the 'port
>>>>>>>> assignment' issue manifested as some other exception.
>>>>>>>>
>>>>>>>> Before decreasing the number of threads, I think more data should be
>>>>>>>> collected/visualized:
>>>>>>>>
>>>>>>>> 1) The flaky dashboard is great, but we should add another report that
>>>>>>>> maps 'error causes' to builds/tests
>>>>>>>> 2) The flaky dash can be extended to save more history (for example like
>>>>>>>> this: https://www.chromium.org/developers/testing/flakiness-dashboard)
>>>>>>>> 3) PreCommit builds should be included in the dashboard
>>>>>>>> 4) We should have a common clean benchmark. For example: take an AWS
>>>>>>>> t3.xlarge instance with a fixed Linux distro, JVM and zk commit sha, and
>>>>>>>> run the tests (current 8 threads) for 8 hours with a 1 min cooldown.
>>>>>>>>
>>>>>>>> Due to a recent employment change I got sidetracked, but I really want to
>>>>>>>> get to the bottom of this.
>>>>>>>> I'm going to set up 4) and report the results to this mailing list. I'm
>>>>>>>> also willing to work on the other items.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Oct 13, 2018 at 4:59 AM Enrico Olivelli <eolivelli@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Fri, Oct 12, 2018 at 23:17 Benjamin Reed <breed@apache.org> wrote:
>>>>>>>>>> I think the unique port assignment (d) is more problematic than it
>>>>>>>>>> appears. There is a race between finding a free port and actually
>>>>>>>>>> grabbing it; I think that contributes to the flakiness.
>>>>>>>>>>
>>>>>>>>> This is very hard to solve for our test cases, because we need to build
>>>>>>>>> the configs before starting the groups of servers.
>>>>>>>>> For tests with a single server it will be easier: you just have to start
>>>>>>>>> the server on port zero, get the port, and then create the client configs.
>>>>>>>>> I don't know how much it would be worth.
>>>>>>>>>
>>>>>>>>> Enrico
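
For what it's worth, a minimal sketch of the port-zero idea Enrico describes;
the class name is illustrative, and in a real test the ZooKeeper server's own
socket would be the one bound to port 0 rather than a probe socket:

    import java.io.IOException;
    import java.net.InetAddress;
    import java.net.ServerSocket;

    // Let the OS assign a free port, read it back, and build the client
    // connect string from it. Releasing this probe socket before the real
    // server binds would re-open the find-then-grab race Ben mentions above,
    // which is why binding the server itself to port 0 is the safer variant.
    public final class PortZeroSketch {
        public static void main(String[] args) throws IOException {
            try (ServerSocket socket =
                     new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
                int port = socket.getLocalPort();
                String connectString = "127.0.0.1:" + port;
                System.out.println("clients would connect to " + connectString);
            }
        }
    }
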
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> ben
>>>>>>>>>> On Fri, Oct 12, 2018 at 8:50 AM Andor Molnar <andor@apache.org> wrote:
>>>>>>>>>>> That is a completely valid point. I started to investigate flakies for
>>>>>>>>>>> exactly the same reason, if you remember the thread that I started a
>>>>>>>>>>> while ago. It was later abandoned unfortunately, because I ran into a
>>>>>>>>>>> few issues:
>>>>>>>>>>> - We nailed down that in order to release 3.5 stable, we have to make
>>>>>>>>>>> sure it's not worse than 3.4 by comparing the builds: but these builds
>>>>>>>>>>> are not comparable, because the 3.4 tests run single-threaded while 3.5
>>>>>>>>>>> runs multi-threaded, showing problems which might also exist on 3.4,
>>>>>>>>>>> - Neither of them runs the C++ tests for some reason, but that's not
>>>>>>>>>>> really an issue here,
>>>>>>>>>>> - It looks like tests on 3.5 are just as solid as on 3.4, because
>>>>>>>>>>> running them in a dedicated, single-threaded environment shows almost
>>>>>>>>>>> all tests succeeding,
>>>>>>>>>>> - I think the root cause of the failing unit tests could be one (or
>>>>>>>>>>> more) of the following:
>>>>>>>>>>>        a) Environmental: the Jenkins slave gets overloaded with other
>>>>>>>>>>> builds and multithreaded test running makes things even worse: JDK
>>>>>>>>>>> threads are starved and ZK instances (both clients and servers) are
>>>>>>>>>>> unable to operate,
>>>>>>>>>>>        b) Conceptual: ZK unit tests were not designed to run on multiple
>>>>>>>>>>> threads: I investigated the unique port assignment feature, which looks
>>>>>>>>>>> good, but there could be other gaps which make them unreliable when
>>>>>>>>>>> running simultaneously,
>>>>>>>>>>>        c) Bad testing: testing ZK in the wrong way, making bad
>>>>>>>>>>> assumptions (e.g. not syncing clients), etc.
>>>>>>>>>>>        d) Bug in the server.
>>>>>>>>>>>
>>>>>>>>>>> I feel that finding case d) with these tests is super hard, because a
>>>>>>>>>>> test report doesn't give any information on what could have gone wrong
>>>>>>>>>>> with ZooKeeper. More or less, guessing is your only option.
>>>>>>>>>>> Finding c) is a little bit easier; I'm trying to submit patches for
>>>>>>>>>>> those and hopefully making some progress.
>>>>>>>>>>> The huge pain in the arse, though, are a) and b): people desperately
>>>>>>>>>>> keep commenting "please retest this" on GitHub to get a green build,
>>>>>>>>>>> while testing is going in a direction that hides real problems: I mean
>>>>>>>>>>> people have started not to care about a failing build, because "it must
>>>>>>>>>>> be some flaky unrelated to my patch". Which is bad, but the shame is
>>>>>>>>>>> that it's true in 90% of cases.
>>>>>>>>>>> I'm just trying to find some ways - besides fixing the c) and d) flakies
>>>>>>>>>>> - to get more reliable and more informative Jenkins builds. I don't want
>>>>>>>>>>> to make a huge turnaround, but I think if we can get a significantly
>>>>>>>>>>> more reliable build for the price of a slightly longer build time,
>>>>>>>>>>> running on 4 threads instead of 8, I say let's do it.
>>>>>>>>>>> As always, any help from the community is more than welcome and
>>>>>>>>>>> appreciated.
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Andor
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 2018. Oct 12., at 16:52, Patrick Hunt <phunt@apache.org> wrote:
>>>>>>>>>>>> IIRC the number of threads was increased to improve performance.
>>>>>>>>>>>> Reducing it is fine, but do we understand why it's failing? Perhaps
>>>>>>>>>>>> it's finding real issues as a result of the artificial concurrency/load.
>>>>>>>>>>>>
>>>>>>>>>>>> Patrick
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 12, 2018 at 7:12 AM Andor Molnar
>>>>>>>>>>>> <andor@cloudera.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the feedback.
>>>>>>>>>>>>> I'm running a few tests now: branch-3.5 on 2 threads and trunk on 4
>>>>>>>>>>>>> threads, to see what the impact on the build time is.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The GitHub PR job is hard to configure, because its settings are
>>>>>>>>>>>>> hard-coded into a shell script in the codebase. I have to open a PR
>>>>>>>>>>>>> for that.
>>>>>>>>>>>>> Andor
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 2:46 PM, Norbert Kalmar
>>>>>>>>>>>>> <nkalmar@cloudera.com.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1, running the tests locally with 1 thread always passes (well, I
>>>>>>>>>>>>>> ran it about 5 times, but still).
>>>>>>>>>>>>>> On the other hand, running it on 8 threads yields similarly flaky
>>>>>>>>>>>>>> results as the Apache runs. (Although it is much faster, but if we
>>>>>>>>>>>>>> sometimes have to run it 6-8-10 times to get a green run...)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Norbert
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 2:05 PM Enrico Olivelli <eolivelli@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Enrico
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 13:52 Andor Molnar <andor@apache.org> wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you think of changing the number of threads running unit
>>>>>>>>>>>>>>>> tests in Jenkins from the current 8 to 4, or even 2?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Running unit tests inside the Cloudera environment on a single
>>>>>>>>>>>>>>>> thread shows the builds to be much more stable. That would probably
>>>>>>>>>>>>>>>> be too slow, but maybe running fewer threads would at least improve
>>>>>>>>>>>>>>>> the situation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It's getting very annoying that I cannot get a green build on
>>>>>>>>>>>>>>>> GitHub with only a few retests.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Andor
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- Enrico Olivelli
>>>>>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- Enrico Olivelli
>>>>>>>>>
