couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: Getting automated builds back on track
Date Mon, 29 Jul 2019 15:00:27 GMT
I went through and increased timeouts as needed to get the EUnit suite to pass on the current
Jenkins setup:

https://github.com/apache/couchdb/pull/2087

Happy to do it again on emulated workers. We aren’t green yet; Jenkins is now complaining
about some timeouts in the ExUnit tests. I haven’t looked into those yet, other than to
run them locally where I also encountered a test failure (not timeout-related, possibly an
eventual consistency thing). Do we want those tests to be blocking the build at this point?

Adam

> On Jul 27, 2019, at 12:43 AM, Joan Touzet <wohali@apache.org> wrote:
> 
> Actually, I never committed that change. We're still on actual ARM hardware for the ARM
> build, and the couch_btree tests time out on that platform.
> 
> https://github.com/apache/couchdb/blob/master/Jenkinsfile#L322
> 
> -Joan
> 
> On 2019-07-26 7:23 p.m., Adam Kocoloski wrote:
>> Great email.
>> As a tactical step, does it make sense to back out the qemu-based builds from the
>> main pipeline while we work on the timeout issues?
>> Adam
>>> On Jul 26, 2019, at 5:29 PM, Joan Touzet <wohali@apache.org> wrote:
>>> 
>>> Hello again,
>>> 
>>> Adam poked me on IRC today asking a few questions about the state of Jenkins,
>>> and why we're not generating test binaries for download.
>>> 
>>> The reason is simple: the tests are failing.
>>> 
>>> I've discussed this topic before twice at length with little feedback:
>>> 
>>> https://lists.apache.org/thread.html/6e2bedbbf5c2b28af4237d0936dc21f056fdafa2ea0c0b457285b9dc@%3Cdev.couchdb.apache.org%3E
>>> 
>>> https://lists.apache.org/thread.html/16a310e3342d3f1ca73fb85f62829b76bbfa3759e418386b07e2827f@%3Cdev.couchdb.apache.org%3E
>>> 
>>> 
>>> I have 5 specific proposals to get us back on track:
>>> 
>>>  1. Get more targeted build workers for ppc64le and aarch64 platforms.
>>> 
>>>     This is critical while we wait for #4 below. By having >1 hardware
>>>     platform to build on for each of these, we can hopefully pass those
>>>     architectures regularly, and start building real downloads and Docker
>>>     images for each of these. I know the user community really wants this.
>>> 
>>>     If we get at least 2 of each worker, I'll change Jenkinsfile to use
>>>     those tagged workers rather than the qemu emulation we currently
>>>     have (which is failing).
>>> 
>>> 
>>>  2. Receive and provision the new CouchDB Jenkins build machine. IBM is
>>>     being very generous in getting this set up, and Paul Davis mentioned
>>>     the machine should be ready in the very near future.
>>> 
>>>     Provisioning will have to include Docker + the qemu support. See
>>>     https://issues.apache.org/jira/browse/INFRA-18322 for details on that
>>>     and https://issues.apache.org/jira/browse/INFRA-17404 for the general
>>>     provisioning approach (we download Jenkins .jar from the ASF machine,
>>>     set it up to be `runit`-run on boot, run as many as we can on the
>>>     machine (I think the HW was selected to run 8 of these at once),
>>>     install the prerequisites, and request the 8x worker+password infos
>>>     from ASF Infra).
>>> 
>>>     We have a choice: do we set this up just as 8x Jenkins workers, or do
>>>     we also start running our own Jenkins master (potentially on
>>>     couchdb-vm2)? The motivation to do the latter would be to add
>>>     credentials that could be used for automatic uploading of binaries to
>>>     places like bintray and Docker. (I am currently engaged with Infra in
>>>     trying to solve this for many projects, including Apache OpenWhisk.
>>>     One of the major limiting factors is that the shared ASF Jenkins
>>>     master's credentials can be accessed by all users on the server. This
>>>     is obviously a security nightmare.)
>>> 
>>>     At the moment, we are "OK" using the ASF Jenkins master instance. But
>>>     as soon as we start depending on this service widely (see below) it'll
>>>     be very disruptive to take it down, even for a day or two. So it may
>>>     be best to make this decision sooner rather than later.
>>> 
>>>     I'll be in touch with Infra next week on the global "automated
>>>     binary builds" issue, and will ask for guidance at that time.
>>> 
>>>  3. Switch our PR gate on GitHub from Travis CI to Jenkins CI. This way,
>>>     people won't be blocked on PRs waiting forever anymore, since we'll
>>>     have a lot of compute resources at our disposal. That said,
>>>     **PEOPLE HAVE TO START FIXING THE INTERMITTENT TEST CASE FAILURES**
>>>     or we'll be right back to "Hey, it didn't pass...I'll just click
>>>     Retry" again. 😒 🤢  This will have to be a team effort.
>>> 
>>>  4. Get rid of all timeouts in all test cases. A few proposals for this
>>>     were made in the context of ExUnit. Can we get some more progress
>>>     here?
>>> 
>>>     https://github.com/apache/couchdb/issues/2030
>>>     https://github.com/apache/couchdb/pull/2039
>>> 
>>>  5. Once 4 is done, we can consider moving aarch64/ppc64le/other binary
>>>     builds to qemu support, meaning we can test all platforms just on
>>>     simple x86_64 machines. It's not a required move, but if we lose
>>>     access to the other platforms, or they go down, it's a backup
>>>     strategy.
>>> 
>>> What do people think?

