couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: 2.1
Date Sun, 21 May 2017 16:04:07 GMT
So far the predominant cause has been timeouts that aren't long enough,
with both the changes in Travis and our new Jenkins setup bringing
these to light. That means the tests do what they are supposed to,
just not always on very slow build VMs, so the timeouts need to be
adjusted.


> On 21. May 2017, at 15:05, Jonathan Hall <flimzy@flimzy.com> wrote:
> 
> I tend to agree with the sentiment expressed here. If failing tests are disabled, I have
> to wonder about the value of the tests in the first place.
> 
> 
> On May 21, 2017 4:17:47 AM GMT+02:00, "Eli Stevens (Gmail)" <wickedgrey@gmail.com> wrote:
>> Please take this as a single data point from an end user:
>> 
>> This approach will probably result in my company declining to upgrade
>> to CouchDB 2.1, and instead waiting for 2.2 in hopes that the test
>> suite will be in a more stable state by then. This is somewhat ironic,
>> given that my company is also sponsoring the work to have Ubuntu
>> packages produced automatically for builds that have a passing test
>> suite.
>> 
>> I realize that this might be somewhat nonsensical, given that I don't
>> have a good handle on what the testing situation was surrounding 2.0,
>> but our expectation is that going from 1.6.1 to 2.0 should be a
>> support and stability improvement, which is important for us. Right
>> now the biggest source of test failures for my company's product is
>> CouchDB 1.6.1 instability. We've built up a layer of retry/backoff
>> functionality over our couch library, but it still leaks out
>> sometimes.
>> 
>> So we're cautiously optimistic about transitioning to 2.0, but I'm
>> really unenthusiastic about a release process that treats intermittent
>> errors as the responsibility of the end user to mitigate. I'd much
>> rather see master made stable (why did the replication scheduler land
>> if it wasn't release-ready?) and that ship as 2.1, even if it takes
>> longer to get there.
>> 
>> Again, just input from an end user.
>> 
>> Thanks for considering,
>> Eli
>> 
>> 
>> 
>> On Sun, May 14, 2017 at 12:21 AM, Jan Lehnardt <jan@apache.org> wrote:
>>>> *just* for the 2.1 branch
>>> 
>>> Absolutely, just for that branch; master will keep all failing tests
>>> until we sort them out properly.
>>> 
>>> Thanks Paul for elaborating here, that’s precisely my thinking as well.
>>> 
>>> Joan, thanks for highlighting that “just disabling all failing tests”
>>> won’t do (e.g. in case of couchjs sometimes crashing); we’ll continue to
>>> have to live with that until we find out what’s wrong.
>>> 
>>> I was mainly thinking about the randomly failing compaction daemon
>>> type tests.
>>> 
>>> Best
>>> Jan
>>> --
>>> 
>>> 
>>>> On 14. May 2017, at 05:46, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>>> 
>>>> Joan,
>>>> 
>>>> Reading this while on ops, but my understanding was that the
>>>> disabling was *just* for the 2.1 branch. Other than that I agree 100%.
>>>> Other than wondering why you haven't merged the log upload :P That's
>>>> awesome and I agree it will help significantly. And I agree that the
>>>> tests aren't necessarily bad; it's just that with a distributed/async
>>>> system the whole "works on my machine" turns into "works on all
>>>> developer machines" but then also "blows up on way underpowered VMs",
>>>> which means our tests have some fun timing issues.
>>>> 
>>>> Given that the tests are randomly failing vs a test or two that's
>>>> always failing, I'm not that concerned with just flagging the issue
>>>> as "We're aware of it, we're working on fixing it, but we'd like to
>>>> get some work into a consumable release for people."
>>>> 
>>>> Seem reasonable?
>>>> 
>>>> On Sat, May 13, 2017 at 8:01 PM, Joan Touzet <wohali@apache.org> wrote:
>>>>> Hi everyone,
>>>>> 
>>>>> I'm +/-0 on this only because there's a little ambiguity in steps 2
>>>>> and 4 I'd like to clear up. This email is part test status report and
>>>>> part clarification, so I apologize in advance for the length.
>>>>> 
>>>>> It is absolutely _almost_ time we get 2.1 out the door.
>>>>> 
>>>>> Step 2 is the equivalent of sweeping all our possible problems under
>>>>> the rug. The failing tests aren't necessarily failing because we have
>>>>> a bad test suite. In fact, just last week I found a genuine race
>>>>> condition leading to a broken Couch from one of these test cases[1].
>>>>> I don't want to just sweep everything under the rug to get a release
>>>>> out the door like we did for 2.0.0; if we'd held on for a few more
>>>>> weeks for that release we might have found and fixed that bug (and a
>>>>> few others, too).
>>>>> 
>>>>> It's worth noting that we can't disable /all/ of the failing tests for
>>>>> a 2.1 release either; at least one of the failures can best be
>>>>> described as "couchjs just sometimes segfaults." So unless we're ready
>>>>> to just disable the entire JS test suite... ;) And for the detractors
>>>>> out there, there are more EUnit than JS failing test cases right now
>>>>> (13 vs. 6)!
>>>>> 
>>>>> Step 4, for me, *must* include re-enabling all of the failing tests as
>>>>> soon as possible (or, alternatively, only disabling them on the 2.1.x
>>>>> branch). A PR I intend to land tomorrow, which has +1s from Paul and
>>>>> Jan[2], will upload couch.log files from Travis and Jenkins to a
>>>>> central CouchDB for further analysis whenever a test fails. Prior to
>>>>> this, determining the actual failure required getting lucky and having
>>>>> one of the tests fail on your machine. With the exception of the
>>>>> compaction daemon tests (whose timeout I increased just 4 days
>>>>> ago[3]), for most of these test failures we just need more data.
>>>>> Disabling the tests now that we finally have useful CI telemetry is
>>>>> like launching a fleet of satellites to monitor global climate, then
>>>>> forbidding the agency responsible for them from using them to gather
>>>>> vital data. :D
>>>>> 
>>>>> Thanks for reading. Let's move forward on 2.1...carefully.
>>>>> 
>>>>> -Joan
>>>>> 
>>>>> [1] https://github.com/apache/couchdb/commit/81ee7c5ac71e617a03e967b4fc5d0358f4ba9459
>>>>> [2] https://github.com/apache/couchdb/pull/514
>>>>> [3] https://github.com/apache/couchdb/commit/ca4761c6177748f6c87bd072939f7b3eb6fa1edd#diff-41b21ba8ff04bec904f235212d7c4de0
>>>>> 
>>>>> ----- Original Message -----
>>>>> From: "Jan Lehnardt" <jan@apache.org>
>>>>> To: "dev" <dev@couchdb.apache.org>
>>>>> Sent: Thursday, 11 May, 2017 1:41:35 PM
>>>>> Subject: 2.1
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> we should get CouchDB 2.1 out soon, and the test suite situation is
>>>>> a somewhat annoying blocker, so I’m proposing something that might
>>>>> sound unusual: disable the failing tests.
>>>>> 
>>>>> All test failures are intermittent and we must absolutely address
>>>>> this, but since nobody has picked this up since February, I think we
>>>>> need a new plan.
>>>>> 
>>>>> The one other issue is that the replication manager was merged
>>>>> recently and is still fairly new code, so I’m proposing this:
>>>>> 
>>>>> 1. Fork 2.1.x off of master just before the replication scheduler
>>>>> merge.
>>>>> 
>>>>>   1.1. Backport to 2.1.x any other fixes that landed in master after
>>>>>   the replication scheduler merge.
>>>>> 
>>>>> 2. Disable all failing tests.
>>>>> 
>>>>> 3. Start the release procedure.
>>>>> 
>>>>> 4. Fix tests on master for 2.2, which can then also include the
>>>>> replication scheduler.
>>>>> 
>>>>> If there are no objections, I’m happy to prepare the 2.1.x branch
>>>>> early next week.
>>>>> 
>>>>> Best
>>>>> Jan
>>>>> --
>>> 
>>> --
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/
>>> 
> 
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.


