couchdb-dev mailing list archives

From Jonathan Hall <fli...@flimzy.com>
Subject Re: 2.1
Date Sun, 21 May 2017 13:05:25 GMT
I tend to agree with the sentiment expressed here. If failing tests are disabled, I have to
wonder about the value of the tests in the first place. 


On May 21, 2017 4:17:47 AM GMT+02:00, "Eli Stevens (Gmail)" <wickedgrey@gmail.com> wrote:
>Please take this as a single data point from an end user:
>
>This approach will probably result in my company declining to upgrade
>to CouchDB 2.1, and instead waiting for 2.2 in hopes that the test
>suite will be in a more stable state by then. This is somewhat ironic,
>given that my company is also sponsoring the work to have Ubuntu
>packages produced automatically for builds that have a passing test
>suite.
>
>I realize that this might be somewhat nonsensical, given that I don't
>have a good handle on what the testing situation was surrounding 2.0,
>but our expectation is that going from 1.6.1 to 2.0 should be a
>support and stability improvement, which is important for us. Right
>now the biggest source of test failures for my company's product is
>CouchDB 1.6.1 instability. We've built up a layer of retry/backoff
>functionality over our couch library, but it still leaks out
>sometimes.
>
>So we're cautiously optimistic about transitioning to 2.0, but I'm
>really unenthusiastic about a release process that treats intermittent
>errors as the responsibility of the end user to mitigate. I'd much
>rather see master made stable (why did the replication scheduler land
>if it wasn't release-ready?) and that ship as 2.1, even if it takes
>longer to get there.
>
>Again, just input from an end user.
>
>Thanks for considering,
>Eli
>
>
>
>On Sun, May 14, 2017 at 12:21 AM, Jan Lehnardt <jan@apache.org> wrote:
>>> *just* for the 2.1 branch
>>
>> Absolutely, just for that branch; master will keep all failing tests
>> until we sort them out properly.
>>
>> Thanks, Paul, for elaborating here; that’s precisely my thinking as
>> well.
>>
>> Joan, thanks for highlighting that “just disabling all failing tests”
>> won’t do (e.g. in the case of couchjs sometimes crashing); we’ll
>> continue to have to live with that until we find out what’s wrong.
>>
>> I was mainly thinking about the randomly failing compaction daemon
>> type tests.
>>
>> Best
>> Jan
>> --
>>
>>
>>> On 14. May 2017, at 05:46, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>>
>>> Joan,
>>>
>>> Reading this while on ops, but my understanding was that the
>>> disabling was *just* for the 2.1 branch. Other than that I agree 100%.
>>> Other than wondering why you haven't merged the log upload :P That's
>>> awesome and I agree it will help significantly. And I agree that the
>>> tests aren't necessarily bad; it's just that with a distributed/async
>>> system the whole "works on my machine" turns into "works on all
>>> developer machines" but then also "blows up on way underpowered VMs",
>>> which means our tests have some fun timing issues.
>>>
>>> Given that the tests are randomly failing, vs. a test or two that's
>>> always failing, I'm not that concerned with just flagging the issue as
>>> "We're aware of it, we're working on fixing it, but we'd like to get
>>> some work into a consumable release for people."
>>>
>>> Seem reasonable?
>>>
>>> On Sat, May 13, 2017 at 8:01 PM, Joan Touzet <wohali@apache.org> wrote:
>>>> Hi everyone,
>>>>
>>>> I'm +/-0 on this only because there's a little ambiguity in steps 2
>>>> and 4 I'd like to clear up. This email is part test status report and
>>>> part clarification, so I apologize in advance for the length.
>>>>
>>>> It is absolutely _almost_ time we get 2.1 out the door.
>>>>
>>>> Step 2 is the equivalent of sweeping all our possible problems under
>>>> the rug. The failing tests aren't necessarily failing because we have
>>>> a bad test suite. In fact, just last week I found a genuine race
>>>> condition leading to a broken Couch from one of these test cases[1].
>>>> I don't want to just sweep everything under the rug to get a release
>>>> out the door like we did for 2.0.0; if we'd held on for a few more
>>>> weeks for that release, we might have found and fixed that bug (and a
>>>> few others, too).
>>>>
>>>> It's worth noting that we can't disable /all/ of the failing tests for
>>>> a 2.1 release either; at least one of the failures can best be
>>>> described as "couchjs just sometimes segfaults." So unless we're ready
>>>> to just disable the entire JS test suite... ;) And for the detractors
>>>> out there, there are more EUnit than JS failing test cases right now
>>>> (13 vs. 6)!
>>>>
>>>> Step 4, for me, *must* include re-enabling all of the failing tests as
>>>> soon as possible (or, alternatively, only disabling them on the 2.1.x
>>>> branch). A PR I intend to land tomorrow, which has +1s from Paul and
>>>> Jan[2], will upload couch.log files from Travis and Jenkins to a
>>>> central CouchDB for further analysis whenever a test fails. Prior to
>>>> this, determining the actual failure required getting lucky and having
>>>> one of the tests fail on your machine. With the exception of the
>>>> compaction daemon tests (whose timeout I *just* increased 4 days
>>>> ago[3]), for most of these test failures we simply need more data.
>>>> Disabling the tests now that we finally have useful CI telemetry is
>>>> like launching a fleet of satellites to monitor global climate, then
>>>> banning the agency responsible for them from monitoring them for vital
>>>> data. :D
>>>>
>>>> Thanks for reading. Let's move forward on 2.1...carefully.
>>>>
>>>> -Joan
>>>>
>>>> [1] https://github.com/apache/couchdb/commit/81ee7c5ac71e617a03e967b4fc5d0358f4ba9459
>>>> [2] https://github.com/apache/couchdb/pull/514
>>>> [3] https://github.com/apache/couchdb/commit/ca4761c6177748f6c87bd072939f7b3eb6fa1edd#diff-41b21ba8ff04bec904f235212d7c4de0
>>>>
>>>> ----- Original Message -----
>>>> From: "Jan Lehnardt" <jan@apache.org>
>>>> To: "dev" <dev@couchdb.apache.org>
>>>> Sent: Thursday, 11 May, 2017 1:41:35 PM
>>>> Subject: 2.1
>>>>
>>>> Hi all,
>>>>
>>>> We should get CouchDB 2.1 out soon, and the test suite situation is a
>>>> somewhat annoying blocker, so I’m proposing something that might sound
>>>> unusual: disable the failing tests.
>>>>
>>>> All test failures are intermittent and we must absolutely address
>>>> this, but since nobody has picked this up since February, I think we
>>>> need a new plan.
>>>>
>>>> The one other issue is that the replication manager was merged
>>>> recently and is still fairly new code, so I’m proposing this:
>>>>
>>>> 1. Fork 2.1.x off of master just before the replication scheduler
>>>>    merge.
>>>>
>>>>    1.1. Backport to 2.1.x any other fixes that landed in master after
>>>>         the replication scheduler merge.
>>>>
>>>> 2. Disable all failing tests.
>>>>
>>>> 3. Start the release procedure.
>>>>
>>>> 4. Fix tests on master for 2.2, which can then also include the
>>>>    replication scheduler.
>>>>
>>>> If there are no objections, I’m happy to prepare the 2.1.x branch
>>>> early next week.
>>>>
>>>> Best
>>>> Jan
>>>> --
>>
>> --
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
>>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.