couchdb-dev mailing list archives

From Alexander Shorin <kxe...@gmail.com>
Subject Re: [VOTE] Release Apache CouchDB 1.6.0-rc.3
Date Sun, 04 May 2014 16:42:16 GMT
According to the comments around the code, I had hoped so too, but the
issue looks like OTP-9167 and acts like OTP-9167, so I'm not sure how
to classify it as anything other than OTP-9167. However, I also have a
feeling that some code is missing around this note:
https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator.erl#L299

--
,,,^..^,,,


On Sun, May 4, 2014 at 7:43 PM, Robert Samuel Newson <rnewson@apache.org> wrote:
> Hrm, OTP-9167 was reported by Filipe, the main author of the current
> couchdb replicator, and he also changed how this was handled in couchdb
> to compensate. This is some ancient stuff, hard to believe it’s the
> cause of our latest issue. I must be missing something.
>
> On 4 May 2014, at 12:46, Alexander Shorin <kxepal@gmail.com> wrote:
>
>> On Wed, Apr 30, 2014 at 4:17 PM, Alexander Shorin <kxepal@gmail.com> wrote:
>>> On Tue, Apr 29, 2014 at 5:56 PM, Alexander Shorin <kxepal@gmail.com> wrote:
>>>> On Wed, Apr 23, 2014 at 1:03 PM, Mutton, James <jmutton@akamai.com> wrote:
>>>>> well, bummer.  Tried 3 times on R14B01, all 3 I get:
>>>>> /tmp/couchdb/dist/apache-couchdb-1.6.0/apache-couchdb-1.6.0/_build/../src/couch_replicator/test/07-use-checkpoints.t .......... Failed 4/16 subtests
>>>>>
>>>>> Test Summary Report
>>>>> -------------------
>>>>> /tmp/couchdb/dist/apache-couchdb-1.6.0/apache-couchdb-1.6.0/_build/../src/couch_replicator/test/07-use-checkpoints.t (Wstat: 0 Tests: 16 Failed: 4)
>>>>>  Failed tests:  9, 12-13, 15
>>>>> Files=7, Tests=1832, 150 wallclock secs ( 0.81 usr  0.09 sys + 155.32 cusr 13.16 csys = 169.38 CPU)
>>>>> Result: FAIL
>>>>> make[3]: *** [check] Error 1
>>>>>
>>>>> Unfortunately, I’m needing some sleep then leaving on some vacation
>>>>> for the rest of the week.  I’ll see if I can maybe look closer at
>>>>> what’s going on locally while on the flight.
>>>>
>>>> I failed to reproduce this with R14B04, but I will try R14B01 as you did.
>>>
>>> Confirmed for R14B01.
>>
>> OK, I've found the root of this issue. It is known as OTP-9167, which
>> was fixed in R14B03, and it is why 07-use-checkpoints.t fails on
>> R14B01: the test can't run a replicator worker with a new child spec
>> in which the use_checkpoints bit is flipped, because the supervisor
>> holds the initial one. It sees that a replication with the same id is
>> about to start and restarts it with the old spec, ignoring any
>> changes. I could fix the test, but I couldn't fix the root cause, and
>> I'm not sure it's worth searching for workarounds nowadays (R14B03 was
>> released on 2011-05-24, almost 3 years ago).
>>
>> However, here are three solutions that I have:
>>
>> 0. Do nothing.
>> 1. Isolate tests from each other to hide the issue (isolation is good,
>> but hiding bugs is bad):
>> https://www.friendpaste.com/1lnTEFg6RId5PDRAmvbBVO
>> 2. On test failure, check the Erlang version and note that this
>> failure is *fine* for specific versions:
>> https://www.friendpaste.com/3TmqoNjEF3xnYtbLybSL7G
>> 3. Add a "+no_checkpoints" suffix to the replication id if
>> "use_checkpoints: false" was specified. This solves the problem:
>> https://www.friendpaste.com/3TmqoNjEF3xnYtbLybSKpT (yes, some
>> refactoring love is required)
>> But I'm not sure that this is a good idea.
>>
>> Personally, I would prefer to keep this "bug" alive as a reminder that
>> things *could* go wrong on your Erlang version. Your thoughts?
>>
>>
>> --
>> ,,,^..^,,,
>
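
To make option 3 in the quoted message concrete, here is a minimal sketch in Python rather than Erlang. It is a toy model under stated assumptions, not CouchDB or OTP code: `ToySupervisor` only imitates the behaviour described above (the first child spec registered under an id is reused for later starts with the same id), and `replication_id` is a hypothetical helper that appends the "+no_checkpoints" suffix. The id "rep-1" is made up for illustration.

```python
class ToySupervisor:
    """Toy model only: remembers the first child spec seen for each id."""

    def __init__(self):
        self._specs = {}

    def start_child(self, child_id, spec):
        # setdefault stores spec only if child_id is new; otherwise the
        # previously stored spec is returned and the new one is ignored.
        return self._specs.setdefault(child_id, spec)


def replication_id(base_id, options):
    """Hypothetical helper: suffix the id when checkpoints are disabled."""
    if options.get("use_checkpoints", True) is False:
        return base_id + "+no_checkpoints"
    return base_id


sup = ToySupervisor()
with_cp = {"use_checkpoints": True}
without_cp = {"use_checkpoints": False}

# Same id for both runs: the second start gets the stale spec back.
stale = sup.start_child("rep-1", with_cp)
stale = sup.start_child("rep-1", without_cp)

# Suffixed id: the changed option yields a distinct id, so its spec is used.
fresh = sup.start_child(replication_id("rep-1", without_cp), without_cp)
```

In this model the second start under the plain id still sees the spec with checkpoints enabled, while the suffixed id gets the spec that was actually requested. Whether changing the replication id this way is acceptable for real replications is exactly the open question raised in the message.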
