couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adam Kocoloski <kocol...@apache.org>
Subject Re: Data loss
Date Sat, 07 Aug 2010 23:35:22 GMT
Good sleuthing guys, and my apologies for letting this through.  Randall, your patch in COUCHDB-794
was actually fine, it was my reworking of it that caused this serious bug.

With respect to that gist 513282, I think it would be better to return Db#db{waiting_delayed_commit=nil}
when the headers match instead of moving the cancel_timer() command as you did.  After all,
we did perform the check here -- it was just that nothing needed to be committed.

Adam

On Aug 7, 2010, at 6:55 PM, Damien Katz wrote:

> Yes, I think it requires 2 conflicting writes in row, because it needs to trigger the
delayed_commit timer without actually having anything to commit, so the header never changes.
> 
> Try to reproduce this and add a test case.
> 
> -Damien
> 
> 
> On Aug 7, 2010, at 3:47 PM, Randall Leeds wrote:
> 
>> I think you may be right, Damien.
>> If ever a write happens that only contains conflicts while waiting for
>> a delayed commit message we might still be cancelling the timer. Is
>> this what you're thinking? This would be the fix:
>> http://gist.github.com/513282
>> 
>> On Sat, Aug 7, 2010 at 15:42, Damien Katz <damien@apache.org> wrote:
>>> I think the problem might be that 2 conflicting write attempts in row can leave
the #db.waiting_delayed_commit set but the timer has been cancelled. One that happens, the
header may never be written, as it always thinks a delayed commit will fire soon.
>>> 
>>> -Damien
>>> 
>>> 
>>> On Aug 7, 2010, at 12:08 PM, Randall Leeds wrote:
>>> 
>>>> On Sat, Aug 7, 2010 at 11:56, Randall Leeds <randall.leeds@gmail.com>
wrote:
>>>>> I agree completely! I immediately thought of this because I wrote that
>>>>> change. I spent a while staring at it last night but still can't
>>>>> imagine how it's a problem.
>>>>> 
>>>>> On Sat, Aug 7, 2010 at 11:12, Damien Katz <damien@apache.org> wrote:
>>>>>> SVN commit r954043 looks suspicious. Digging further.
>>>>>> 
>>>>>> -Damien
>>>>> 
>>>> 
>>>> I still want to stare at r954043, but it looks to me like there's at
>>>> least one situation where we do not commit data correctly during
>>>> compaction. This has to do with the way we now use the path to sync
>>>> outside the couch_file:process. Check this diff:
>>>> http://gist.github.com/513081
>>> 
>>> 
> 


Mime
View raw message