I believe it's a single delayed conflict write attempt and no successes in that same interval. On Aug 7, 2010, at 7:51 PM, Damien Katz wrote: > Looks like all that's necessary is a single delayed conflict write attempt, and all subsequent delayed commits won't be commit, the header never gets written. > > 1.0 loses data. This is ridiculously bad. > > We need a test to reproduce this and fix. > > -Damien > > On Aug 7, 2010, at 4:35 PM, Adam Kocoloski wrote: > >> Good sleuthing guys, and my apologies for letting this through. Randall, your patch in COUCHDB-794 was actually fine, it was my reworking of it that caused this serious bug. >> >> With respect to that gist 513282, I think it would be better to return Db#db{waiting_delayed_commit=nil} when the headers match instead of moving the cancel_timer() command as you did. After all, we did perform the check here -- it was just that nothing needed to be committed. >> >> Adam >> >> On Aug 7, 2010, at 6:55 PM, Damien Katz wrote: >> >>> Yes, I think it requires 2 conflicting writes in row, because it needs to trigger the delayed_commit timer without actually having anything to commit, so the header never changes. >>> >>> Try to reproduce this and add a test case. >>> >>> -Damien >>> >>> >>> On Aug 7, 2010, at 3:47 PM, Randall Leeds wrote: >>> >>>> I think you may be right, Damien. >>>> If ever a write happens that only contains conflicts while waiting for >>>> a delayed commit message we might still be cancelling the timer. Is >>>> this what you're thinking? This would be the fix: >>>> http://gist.github.com/513282 >>>> >>>> On Sat, Aug 7, 2010 at 15:42, Damien Katz wrote: >>>>> I think the problem might be that 2 conflicting write attempts in row can leave the #db.waiting_delayed_commit set but the timer has been cancelled. One that happens, the header may never be written, as it always thinks a delayed commit will fire soon. >>>>> >>>>> -Damien >>>>> >>>>> >>>>> On Aug 7, 2010, at 12:08 PM, Randall Leeds wrote: >>>>> >>>>>> On Sat, Aug 7, 2010 at 11:56, Randall Leeds wrote: >>>>>>> I agree completely! I immediately thought of this because I wrote that >>>>>>> change. I spent a while staring at it last night but still can't >>>>>>> imagine how it's a problem. >>>>>>> >>>>>>> On Sat, Aug 7, 2010 at 11:12, Damien Katz wrote: >>>>>>>> SVN commit r954043 looks suspicious. Digging further. >>>>>>>> >>>>>>>> -Damien >>>>>>> >>>>>> >>>>>> I still want to stare at r954043, but it looks to me like there's at >>>>>> least one situation where we do not commit data correctly during >>>>>> compaction. This has to do with the way we now use the path to sync >>>>>> outside the couch_file:process. Check this diff: >>>>>> http://gist.github.com/513081 >>>>> >>>>> >>> >> >