POSTing to /db/_ensure_full_commit will still cause a header to be written.
Switching to delayed_commits = false and then writing a document will cause a header to be
written for that DB.
POSTing to /_ensure_full_commit for each DB and then flipping the delayed_commits to false
will put a 1.0.0 server into a safe state with all data saved.
Adam
On Aug 7, 2010, at 8:57 PM, Chris Anderson wrote:
> Will switching a running 1.0 server to delayed_commits=true cause the noncommitted headers
to be written? Are there other remedies for folks with critical data in 1.0 who want to ensure
they are safe?
>
> Chris
>
> Typed on glass.
>
> On Aug 7, 2010, at 5:47 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
>
>> Committed to trunk and 1.0.x.
>>
>> On Aug 7, 2010, at 8:33 PM, Randall Leeds wrote:
>>
>>> http://github.com/tilgovi/couchdb/tree/fixlostcommits
>>>
>>> Test and fix in separate commits at the end of that branch, based off
>>> current trunk.
>>> Would appreciate verification that the test is initially broken but
>>> fixed by the patch.
>>>
>>> On Sat, Aug 7, 2010 at 17:16, Damien Katz <damien@apache.org> wrote:
>>>> I reproduced this manually:
>>>>
>>>> Create document with id "x", ensure full commit (simply wait longer than
1 sec, say 2 secs).
>>>>
>>>> Attempt to create document "x" again, get conflict error.
>>>>
>>>> Wait at least 2 secs to ensure the delayed commit attempt happens.
>>>>
>>>> Now create document "y".
>>>>
>>>> Wait at least 2 secs because the delayed commit should happen
>>>>
>>>> Restart server.
>>>>
>>>> Document "y" is now missing.
>>>>
>>>> The last delayed commit isn't happening. From then on out, no docs updated
with delayed commit with be available after a restart.
>>>>
>>>> -Damien
>>>>
>>>> On Aug 7, 2010, at 4:58 PM, Adam Kocoloski wrote:
>>>>
>>>>> I believe it's a single delayed conflict write attempt and no successes
in that same interval.
>>>>>
>>>>> On Aug 7, 2010, at 7:51 PM, Damien Katz wrote:
>>>>>
>>>>>> Looks like all that's necessary is a single delayed conflict write
attempt, and all subsequent delayed commits won't be commit, the header never gets written.
>>>>>>
>>>>>> 1.0 loses data. This is ridiculously bad.
>>>>>>
>>>>>> We need a test to reproduce this and fix.
>>>>>>
>>>>>> -Damien
>>>>>>
>>>>>> On Aug 7, 2010, at 4:35 PM, Adam Kocoloski wrote:
>>>>>>
>>>>>>> Good sleuthing guys, and my apologies for letting this through.
Randall, your patch in COUCHDB-794 was actually fine, it was my reworking of it that caused
this serious bug.
>>>>>>>
>>>>>>> With respect to that gist 513282, I think it would be better
to return Db#db{waiting_delayed_commit=nil} when the headers match instead of moving the cancel_timer()
command as you did. After all, we did perform the check here -- it was just that nothing
needed to be committed.
>>>>>>>
>>>>>>> Adam
>>>>>>>
>>>>>>> On Aug 7, 2010, at 6:55 PM, Damien Katz wrote:
>>>>>>>
>>>>>>>> Yes, I think it requires 2 conflicting writes in row, because
it needs to trigger the delayed_commit timer without actually having anything to commit, so
the header never changes.
>>>>>>>>
>>>>>>>> Try to reproduce this and add a test case.
>>>>>>>>
>>>>>>>> -Damien
>>>>>>>>
>>>>>>>>
>>>>>>>> On Aug 7, 2010, at 3:47 PM, Randall Leeds wrote:
>>>>>>>>
>>>>>>>>> I think you may be right, Damien.
>>>>>>>>> If ever a write happens that only contains conflicts
while waiting for
>>>>>>>>> a delayed commit message we might still be cancelling
the timer. Is
>>>>>>>>> this what you're thinking? This would be the fix:
>>>>>>>>> http://gist.github.com/513282
>>>>>>>>>
>>>>>>>>> On Sat, Aug 7, 2010 at 15:42, Damien Katz <damien@apache.org>
wrote:
>>>>>>>>>> I think the problem might be that 2 conflicting write
attempts in row can leave the #db.waiting_delayed_commit set but the timer has been cancelled.
One that happens, the header may never be written, as it always thinks a delayed commit will
fire soon.
>>>>>>>>>>
>>>>>>>>>> -Damien
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Aug 7, 2010, at 12:08 PM, Randall Leeds wrote:
>>>>>>>>>>
>>>>>>>>>>> On Sat, Aug 7, 2010 at 11:56, Randall Leeds <randall.leeds@gmail.com>
wrote:
>>>>>>>>>>>> I agree completely! I immediately thought
of this because I wrote that
>>>>>>>>>>>> change. I spent a while staring at it last
night but still can't
>>>>>>>>>>>> imagine how it's a problem.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Aug 7, 2010 at 11:12, Damien Katz
<damien@apache.org> wrote:
>>>>>>>>>>>>> SVN commit r954043 looks suspicious.
Digging further.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Damien
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I still want to stare at r954043, but it looks
to me like there's at
>>>>>>>>>>> least one situation where we do not commit data
correctly during
>>>>>>>>>>> compaction. This has to do with the way we now
use the path to sync
>>>>>>>>>>> outside the couch_file:process. Check this diff:
>>>>>>>>>>> http://gist.github.com/513081
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>
|