incubator-couchdb-user mailing list archives

From Robert Newson <robert.new...@gmail.com>
Subject Re: Having problems with conflict resolution
Date Mon, 10 Jan 2011 18:15:10 GMT
" single atomic '_bulk_docs' operation"

FYI: _bulk_docs is not atomic.

B.
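
Because _bulk_docs is not atomic, each document in a request is accepted or rejected on its own, and the response carries one entry per document. A minimal sketch in Python of splitting such a response into accepted and rejected entries follows. The helper name split_bulk_results and the sample response body are illustrative assumptions, not part of CouchDB.

```python
import json

def split_bulk_results(response_body):
    """Split a _bulk_docs response into (accepted, rejected) lists.

    An entry counts as rejected when it carries an "error" key. The
    response is judged per document because the request as a whole
    is not atomic.
    """
    results = json.loads(response_body)
    accepted = [r for r in results if "error" not in r]
    rejected = [r for r in results if "error" in r]
    return accepted, rejected

# Hypothetical response body, shaped like the ones quoted in this thread.
sample = json.dumps([
    {"id": "same", "rev": "1-d6246810df84e21f7611601d0cceccbf"},
    {"id": "other", "error": "conflict", "reason": "Document update conflict."},
])

accepted, rejected = split_bulk_results(sample)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

An accepted entry only means the revision was stored. As the quoted session shows, the document can still read as deleted afterwards.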

On Mon, Jan 10, 2011 at 5:52 PM, Mike Leddy <mike@loop.com.br> wrote:
> Thanks for the explanation. The reason I was using 'all_or_nothing' is
> because earlier versions of the script tried to do all the work in a
> single '_bulk_docs' call. Now I am starting to realize why that did
> not work for me.
>
> I now understand why the deleted revision must be kept for replication.
> I guess what I was trying to do was wrong. I cannot simply delete all
> the revisions and insert potentially the same document again.
>
> I will go back to the original idea of doing everything in a single
> atomic '_bulk_docs' operation but I will have to handle the special
> case of the document not changing by simply leaving it alone and
> just delete its conflicts.
>
> Thanks,
>
> Mike
>
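
The plan Mike describes (leave an unchanged document alone and delete only its conflicts) could be sketched as a payload builder like the one below. This is an illustration under assumptions, not his Ruby script: build_resolution_payload and body_of are hypothetical names, and the result is meant to be POSTed to _bulk_docs without all_or_nothing. Since _bulk_docs is not atomic, each entry in the response still needs checking.

```python
import json

def body_of(doc):
    """Return the document's fields without CouchDB metadata such as _id and _rev."""
    return {k: v for k, v in doc.items() if not k.startswith("_")}

def build_resolution_payload(revisions, merged_body):
    """Build a _bulk_docs payload that resolves the conflicts on one document.

    revisions: every live leaf revision of the document, each with _id and _rev.
    merged_body: the merged fields, without _id or _rev.
    """
    # Sort by _rev so every node walks the revisions in the same order.
    ordered = sorted(revisions, key=lambda d: d["_rev"])
    # Special case: a revision that already equals the merge result is left alone.
    keeper = next((d for d in ordered if body_of(d) == merged_body), None)
    docs = []
    if keeper is None:
        # No revision matches, so update one of them with the merged fields.
        keeper = ordered[0]
        docs.append({"_id": keeper["_id"], "_rev": keeper["_rev"], **merged_body})
    for d in ordered:
        if d is not keeper:
            # Every other leaf is removed with a deletion stub.
            docs.append({"_id": d["_id"], "_rev": d["_rev"], "_deleted": True})
    return {"docs": docs}

# The two conflicting revisions from the quoted session.
revisions = [
    {"_id": "same", "_rev": "1-d6246810df84e21f7611601d0cceccbf", "name": "mike"},
    {"_id": "same", "_rev": "1-ec562a018012e70bbf8da7f6f58970d7", "name": "john"},
]

# The merge result equals 'mike', so only 'john' is deleted.
payload = build_resolution_payload(revisions, {"name": "mike"})
print(json.dumps(payload))
```

Keeping the matching revision untouched avoids the delete-then-reinsert sequence that left the document deleted in the original session.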
> On Mon, 2011-01-10 at 11:47 -0500, Paul Davis wrote:
>> On Mon, Jan 10, 2011 at 11:18 AM, Mike Leddy <mike@loop.com.br> wrote:
>> > Hello,
>> >
>> > I have a situation where the same document can come from several sources
>> > and I have written a script (in ruby) which effectively merges the
>> > information in all conflicting documents (deletes the originals) and
>> > inserts the new merged document.
>> >
>> > Everything seemed fine until I observed that sometimes the newly inserted
>> > document remained deleted.....
>> >
>> > On further investigation I discovered that, because I was (by design) merging
>> > the documents in a deterministic way, merging documents A + B + C could give
>> > A again, i.e. document A already had all the information contained in
>> > documents A, B and C.
>> >
>> > Since I was deleting A and then subsequently inserting essentially the same
>> > document it remained deleted even though the bulk_docs API was indicating a
>> > successful insertion.
>> >
>> > I am using a recent 1.0.x branch. Here is the essence of what is happening
>> > using the same API calls:
>> >
>> > # create a database
>> > curl -X PUT 127.0.0.1:5984/bulk_docs
>> > {"ok":true}
>> >
>> > # insert a doc 'mike'
>> > curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' \
>> >   -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
>> > [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
>> >
>> > # insert another doc 'john' with same id
>> > curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' \
>> >   -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"john"}]}'
>> > [{"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
>> >
>> > # 'john' is the winning conflict
>> > curl 'localhost:5984/bulk_docs/same'
>> > {"_id":"same","_rev":"1-ec562a018012e70bbf8da7f6f58970d7","name":"john"}
>> >
>> > # delete 'john'
>> > curl -X DELETE 'localhost:5984/bulk_docs/same?rev=1-ec562a018012e70bbf8da7f6f58970d7'
>> > {"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
>> >
>> > # delete 'mike'
>> > curl -X DELETE 'localhost:5984/bulk_docs/same?rev=1-d6246810df84e21f7611601d0cceccbf'
>> > {"ok":true,"id":"same","rev":"2-db780681eced993484c7f171ab7f599c"}
>> >
>> > # none left
>> > curl 'localhost:5984/bulk_docs/same'
>> > {"error":"not_found","reason":"deleted"}
>> >
>> > # insert 'mike' again
>> > curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' \
>> >   -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
>> > [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
>> >
>> > # ouch !!!!!
>> > curl 'localhost:5984/bulk_docs/same'
>> > {"error":"not_found","reason":"deleted"}
>> >
>> > Since I have the conflict resolution script working on all nodes I want
>> > the result to be deterministic so as to be sure that all nodes calculate
>> > the same result and produce revisions that are the same....... always
>> > converging on exactly the same result.
>> >
>> > Any insights ?
>> >
>> > Regards,
>> >
>> > Mike
>> >
>> >
>>
>> This has to do with how docs in deleted states can be revived which
>> can lead to unexpected behavior like this.
>>
>> A somewhat simpler curl session:
>>
>> $ curl -X PUT http://127.0.0.1:5984/test
>> {"ok":true}
>>
>> $ curl -X POST -H "Content-Type: application/json" \
>>     http://127.0.0.1:5984/test/_bulk_docs \
>>     -d '{"all_or_nothing": true, "docs": [{"_id": "same", "name": "john"}]}'
>> [{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
>>
>> $ curl -X DELETE \
>>     http://127.0.0.1:5984/test/same?rev=1-ec562a018012e70bbf8da7f6f58970d7
>> {"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
>>
>> $ curl -X POST -H "Content-Type: application/json" \
>>     http://127.0.0.1:5984/test/_bulk_docs \
>>     -d '{"all_or_nothing": true, "docs": [{"_id": "same", "name": "john"}]}'
>> [{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
>>
>> $ curl http://127.0.0.1:5984/test/same
>> {"error":"not_found","reason":"deleted"}
>>
>> $ curl -X POST -H "Content-Type: application/json" \
>>     http://127.0.0.1:5984/test/_bulk_docs \
>>     -d '{"docs": [{"_id": "same", "name": "john"}]}'
>> [{"ok":true,"id":"same","rev":"3-6bdb1e1130357a1f01080dd74b7c5095"}]
>>
>> $ curl http://127.0.0.1:5984/test/same
>> {"_id":"same","_rev":"3-6bdb1e1130357a1f01080dd74b7c5095","name":"john"}
>>
>>
>> What's happening here is that you're manipulating the revision tree in an
>> unusual way. The progression from your session looks something like this:
>>
>> [note: I'm using 0 (zero) to indicate the null state of document not existing].
>>
>> 0
>> 0 -> A # put mike
>> 0 -> (A | B) # put john, introducing conflict
>> 0 -> (A | B -> C:deleted) # delete john, no more conflict
>> 0 -> (A -> D:deleted | B -> C:deleted) # delete mike, doc is deleted
>> 0 -> (A -> D:deleted | B -> C:deleted) # tried to re-put original mike, but it's still deleted
>>
>> What happens in the last step is that your 're-put of mike' ended up
>> creating a revision identical to A (because of all_or_nothing: true).
>> A is already in the tree with D:deleted as its child, so nothing new is
>> added, every leaf is still deleted, and the document stays deleted.
>> That is the behavior you're seeing.
>>
>> If you don't use all_or_nothing: true when you re-put the mike version,
>> you end up creating a revision tree like this:
>>
>> 0 -> (A -> D:deleted -> E | B -> C:deleted)
>>
>> Which recreates the doc with the new revision.
>>
>> On a side note, the reason that we need to keep deleted revisions is
>> because that's how we determine if a conflict has been resolved during
>> replication. If those revisions disappeared, you'd have to re-resolve
>> conflicts after every replication.
>>
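
The revision-tree progression described above can be reproduced with a small toy model in Python. Everything here is an assumption made for illustration: the RevTree class, its method names, the way revision ids are hashed, and the choice of which deleted leaf a plain insert extends. It is not CouchDB's implementation. It only shows why a re-put that reproduces revision A changes nothing, while a plain insert adds a live third-generation revision.

```python
import hashlib
import json

class RevTree:
    """Toy model of one document's revision tree (not CouchDB's real code)."""

    def __init__(self):
        self.revs = {}  # rev id -> (parent rev id or None, deleted flag)

    def _add(self, parent, body, deleted=False):
        gen = 1 if parent is None else int(parent.split("-")[0]) + 1
        # Assumption: the id is a hash of parent, body and deleted flag, so the
        # same edit on the same parent always yields the same revision id.
        raw = json.dumps([parent, body, deleted], sort_keys=True).encode()
        rev = f"{gen}-{hashlib.md5(raw).hexdigest()}"
        # An id that is already in the tree is left untouched.
        self.revs.setdefault(rev, (parent, deleted))
        return rev

    def leaves(self):
        parents = {p for p, _ in self.revs.values()}
        return [r for r in self.revs if r not in parents]

    def is_deleted(self):
        # The document reads as deleted when every leaf is a deletion.
        return all(self.revs[r][1] for r in self.leaves())

    def put_as_new_root(self, body):
        """Models the all_or_nothing put: always a generation-1 revision."""
        return self._add(None, body)

    def put_normal(self, body):
        """Models a plain insert: extends a deleted leaf of a deleted document."""
        if self.revs and not self.is_deleted():
            raise ValueError("conflict: a live revision exists")
        # Assumption: extend the leaf with the highest (generation, id) pair.
        parent = max(self.leaves(),
                     key=lambda r: (int(r.split("-")[0]), r),
                     default=None)
        return self._add(parent, body)

    def delete(self, rev):
        return self._add(rev, None, deleted=True)

tree = RevTree()
a = tree.put_as_new_root({"name": "mike"})      # 0 -> A
b = tree.put_as_new_root({"name": "john"})      # 0 -> (A | B)
tree.delete(b)                                  # B -> C:deleted
tree.delete(a)                                  # A -> D:deleted
again = tree.put_as_new_root({"name": "mike"})  # reproduces A, tree unchanged
print(again == a, tree.is_deleted())            # prints: True True
e = tree.put_normal({"name": "mike"})           # D:deleted -> E
print(e.split("-")[0], tree.is_deleted())       # prints: 3 False
```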
>
>
>
