incubator-couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Having problems with conflict resolution
Date Mon, 10 Jan 2011 16:47:52 GMT
On Mon, Jan 10, 2011 at 11:18 AM, Mike Leddy <mike@loop.com.br> wrote:
> Hello,
>
> I have a situation where the same document can come from several sources
> and I have written a script (in ruby) which effectively merges the
> information in all conflicting documents (deletes the originals) and
> inserts the new merged document.
>
> Everything seemed fine until I observed that sometimes the newly inserted
> document remained deleted.....
>
> On further investigation I discovered that, because I was (by design)
> merging the documents in a deterministic way, it was possible for a merge
> of documents A + B + C to give A, i.e. document A already has all the
> information contained in documents A, B and C.
>
> Since I was deleting A and then subsequently inserting essentially the same
> document it remained deleted even though the bulk_docs API was indicating a
> successful insertion.
>
> I am using a recent 1.0.x branch. Here is the essence of what is happening
> using the same API calls:
>
> # create a database
> curl -X PUT 127.0.0.1:5984/bulk_docs
> {"ok":true}
>
> # insert a doc 'mike'
> curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' \
>   -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
> [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
>
> # insert another doc 'john' with same id
> curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' \
>   -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"john"}]}'
> [{"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
>
> # 'john' is the winning conflict
> curl 'localhost:5984/bulk_docs/same'
> {"_id":"same","_rev":"1-ec562a018012e70bbf8da7f6f58970d7","name":"john"}
>
> # delete 'john'
> curl -X DELETE 'localhost:5984/bulk_docs/same?rev=1-ec562a018012e70bbf8da7f6f58970d7'
> {"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
>
> # delete 'mike'
> curl -X DELETE 'localhost:5984/bulk_docs/same?rev=1-d6246810df84e21f7611601d0cceccbf'
> {"ok":true,"id":"same","rev":"2-db780681eced993484c7f171ab7f599c"}
>
> # none left
> curl 'localhost:5984/bulk_docs/same'
> {"error":"not_found","reason":"deleted"}
>
> # insert 'mike' again
> curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' \
>   -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
> [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
>
> # ouch !!!!!
> curl 'localhost:5984/bulk_docs/same'
> {"error":"not_found","reason":"deleted"}
>
> Since I have the conflict resolution script working on all nodes I want
> the result to be deterministic so as to be sure that all nodes calculate
> the same result and produce revisions that are the same....... always
> converging on exactly the same result.
>
> Any insights ?
>
> Regards,
>
> Mike
>
>

This has to do with how deleted docs can be revived, which can lead to
unexpected behavior like this.

A somewhat simpler curl session:

$ curl -X PUT http://127.0.0.1:5984/test
{"ok":true}

$ curl -X POST -H "Content-Type: application/json" \
    http://127.0.0.1:5984/test/_bulk_docs \
    -d '{"all_or_nothing": true, "docs": [{"_id": "same", "name": "john"}]}'
[{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]

$ curl -X DELETE \
    'http://127.0.0.1:5984/test/same?rev=1-ec562a018012e70bbf8da7f6f58970d7'
{"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}

$ curl -X POST -H "Content-Type: application/json" \
    http://127.0.0.1:5984/test/_bulk_docs \
    -d '{"all_or_nothing": true, "docs": [{"_id": "same", "name": "john"}]}'
[{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]

$ curl http://127.0.0.1:5984/test/same
{"error":"not_found","reason":"deleted"}

$ curl -X POST -H "Content-Type: application/json" \
    http://127.0.0.1:5984/test/_bulk_docs \
    -d '{"docs": [{"_id": "same", "name": "john"}]}'
[{"ok":true,"id":"same","rev":"3-6bdb1e1130357a1f01080dd74b7c5095"}]

$ curl http://127.0.0.1:5984/test/same
{"_id":"same","_rev":"3-6bdb1e1130357a1f01080dd74b7c5095","name":"john"}


What's happening here is that you're manipulating the revision tree in
an unusual way. The progression from your session looks something like this:

[note: I'm using 0 (zero) to indicate the null state of document not existing].

0
0 -> A # put mike
0 -> (A | B) # put john, introducing conflict
0 -> (A | B -> C:deleted) # delete john, no more conflict
0 -> (A -> D:deleted | B -> C:deleted) # delete mike, doc is deleted
0 -> (A -> D:deleted | B -> C:deleted) # tried to re-put original mike, but it's still deleted

What happens in the last step is that your 're-put of mike' ended up
creating a revision identical to A (because of all_or_nothing: true).
Since A is already in the tree, nothing new is added. The revision tree
still has D:deleted as the leaf of that branch, so the document is
still deleted, which gives you the behavior you're seeing.
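
To make the tree argument concrete, here is a toy model in Python. It is
only a sketch of the reasoning above, not CouchDB's actual data structure
or code: the dict layout and the names leaves, is_deleted and insert_root
are made up for illustration. It assumes a document counts as deleted
when every leaf revision is deleted, and that inserting a parentless
revision that already exists changes nothing.

```python
# Toy model of the revision tree from this thread (hypothetical, not CouchDB code).
# Each revision maps to a (parent, deleted) pair; parent is None for a root.

def leaves(tree):
    """Return the revisions that have no children."""
    parents = {parent for parent, _ in tree.values() if parent is not None}
    return {rev for rev in tree if rev not in parents}

def is_deleted(tree):
    """Assumption: the doc is deleted when every leaf is a deleted revision."""
    return all(tree[rev][1] for rev in leaves(tree))

def insert_root(tree, rev, deleted=False):
    """Insert a parentless revision; a no-op if that revision already exists."""
    tree.setdefault(rev, (None, deleted))

tree = {}
insert_root(tree, "A")        # put mike
insert_root(tree, "B")        # put john, introducing conflict
tree["C"] = ("B", True)       # delete john
tree["D"] = ("A", True)       # delete mike, doc is deleted

before = dict(tree)
insert_root(tree, "A")        # re-put mike, same revision as A
unchanged = tree == before    # the tree did not grow
still_deleted = is_deleted(tree)
```

In this model the re-put leaves C and D as the only leaves, so the
document stays deleted even though the insert itself "succeeded".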

If you re-put the mike version without all_or_nothing: true, you end
up with a revision tree like this:

0 -> (A -> D:deleted -> E | B -> C:deleted)

Which recreates the doc with the new revision.
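
The same point as a small Python sketch. Again this is just an
illustration under stated assumptions, not how CouchDB is implemented:
the dict layout and the helper names leaf_revs and add_child are
hypothetical. Extending the deleted leaf D with a new revision E gives
the tree a live leaf again.

```python
# Hypothetical model: revision -> (parent, deleted). Not CouchDB's real structure.
tree = {
    "A": (None, False),
    "B": (None, False),
    "C": ("B", True),   # john deleted
    "D": ("A", True),   # mike deleted
}

def leaf_revs(tree):
    """Return the revisions with no children, sorted for stable output."""
    parents = {parent for parent, _ in tree.values() if parent is not None}
    return sorted(rev for rev in tree if rev not in parents)

def add_child(tree, parent, rev, deleted=False):
    """Append a new revision under an existing one."""
    if parent not in tree:
        raise KeyError(f"unknown parent revision: {parent}")
    tree[rev] = (parent, deleted)

# A normal (non all_or_nothing) put extends the deleted leaf D with E.
add_child(tree, "D", "E")

all_leaves = leaf_revs(tree)
live_leaves = [rev for rev in all_leaves if not tree[rev][1]]
```

Here E is the only leaf that is not deleted, which matches the 3-...
revision returned in the curl session above.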

On a side note, the reason that we need to keep deleted revisions is
because that's how we determine if a conflict has been resolved during
replication. If those revisions disappeared, you'd have to re-resolve
conflicts after every replication.
