incubator-couchdb-user mailing list archives

From Mike Leddy <m...@loop.com.br>
Subject Re: Having problems with conflict resolution
Date Mon, 10 Jan 2011 17:52:25 GMT
Thanks for the explanation. The reason I was using 'all_or_nothing' is
because earlier versions of the script tried to do all the work in a 
single '_bulk_docs' call. Now I am starting to realize why that did
not work for me.

I now understand why the deleted revision must be kept for replication.
I guess what I was trying to do was wrong. I cannot simply delete all
the revisions and insert potentially the same document again.
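
To make that concrete, here is a toy model of the revision tree described in the quoted reply below. It is only a sketch and not how CouchDB stores revisions internally: the RevTree class and its methods are hypothetical, and the labels A to E are the ones used in the quoted explanation. It assumes, as that explanation says, that re-putting identical content with all_or_nothing: true produces the existing revision A, while a normal put produces a new revision E on top of D.

```python
class RevTree:
    """Toy revision tree: rev id -> (parent rev id or None, deleted flag)."""

    def __init__(self):
        self.nodes = {}

    def add(self, rev, parent=None, deleted=False):
        # Adding a revision that is already in the tree changes nothing.
        if rev in self.nodes:
            return False
        self.nodes[rev] = (parent, deleted)
        return True

    def leaves(self):
        # A leaf is a revision that no other revision names as its parent.
        parents = {parent for parent, _ in self.nodes.values()}
        return sorted(rev for rev in self.nodes if rev not in parents)

    def is_deleted(self):
        # The document is visible only if at least one leaf is not deleted.
        return all(self.nodes[rev][1] for rev in self.leaves())


tree = RevTree()
tree.add("A")                            # put mike
tree.add("B")                            # put john, introducing a conflict
tree.add("C", parent="B", deleted=True)  # delete john
tree.add("D", parent="A", deleted=True)  # delete mike
changed = tree.add("A")                  # re-put mike with all_or_nothing: rev A again
state_after_reput = (changed, tree.leaves(), tree.is_deleted())

tree.add("E", parent="D")                # re-put without all_or_nothing: new rev on D
state_after_new_rev = (tree.leaves(), tree.is_deleted())
```

In this model the second add of A is a no-op, so both leaves stay deleted; only a genuinely new revision such as E brings the document back.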

I will go back to the original idea of doing everything in a single
atomic '_bulk_docs' operation but I will have to handle the special
case of the document not changing by simply leaving it alone and
just deleting its conflicts.
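
A minimal sketch of that plan, under stated assumptions: build_resolution_payload is a hypothetical helper, the merge itself is assumed to have been done already, and the "alias" field is made up for the example. It only builds the JSON body for a single '_bulk_docs' request and makes no network calls. The "_deleted": true marker and the all_or_nothing flag are the ones used with this API in the thread.

```python
import json


def build_resolution_payload(winner, losers, merged_body):
    """Build one _bulk_docs request body that resolves the conflicts on a document.

    winner and losers are documents as read from the database (with _id and
    _rev); merged_body is the merged content without _id/_rev.
    """
    docs = []
    winner_body = {k: v for k, v in winner.items() if not k.startswith("_")}
    if merged_body != winner_body:
        # Content changed: write it as a new revision on top of the winner.
        docs.append({"_id": winner["_id"], "_rev": winner["_rev"], **merged_body})
    # Either way, delete every losing revision; an unchanged winner is left alone.
    for doc in losers:
        docs.append({"_id": doc["_id"], "_rev": doc["_rev"], "_deleted": True})
    return {"all_or_nothing": True, "docs": docs}


winner = {"_id": "same", "_rev": "1-ec562a018012e70bbf8da7f6f58970d7", "name": "john"}
losers = [{"_id": "same", "_rev": "1-d6246810df84e21f7611601d0cceccbf", "name": "mike"}]

# Merge result equals the winner: only the conflicting revision is deleted.
unchanged = build_resolution_payload(winner, losers, {"name": "john"})
# Merge result differs: the winner is updated and the conflict is deleted.
changed = build_resolution_payload(winner, losers, {"name": "john", "alias": "mike"})
print(json.dumps(unchanged))
```

The resulting dictionary would be sent as the body of a POST to the database's '_bulk_docs' URL. Because the winner is never deleted and re-inserted, the case of a delete followed by an identical re-put does not arise.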

Thanks,

Mike

On Mon, 2011-01-10 at 11:47 -0500, Paul Davis wrote:
> On Mon, Jan 10, 2011 at 11:18 AM, Mike Leddy <mike@loop.com.br> wrote:
> > Hello,
> >
> > I have a situation where the same document can come from several sources
> > and I have written a script (in ruby) which effectively merges the
> > information in all conflicting documents (deletes the originals) and
> > inserts the new merged document.
> >
> > Everything seemed fine until I observed that sometimes the newly inserted
> > document remained deleted.....
> >
> > On further investigation I discovered that, because I merge the
> > documents deterministically (by design), merging documents A + B + C
> > can give A again, i.e. document A already has all the information
> > contained in documents A, B and C.
> >
> > Since I was deleting A and then subsequently inserting essentially the same
> > document it remained deleted even though the bulk_docs API was indicating a
> > successful insertion.
> >
> > I am using a recent 1.0.x branch. Here is the essence of what is happening
> > using the same API calls:
> >
> > # create a database
> > curl -X PUT 127.0.0.1:5984/bulk_docs
> > {"ok":true}
> >
> > # insert a doc 'mike'
> > curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' \
> >   -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
> > [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
> >
> > # insert another doc 'john' with same id
> > curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' \
> >   -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"john"}]}'
> > [{"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
> >
> > # 'john' is the winning conflict
> > curl 'localhost:5984/bulk_docs/same'
> > {"_id":"same","_rev":"1-ec562a018012e70bbf8da7f6f58970d7","name":"john"}
> >
> > # delete 'john'
> > curl -X DELETE 'localhost:5984/bulk_docs/same?rev=1-ec562a018012e70bbf8da7f6f58970d7'
> > {"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
> >
> > # delete 'mike'
> > curl -X DELETE 'localhost:5984/bulk_docs/same?rev=1-d6246810df84e21f7611601d0cceccbf'
> > {"ok":true,"id":"same","rev":"2-db780681eced993484c7f171ab7f599c"}
> >
> > # none left
> > curl 'localhost:5984/bulk_docs/same'
> > {"error":"not_found","reason":"deleted"}
> >
> > # insert 'mike' again
> > curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' \
> >   -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
> > [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
> >
> > # ouch !!!!!
> > curl 'localhost:5984/bulk_docs/same'
> > {"error":"not_found","reason":"deleted"}
> >
> > Since I have the conflict resolution script working on all nodes I want
> > the result to be deterministic so as to be sure that all nodes calculate
> > the same result and produce revisions that are the same....... always
> > converging on exactly the same result.
> >
> > Any insights ?
> >
> > Regards,
> >
> > Mike
> >
> >
> 
> This has to do with how docs in deleted states can be revived which
> can lead to unexpected behavior like this.
> 
> A somewhat simpler curl session:
> 
> $ curl -X PUT http://127.0.0.1:5984/test
> {"ok":true}
> 
> $ curl -X POST -H "Content-Type: application/json" \
>     http://127.0.0.1:5984/test/_bulk_docs \
>     -d '{"all_or_nothing": true, "docs": [{"_id": "same", "name": "john"}]}'
> [{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
> 
> $ curl -X DELETE 'http://127.0.0.1:5984/test/same?rev=1-ec562a018012e70bbf8da7f6f58970d7'
> {"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
> 
> $ curl -X POST -H "Content-Type: application/json" \
>     http://127.0.0.1:5984/test/_bulk_docs \
>     -d '{"all_or_nothing": true, "docs": [{"_id": "same", "name": "john"}]}'
> [{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
> 
> $ curl http://127.0.0.1:5984/test/same
> {"error":"not_found","reason":"deleted"}
> 
> $ curl -X POST -H "Content-Type: application/json" \
>     http://127.0.0.1:5984/test/_bulk_docs \
>     -d '{"docs": [{"_id": "same", "name": "john"}]}'
> [{"ok":true,"id":"same","rev":"3-6bdb1e1130357a1f01080dd74b7c5095"}]
> 
> $ curl http://127.0.0.1:5984/test/same
> {"_id":"same","_rev":"3-6bdb1e1130357a1f01080dd74b7c5095","name":"john"}
> 
> 
> What's happening here is that you're manipulating the revision tree in
> an unusual way. The progression from your session looks something like this:
> 
> [note: I'm using 0 (zero) to indicate the null state of document not existing].
> 
> 0
> 0 -> A # put mike
> 0 -> (A | B) # put john, introducing conflict
> 0 -> (A | B -> C:deleted) # delete john, no more conflict
> 0 -> (A -> D:deleted | B -> C:deleted) # delete mike, doc is deleted
> 0 -> (A -> D:deleted | B -> C:deleted) # tried to re-put original mike,
> but it's still deleted.
> 
> What happens in the last step is that your 're-put of mike' ended up
> creating a revision identical to A (because of all_or_nothing: true).
> A is already in the revision tree, so the tree is unchanged and still
> ends in the D:deleted revision. The document is therefore still
> deleted, which gives you the behavior you're seeing.
> 
> If you re-put the mike version without all_or_nothing: true, you end
> up creating a revision tree like this:
> 
> 0 -> (A -> D:deleted -> E | B -> C:deleted)
> 
> Which recreates the doc with the new revision.
> 
> On a side note, the reason that we need to keep deleted revisions is
> because that's how we determine if a conflict has been resolved during
> replication. If those revisions disappeared, you'd have to re-resolve
> conflicts after every replication.
> 


