Subject: Re: Having problems with conflict resolution
From: Mike Leddy
To: user@couchdb.apache.org
Cc: Paul Davis
References: <1294676337.4402.33.camel@mike.loop.com.br>
Date: Mon, 10 Jan 2011 14:52:25 -0300
Message-ID: <1294681945.4402.52.camel@mike.loop.com.br>

Thanks for the explanation.

The reason I was using 'all_or_nothing' is because earlier versions of
the script tried to do all the work in a single '_bulk_docs' call. Now I
am starting to realize why that did not work for me.....

I now understand why the deleted revision must be kept for replication.

I guess what I was trying to do was wrong. I cannot simply delete all the
revisions and then insert what is potentially the same document again.

I will go back to the original idea of doing everything in a single
atomic '_bulk_docs' operation, but I will have to handle the special case
of the document not changing by simply leaving it alone and deleting only
its conflicts.

Thanks,

Mike

On Mon, 2011-01-10 at 11:47 -0500, Paul Davis wrote:
> On Mon, Jan 10, 2011 at 11:18 AM, Mike Leddy wrote:
> > Hello,
> >
> > I have a situation where the same document can come from several sources
> > and I have written a script (in ruby) which effectively merges the
> > information in all conflicting documents (deletes the originals) and
> > inserts the new merged document.
> >
> > Everything seemed fine until I observed that sometimes the newly inserted
> > document remained deleted.....
> >
> > On further investigation I discovered that I was (by design) merging the
> > documents in a deterministic way, so it was possible for a merge of
> > documents A + B + C to give A, i.e. document A already has all the
> > information contained in documents A & B & C.
> >
> > Since I was deleting A and then subsequently inserting essentially the same
> > document it remained deleted even though the bulk_docs API was indicating a
> > successful insertion.
> >
> > I am using a recent 1.0.x branch. Here is the essence of what is happening
> > using the same API calls:
> >
> > # create a database
> > curl -X PUT 127.0.0.1:5984/bulk_docs
> > {"ok":true}
> >
> > # insert a doc 'mike'
> > curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
> > [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
> >
> > # insert another doc 'john' with same id
> > curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"john"}]}'
> > [{"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
> >
> > # 'john' is the winning conflict
> > curl 'localhost:5984/bulk_docs/same'
> > {"_id":"same","_rev":"1-ec562a018012e70bbf8da7f6f58970d7","name":"john"}
> >
> > # delete 'john'
> > curl -X DELETE 'localhost:5984/bulk_docs/same?rev=1-ec562a018012e70bbf8da7f6f58970d7'
> > {"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
> >
> > # delete 'mike'
> > curl -X DELETE 'localhost:5984/bulk_docs/same?rev=1-d6246810df84e21f7611601d0cceccbf'
> > {"ok":true,"id":"same","rev":"2-db780681eced993484c7f171ab7f599c"}
> >
> > # none left
> > curl 'localhost:5984/bulk_docs/same'
> > {"error":"not_found","reason":"deleted"}
> >
> > # insert 'mike' again
> > curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
> > [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
> >
> > # ouch !!!!!
> > curl 'localhost:5984/bulk_docs/same'
> > {"error":"not_found","reason":"deleted"}
> >
> > Since I have the conflict resolution script working on all nodes I want
> > the result to be deterministic so as to be sure that all nodes calculate
> > the same result and produce revisions that are the same....... always
> > converging on exactly the same result.
> >
> > Any insights ?
> >
> > Regards,
> >
> > Mike
> >
> >
>
> This has to do with how docs in deleted states can be revived which
> can lead to unexpected behavior like this.
>
> A somewhat simpler curl session:
>
> $ curl -X PUT http://127.0.0.1:5984/test
> {"ok":true}
>
> $ curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/test/_bulk_docs -d '{"all_or_nothing": true, "docs": [{"_id": "same", "name": "john"}]}'
> [{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
>
> $ curl -X DELETE http://127.0.0.1:5984/test/same?rev=1-ec562a018012e70bbf8da7f6f58970d7
> {"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
>
> $ curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/test/_bulk_docs -d '{"all_or_nothing": true, "docs": [{"_id": "same", "name": "john"}]}'
> [{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
>
> $ curl http://127.0.0.1:5984/test/same
> {"error":"not_found","reason":"deleted"}
>
> $ curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/test/_bulk_docs -d '{"docs": [{"_id": "same", "name": "john"}]}'
> [{"ok":true,"id":"same","rev":"3-6bdb1e1130357a1f01080dd74b7c5095"}]
>
> $ curl http://127.0.0.1:5984/test/same
> {"_id":"same","_rev":"3-6bdb1e1130357a1f01080dd74b7c5095","name":"john"}
>
>
> What's happening here is that you're playing with the revision tree
> weirdly. The progression from your session looks something like such:
>
> [note: I'm using 0 (zero) to indicate the null state of document not existing].
>
> 0
> 0 -> A                                  # put mike
> 0 -> (A | B)                            # put john, introducing conflict
> 0 -> (A | B -> C:deleted)               # delete john, no more conflict
> 0 -> (A -> D:deleted | B -> C:deleted)  # delete mike, doc is deleted
> 0 -> (A -> D:deleted | B -> C:deleted)  # tried to re-put original mike, but it's still deleted
>
> What happens in the last step is that since your 're-put of mike'
> ended up creating a revision identical to A (because of
> all_or_nothing: true) the revision tree still has the D:deleted
> revision, which means that the document is still deleted, which gives
> you the behavior you're seeing.
>
> If when you re-put the mike version you don't use all_or_nothing:
> true, then you end up creating a revision tree like such:
>
> 0 -> (A -> D:deleted -> E | B -> C:deleted)
>
> Which recreates the doc with the new revision.
>
> On a side note, the reason that we need to keep deleted revisions is
> because that's how we determine if a conflict has been resolved during
> replication. If those revisions disappeared, you'd have to re-resolve
> conflicts after every replication.
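
For anyone following along, the revision-tree progression Paul describes can be replayed with a toy model. This is only a sketch in Python under stated assumptions, not CouchDB's actual implementation: the Rev and RevTree names are made up for illustration, rev ids such as "1-A" are labels rather than real hashes, and the only rules modelled are the two from the explanation above (re-adding a revision that already exists changes nothing, and a document reads as deleted when every leaf revision is deleted).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rev:
    rev_id: str             # illustrative label such as "1-A", not a real hash
    parent: Optional[str]   # rev_id of the parent revision, None for a root
    deleted: bool = False


@dataclass
class RevTree:
    """Hypothetical toy revision tree for a single document id."""
    revs: Dict[str, Rev] = field(default_factory=dict)

    def add(self, rev_id: str, parent: Optional[str] = None,
            deleted: bool = False) -> bool:
        """Add a revision. Returns False (and changes nothing) if a
        revision with the same id is already in the tree."""
        if rev_id in self.revs:
            return False
        self.revs[rev_id] = Rev(rev_id, parent, deleted)
        return True

    def leaves(self) -> List[Rev]:
        """Revisions that no other revision names as its parent."""
        parents = {r.parent for r in self.revs.values()}
        return [r for r in self.revs.values() if r.rev_id not in parents]

    def is_deleted(self) -> bool:
        """True when every leaf is deleted (also true for an empty tree)."""
        return all(leaf.deleted for leaf in self.leaves())


tree = RevTree()
tree.add("1-A")                               # put mike
tree.add("1-B")                               # put john, introducing conflict
tree.add("2-C", parent="1-B", deleted=True)   # delete john
tree.add("2-D", parent="1-A", deleted=True)   # delete mike, doc is deleted

# Re-put of mike with all_or_nothing: the new revision is identical to A,
# so the tree does not change and the D:deleted leaf still hides the doc.
readd_changed = tree.add("1-A")
deleted_after_readd = tree.is_deleted()

# Re-put without all_or_nothing: a new revision E extends the deleted leaf D.
tree.add("3-E", parent="2-D")
deleted_after_extend = tree.is_deleted()

print(readd_changed, deleted_after_readd, deleted_after_extend)
```

In this model the parentless re-put is a no-op because "1-A" is already present, while extending 2-D with a new leaf is what makes the document visible again, matching the 0 -> (A -> D:deleted -> E | B -> C:deleted) tree above.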