From: Paul Davis
Date: Mon, 10 Jan 2011 11:47:52 -0500
Subject: Re: Having problems with conflict resolution
To: user@couchdb.apache.org
In-Reply-To: <1294676337.4402.33.camel@mike.loop.com.br>

On Mon, Jan 10, 2011 at 11:18 AM, Mike Leddy wrote:
> Hello,
>
> I have a situation where the same document can come from several sources
> and I have written a script (in ruby) which effectively merges the
> information in all conflicting documents (deletes the originals) and
> inserts the new merged document.
>
> Everything seemed fine until I observed that sometimes the newly inserted
> document remained deleted.....
>
> On further investigation I discovered that I was (by design) merging the
> documents in a deterministic way and it was possible that if I was merging
> documents A + B + C giving A ie: document A already has all the
> information contained in documents A & B & C.
>
> Since i was deleting A and then subsequently inserting essentially the same
> document it remained deleted even though the bulk_docs API was indicating a
> successful insertion.
>
> I am using a recent 1.0.x branch.
> Here is the essence of what is happening
> using the same API calls:
>
> # create a database
> curl -X PUT 127.0.0.1:5984/bulk_docs
> {"ok":true}
>
> # insert a doc 'mike'
> curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
> [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
>
> # insert another doc 'john' with same id
> curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"john"}]}'
> [{"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
>
> # 'john' is the winning conflict
> curl 'localhost:5984/bulk_docs/same'
> {"_id":"same","_rev":"1-ec562a018012e70bbf8da7f6f58970d7","name":"john"}
>
> # delete 'john'
> curl -X DELETE 'localhost:5984/bulk_docs/same?rev=1-ec562a018012e70bbf8da7f6f58970d7'
> {"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
>
> # delete 'mike'
> curl -X DELETE 'localhost:5984/bulk_docs/same?rev=1-d6246810df84e21f7611601d0cceccbf'
> {"ok":true,"id":"same","rev":"2-db780681eced993484c7f171ab7f599c"}
>
> # none left
> curl 'localhost:5984/bulk_docs/same'
> {"error":"not_found","reason":"deleted"}
>
> # insert 'mike' again
> curl -X POST -H 'Content-type: application/json' 'localhost:5984/bulk_docs/_bulk_docs' -d '{"all_or_nothing":true,"docs":[{"_id":"same", "name":"mike"}]}'
> [{"id":"same","rev":"1-d6246810df84e21f7611601d0cceccbf"}]
>
> # ouch !!!!!
> curl 'localhost:5984/bulk_docs/same'
> {"error":"not_found","reason":"deleted"}
>
> Since I have the conflict resolution script working on all nodes I want
> the result to be deterministic so as to be sure that all nodes calculate
> the same result and produce revisions that are the same....... always
> converging on exactly the same result.
>
> Any insights ?
>
> Regards,
>
> Mike
>

This has to do with how docs in a deleted state can be revived, which
can lead to unexpected behavior like this. A somewhat simpler curl
session:

$ curl -X PUT http://127.0.0.1:5984/test
{"ok":true}
$ curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/test/_bulk_docs -d '{"all_or_nothing": true, "docs": [{"_id": "same", "name": "john"}]}'
[{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
$ curl -X DELETE http://127.0.0.1:5984/test/same?rev=1-ec562a018012e70bbf8da7f6f58970d7
{"ok":true,"id":"same","rev":"2-1dae8400f3e20ab34b845e855ba6dc85"}
$ curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/test/_bulk_docs -d '{"all_or_nothing": true, "docs": [{"_id": "same", "name": "john"}]}'
[{"ok":true,"id":"same","rev":"1-ec562a018012e70bbf8da7f6f58970d7"}]
$ curl http://127.0.0.1:5984/test/same
{"error":"not_found","reason":"deleted"}
$ curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/test/_bulk_docs -d '{"docs": [{"_id": "same", "name": "john"}]}'
[{"ok":true,"id":"same","rev":"3-6bdb1e1130357a1f01080dd74b7c5095"}]
$ curl http://127.0.0.1:5984/test/same
{"_id":"same","_rev":"3-6bdb1e1130357a1f01080dd74b7c5095","name":"john"}

What's happening here is that you're manipulating the revision tree in
an unusual way. The progression from your session looks something like
this [note: I'm using 0 (zero) to indicate the null state of the
document not existing]:

0
0 -> A                                  # put mike
0 -> (A | B)                            # put john, introducing conflict
0 -> (A | B -> C:deleted)               # delete john, no more conflict
0 -> (A -> D:deleted | B -> C:deleted)  # delete mike, doc is deleted
0 -> (A -> D:deleted | B -> C:deleted)  # tried to re-put original mike, but it's still deleted
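
The progression above can be replayed with a small toy model in Python.
This is only a sketch under assumptions: the RevTree class and its
methods are hypothetical names invented for illustration, not CouchDB
code, and the letters A to D stand in for the real revision ids.

```python
class RevTree:
    """Toy model of one document's revision tree (hypothetical, not CouchDB's)."""

    def __init__(self):
        # revision name -> (parent revision name or None, deleted flag)
        self.revs = {}

    def add(self, rev, parent=None, deleted=False):
        """Insert a revision; adding one that already exists changes nothing."""
        if rev in self.revs:
            return False
        self.revs[rev] = (parent, deleted)
        return True

    def leaves(self):
        """Revisions that no other revision names as its parent."""
        parents = {parent for parent, _ in self.revs.values()}
        return sorted(rev for rev in self.revs if rev not in parents)

    def is_deleted(self):
        """The doc reads as deleted when every leaf is a deleted revision."""
        return all(self.revs[rev][1] for rev in self.leaves())


tree = RevTree()
tree.add("A")                            # put mike
tree.add("B")                            # put john, introducing conflict
tree.add("C", parent="B", deleted=True)  # delete john
tree.add("D", parent="A", deleted=True)  # delete mike
reinserted = tree.add("A")               # re-put mike with the identical revision

print(reinserted, tree.leaves(), tree.is_deleted())
```

In this model the re-put finds A already in the tree, so the leaves stay
C and D and the document still reads as deleted.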

What happens in the last step is that your re-put of mike ended up
creating a revision identical to A (because of all_or_nothing: true).
The revision tree still has the D:deleted revision on top of A, so the
document is still deleted, which gives you the behavior you're seeing.

If you don't use all_or_nothing: true when you re-put the mike version,
you end up creating a revision tree like this:

0 -> (A -> D:deleted -> E | B -> C:deleted)

which recreates the doc with the new revision.

On a side note, the reason we need to keep deleted revisions is that
they are how we determine whether a conflict has been resolved during
replication. If those revisions disappeared, you'd have to re-resolve
conflicts after every replication.
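
To illustrate that side note, here is a rough Python sketch of why the
deleted revisions matter during replication. Everything in it is an
assumption made for illustration: the dict layout, the replicate
helper, and the node names are hypothetical, and real replication is
considerably more involved.

```python
def leaves(tree):
    """Revisions that no other revision names as its parent."""
    parents = {parent for parent, _ in tree.values()}
    return sorted(rev for rev in tree if rev not in parents)


def live_leaves(tree):
    """Leaves that are not deleted; more than one means a conflict."""
    return [rev for rev in leaves(tree) if not tree[rev][1]]


def replicate(source, target):
    """Toy replication: copy every revision the target is missing."""
    for rev, info in source.items():
        target.setdefault(rev, info)


# Each tree maps revision name -> (parent, deleted flag).
# Both nodes start with the same conflict: 0 -> (A | B)
node_x = {"A": (None, False), "B": (None, False)}
node_y = dict(node_x)

# Node X resolves the conflict by deleting B: 0 -> (A | B -> C:deleted)
node_x["C"] = ("B", True)

# With the deleted revision kept, node Y learns about the resolution.
replicate(node_x, node_y)
with_tombstone = live_leaves(node_y)

# Hypothetical: node Z resolved the conflict but discarded the B branch.
node_z = {"A": (None, False)}
stale_peer = {"A": (None, False), "B": (None, False)}
replicate(stale_peer, node_z)
without_tombstone = live_leaves(node_z)

print(with_tombstone, without_tombstone)
```

In the first case the conflict stays resolved on both nodes; in the
second, replicating from a peer that still holds B brings the conflict
back, so it would have to be resolved again.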