Subject: Re: Resolving replication conflicts for deleted documents in CouchDB
From: Alexander Bolodurin
Date: Mon, 29 Oct 2012 13:06:25 +1100
To: user@couchdb.apache.org
Message-Id: <6616F937-7D88-44BB-A7BC-4FE1B17B4D02@gmail.com>
References: <6CFCAD66-F77F-4F12-8D15-A9124F82CAFF@gmail.com> <8213F582-E95A-483A-993C-06292EDC78E3@gmail.com>

Thanks,

This is what I suspected. It looks like we have to roll our own "deleted" state if we want to handle this case.

I don't think the fact that a deleted document may contain arbitrary attributes helps, because then I'd have to examine the _deleted_conflicts list or open_revs just to check whether it was deleted. This means I'll have to poll every document that has ever had a conflict, every single time, because _deleted_conflicts will be forever non-empty (and unbounded), and there is no way to tell which revisions were deleted for reasons other than conflict resolution without reading them.

On 26/10/2012, at 1:29 AM, Robert Newson wrote:

> Hi,
>
> Thanks for clarifying. I don't think you can achieve your desired
> result at a lower level than your proposal to use your own deleted
> flag (and account for that in views, etc). Does it help at all that a
> deleted document can contain any set of properties you like?
> The DELETE method translates internally to a PUT {_id:id, _rev:new_rev,
> _deleted:true}. You can delete a document by adding _deleted:true and
> keep any properties you like in there.
>
> Btw, I stopped populating StackOverflow with answers when they started
> abusing their contact database.
>
> B.
>
> On 25 October 2012 14:47, Alexander Bolodurin wrote:
>> Thanks Robert,
>>
>> I understand the mechanics, but it doesn't quite solve my problem yet.
>>
>> In your example it's clear: one replica edits foo, another one deletes foo, so both will see a live and a _deleted revision.
>> But it's not the only case. If I happened to resolve a regular edit conflict and delete one revision, the result is identical (as it should be).
>> Except in the second case I shouldn't delete the live revision, because it has been introduced as a result of conflict resolution; the user hasn't deleted anything.
>>
>> As far as I can tell, there is no way to tell the "origin" of a deleted revision, at least this way.
>>
>> Example: https://gist.github.com/3952603
>>
>> On 25/10/2012, at 11:17 PM, Robert Newson wrote:
>>
>>> A deletion is just an update. The algorithm that CouchDB uses to
>>> choose one leaf out of many deliberately chooses _deleted:false over
>>> _deleted:true.
>>>
>>> Here's a test run I just performed on couchdb/master:
>>>
>>> # setup instance #1
>>> curl localhost:5984/alex -XPUT
>>> {"ok":true}
>>>
>>> curl localhost:5984/alex/foo -XPUT -d{}
>>> {"ok":true,"id":"foo","rev":"1-967a00dff5e02add41819138abb3284d"}
>>>
>>> # setup identical instance #2
>>> curl localhost:5984/alex2 -XPUT
>>> {"ok":true}
>>>
>>> curl localhost:5984/alex2/foo -XPUT -d{}
>>> {"ok":true,"id":"foo","rev":"1-967a00dff5e02add41819138abb3284d"}
>>>
>>> # update doc in instance #1
>>> curl localhost:5984/alex/foo -XPUT -d '{"_rev":"1-967a00dff5e02add41819138abb3284d"}'
>>>
>>> # delete doc in instance #2
>>> curl localhost:5984/alex2/foo?rev=1-967a00dff5e02add41819138abb3284d -XDELETE
>>>
>>> curl localhost:5984/_replicate -Hcontent-type:application/json -d '{"source":"alex2","target":"alex"}'
>>> {"ok":true,"session_id":"ed33d539fe675ac22b76c0a7be3fe1bf","source_last_seq":2,"replication_id_version":3,"history":[{"session_id":"ed33d539fe675ac22b76c0a7be3fe1bf","start_time":"Thu, 25 Oct 2012 12:10:54 GMT","end_time":"Thu, 25 Oct 2012 12:10:54 GMT","start_last_seq":0,"end_last_seq":2,"recorded_seq":2,"missing_checked":1,"missing_found":1,"docs_read":1,"docs_written":1,"doc_write_failures":0}]}
>>>
>>> curl localhost:5984/alex/foo
>>> {"_id":"foo","_rev":"2-7051cbe5c8faecd085a3fa619e6e6337"}
>>>
>>> curl 'localhost:5984/alex/foo?open_revs=all'
>>> --2b1fcadf47010c46a3afa22b7533dd07
>>> Content-Type: application/json
>>>
>>> {"_id":"foo","_rev":"2-7051cbe5c8faecd085a3fa619e6e6337"}
>>> --2b1fcadf47010c46a3afa22b7533dd07
>>> Content-Type: application/json
>>>
>>> {"_id":"foo","_rev":"2-eec205a9d413992850a6e32678485900","_deleted":true}
>>> --2b1fcadf47010c46a3afa22b7533dd07--%
>>>
>>> As you can see, the first database, alex, will show the non-deleted
>>> doc as per our algorithm, but the doc has two leaf revisions now.
>>> To resolve in the direction you want, delete the
>>> 2-7051cbe5c8faecd085a3fa619e6e6337 revision:
>>>
>>> curl localhost:5984/alex/foo?rev=2-7051cbe5c8faecd085a3fa619e6e6337 -XDELETE
>>> {"ok":true,"id":"foo","rev":"3-7379b9e515b161226c6559d90c4dc49f"}
>>>
>>> curl 'localhost:5984/alex/foo'
>>> {"error":"not_found","reason":"deleted"}
>>>
>>> B.
>>>
>>> On 25 October 2012 01:29, Alexander Bolodurin wrote:
>>>> Hi,
>>>>
>>>> (I have asked this at StackOverflow, but, unsurprisingly, the question didn't get much attention.)
>>>>
>>>> I'm designing replication conflict handling for a system, and one of its assumptions is that deletion always takes precedence when resolving conflicts: a deleted document stays deleted regardless of what edits it conflicts with, and IDs are not reused.
>>>>
>>>> The "official" way of resolving replication conflicts (read conflicting revisions, merge in the application code, delete unwanted revisions) is not applicable to deleted documents. If a document is edited on instance 1 and deleted on instance 2, after replication both instances get the revision from 1. Because only one leaf revision is alive, the document ends up "undeleted", and without conflicts. The other revision ends up in the _deleted_conflicts field instead of _conflicts, but I can't use _deleted_conflicts as a cue that a document was deleted, because it includes deleted revisions from resolving edit conflicts and documents that were deleted and then re-added, so it's too general and conflates several cases.
>>>>
>>>> How can I get around this at the CouchDB level? Moving it up to the application layer gets really hairy really quickly, as now I have to have my custom "deleted" flag, rewrite my views, test more code and have extra batch jobs to clean up records marked for deletion.
>>>>
>>>> Regards,
>>>> Alex.
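
A minimal sketch of the "roll our own deleted state" idea from this thread, assuming the application writes a marker property on tombstones that come from a real user deletion (as noted above, a deleted document can keep any properties). The field name user_deleted and the function revisions_to_delete are hypothetical and not part of CouchDB. The function takes the leaf revision bodies of one document, in the shape shown in the open_revs=all output above, and lists the live revisions that would have to be deleted for the deletion to win. It still has to read the leaves, so it does not remove the polling cost raised in the reply.

```python
from typing import Dict, List

# Hypothetical marker: set by the application on tombstones that represent a
# real user deletion, as opposed to a tombstone written while resolving an
# edit conflict. CouchDB itself attaches no meaning to this field.
USER_DELETED = "user_deleted"


def revisions_to_delete(leaves: List[Dict]) -> List[str]:
    """Return the _rev of every live leaf that should be deleted.

    `leaves` holds the leaf revision bodies of one document, e.g. the bodies
    returned by GET /db/doc?open_revs=all.
    """
    # Deletion takes precedence only if some tombstone carries the marker.
    user_deleted = any(
        leaf.get("_deleted") and leaf.get(USER_DELETED) for leaf in leaves
    )
    if not user_deleted:
        return []
    # Every leaf that is still alive must be deleted for the doc to stay gone.
    return [leaf["_rev"] for leaf in leaves if not leaf.get("_deleted")]


leaves = [
    {"_id": "foo", "_rev": "2-7051cbe5c8faecd085a3fa619e6e6337"},
    {"_id": "foo", "_rev": "2-eec205a9d413992850a6e32678485900",
     "_deleted": True, USER_DELETED: True},
]
print(revisions_to_delete(leaves))
```

A tombstone without the marker, such as one left behind by ordinary conflict resolution, yields an empty list, so the live revision is kept.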