couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Resolving replication conflicts for deleted documents in CouchDB
Date Thu, 25 Oct 2012 12:17:16 GMT
A deletion is just an update. The algorithm that CouchDB uses to
choose one leaf out of many deliberately chooses _deleted:false over
_deleted:true.
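
As a rough illustration of that preference, here is a minimal shell sketch of the leaf ordering. It assumes leaves are ranked non-deleted first, then by highest revision generation, then by highest revision hash. Only the first rule is stated above; the two tie-breakers are an assumption of this sketch. The function name pick_winner and its input format (one "<rev> <deleted>" pair per line) are hypothetical and not part of CouchDB.

```shell
#!/bin/sh
# pick_winner (hypothetical helper): read leaf revisions on stdin, one per
# line, formatted as "<rev> <deleted>", e.g. "2-7051cbe5... false", and
# print the rev this sketch would rank first.
pick_winner() {
    # Rewrite "N-hash flag" as "flag N hash" so each part is its own sort key.
    sed 's/^\([0-9][0-9]*\)-\([^ ]*\) \(.*\)$/\3 \1 \2/' |
        # Key 1: "false" sorts before "true", so live leaves come first.
        # Key 2: higher generation first (assumed tie-breaker).
        # Key 3: higher hash first, byte-wise (assumed tie-breaker).
        LC_ALL=C sort -k1,1 -k2,2nr -k3,3r |
        head -n 1 |
        # Reassemble "N-hash".
        awk '{ print $2 "-" $3 }'
}
```

For example, given a live leaf at generation 2 and a deleted leaf at generation 3, the sketch prints the live one, because the deleted flag is compared before anything else.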

Here's a test run I just performed on couchdb/master:

# setup instance #1
curl localhost:5984/alex -XPUT
{"ok":true}

curl localhost:5984/alex/foo -XPUT -d{}
{"ok":true,"id":"foo","rev":"1-967a00dff5e02add41819138abb3284d"}

# setup identical instance #2
curl localhost:5984/alex2 -XPUT
{"ok":true}

curl localhost:5984/alex2/foo -XPUT -d{}
{"ok":true,"id":"foo","rev":"1-967a00dff5e02add41819138abb3284d"}

# update doc in instance #1
curl localhost:5984/alex/foo -XPUT -d '{"_rev":"1-967a00dff5e02add41819138abb3284d"}'

# delete doc in instance #2
curl 'localhost:5984/alex2/foo?rev=1-967a00dff5e02add41819138abb3284d' -XDELETE

curl localhost:5984/_replicate -Hcontent-type:application/json -d '{"source":"alex2","target":"alex"}'
{"ok":true,"session_id":"ed33d539fe675ac22b76c0a7be3fe1bf","source_last_seq":2,"replication_id_version":3,"history":[{"session_id":"ed33d539fe675ac22b76c0a7be3fe1bf","start_time":"Thu, 25 Oct 2012 12:10:54 GMT","end_time":"Thu, 25 Oct 2012 12:10:54 GMT","start_last_seq":0,"end_last_seq":2,"recorded_seq":2,"missing_checked":1,"missing_found":1,"docs_read":1,"docs_written":1,"doc_write_failures":0}]}

curl localhost:5984/alex/foo
{"_id":"foo","_rev":"2-7051cbe5c8faecd085a3fa619e6e6337"}

curl 'localhost:5984/alex/foo?open_revs=all'
--2b1fcadf47010c46a3afa22b7533dd07
Content-Type: application/json

{"_id":"foo","_rev":"2-7051cbe5c8faecd085a3fa619e6e6337"}
--2b1fcadf47010c46a3afa22b7533dd07
Content-Type: application/json

{"_id":"foo","_rev":"2-eec205a9d413992850a6e32678485900","_deleted":true}
--2b1fcadf47010c46a3afa22b7533dd07--

As you can see, the first database, alex, shows the non-deleted doc,
as per our algorithm, but the doc now has two leaf revisions. To
resolve in the direction you want, delete the
2-7051cbe5c8faecd085a3fa619e6e6337 revision:

curl 'localhost:5984/alex/foo?rev=2-7051cbe5c8faecd085a3fa619e6e6337' -XDELETE
{"ok":true,"id":"foo","rev":"3-7379b9e515b161226c6559d90c4dc49f"}

curl 'localhost:5984/alex/foo'
{"error":"not_found","reason":"deleted"}
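
The same resolution can be wrapped in a small shell helper. This is only a sketch: delete_leaf and prefer_deletion are hypothetical names, COUCH and CURL are variables invented here (CURL exists so the command can be stubbed out for a dry run), and the list of live leaf revisions is assumed to have been read off the open_revs=all output by hand.

```shell
#!/bin/sh
# Assumed settings for this sketch: server address and the curl binary.
COUCH=${COUCH:-localhost:5984}
CURL=${CURL:-curl}

# delete_leaf DB DOCID REV
# Deletes one leaf revision, which turns that branch into a tombstone.
# DOCID is used as-is; IDs with special characters would need URL-encoding.
delete_leaf() {
    "$CURL" -s -X DELETE "$COUCH/$1/$2?rev=$3"
}

# prefer_deletion DB DOCID REV...
# Deletes every listed live leaf so that only deleted leaves remain
# and the document reads as deleted.
prefer_deletion() {
    db=$1
    id=$2
    shift 2
    for rev in "$@"; do
        delete_leaf "$db" "$id" "$rev"
    done
}
```

Calling prefer_deletion alex foo 2-7051cbe5c8faecd085a3fa619e6e6337 issues the same DELETE as the curl command above. If more than one live leaf is listed, each gets its own DELETE.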

B.

On 25 October 2012 01:29, Alexander Bolodurin
<alexander.bolodurin@gmail.com> wrote:
> Hi,
>
> (I have asked this at StackOverflow, but, unsurprisingly, the question
> didn't get much attention.)
>
> I'm designing replication conflict handling for a system, and one of
> its assumptions is that deletion always takes precedence when resolving
> conflicts: a deleted document stays deleted regardless of what edits it
> conflicts with, and IDs are not reused.
>
> The "official" way of resolving replication conflicts (read conflicting
> revisions, merge in the application code, delete unwanted revisions) is
> not applicable to deleted documents. If a document is edited on
> instance 1, and deleted on instance 2, after replication both instances
> get the revision from 1. Because only one leaf revision is alive, the
> document ends up "undeleted", and without conflicts. The other revision
> ends up in the _deleted_conflicts field, instead of _conflicts, but I
> can't use _deleted_conflicts as a cue that a document was deleted,
> because it includes deleted revisions from resolving edit conflicts and
> documents that were deleted and then re-added, so it's too general and
> conflates several cases.
>
> How can I get around this at the CouchDB level? Moving it up the
> application layer gets really hairy really quickly as now I have to
> have my custom "deleted" flag, rewrite my views, test more code and
> have extra batch jobs to clean up records marked for delete.
>
> Regards,
> Alex.
