incubator-couchdb-user mailing list archives

From Alexander Bolodurin <alexander.bolodu...@gmail.com>
Subject Re: Resolving replication conflicts for deleted documents in CouchDB
Date Mon, 29 Oct 2012 02:06:25 GMT
Thanks,

This is what I suspected; it looks like we have to roll our own "deleted" state if we want to
handle this case.

I don't think the fact that a deleted document may contain arbitrary attributes helps,
because I'd still have to examine the _deleted_conflicts list or open_revs just to check
whether it was deleted. That means polling every document that has ever had a conflict,
every single time, because _deleted_conflicts stays non-empty forever (and grows without
bound), and there is no way to tell which revisions were deleted for reasons other than
conflict resolution without reading them.
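
For illustration, here is a minimal sketch of what rolling our own "deleted" state could look like. It assumes a hypothetical boolean field named app_deleted (my name, not a CouchDB field) stored on an otherwise live revision, so a user deletion shows up as an ordinary edit conflict in _conflicts, while revisions removed with _deleted:true during conflict resolution are simply ignored. The choice of which live leaf to use as the base is arbitrary here.

```python
def merge_leaves(leaves, flag="app_deleted"):
    """Merge leaf revision bodies so that a user deletion always wins.

    `leaves` is a list of dicts as returned by ?open_revs=all.
    `flag` is a hypothetical application-level field, not part of CouchDB.
    Returns the merged body, or None if no live leaf remains.
    """
    # Real CouchDB tombstones only record conflict-resolution cleanup here,
    # so they carry no information about user intent and are skipped.
    live = [doc for doc in leaves if not doc.get("_deleted")]
    if not live:
        return None

    def rev_key(doc):
        # "2-abc" -> (2, "abc"); used only to pick a base deterministically.
        generation, _, digest = doc["_rev"].partition("-")
        return (int(generation), digest)

    merged = dict(max(live, key=rev_key))
    # Deletion takes precedence: one flagged leaf marks the merged result.
    if any(doc.get(flag) for doc in live):
        merged[flag] = True
    return merged
```

Views would then have to skip documents where app_deleted is set, which is exactly the extra work I was hoping to avoid.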

On 26/10/2012, at 1:29 AM, Robert Newson wrote:

> Hi,
> 
> Thanks for clarifying. I don't think you can achieve your desired
> result at a lower level than your proposal to use your own deleted
> flag (and account for that in views, etc). Does it help at all that a
> deleted document can contain any set of properties you like? The
> DELETE method translates internally to a PUT {_id:id, _rev:new_rev,
> _deleted:true}. You can delete a document by adding _deleted:true and
> keep any properties you like in there.
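
As a rough sketch of the point above, the body of such a PUT could be built as follows. The helper name tombstone_body and the kept field names are hypothetical; the only parts taken from the thread are _id, _rev and _deleted:true. Nothing is sent to a server here.

```python
def tombstone_body(doc, keep=()):
    """Build the JSON body for deleting `doc` via PUT /db/<id>.

    `doc` must contain the current _id and _rev. Fields listed in
    `keep` are copied onto the tombstone so they survive the deletion.
    """
    body = {"_id": doc["_id"], "_rev": doc["_rev"], "_deleted": True}
    for name in keep:
        if name in doc:
            body[name] = doc[name]
    return body
```

For example, tombstone_body(doc, keep=("owner",)) keeps a hypothetical owner field on the deleted revision and drops everything else.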
> 
> Btw, I stopped populating StackOverflow with answers when they started
> abusing their contact database.
> 
> B.
> 
> On 25 October 2012 14:47, Alexander Bolodurin
> <alexander.bolodurin@gmail.com> wrote:
>> Thanks Robert,
>> 
>> I understand the mechanics, but it doesn't quite solve my problem yet.
>> 
>> In your example it's clear: one replica edits foo, another one deletes foo, so both
>> will see a live revision and a _deleted revision.
>> But that's not the only case. If I happen to resolve a regular edit conflict by
>> deleting one revision, the result is identical (as it should be).
>> Except in the second case I shouldn't delete the live revision, because the deleted
>> revision was introduced as a result of conflict resolution; the user hasn't deleted anything.
>> 
>> As far as I can tell, there is no way to tell the "origin" of a deleted revision,
>> at least this way.
>> 
>> Example: https://gist.github.com/3952603
>> 
>> On 25/10/2012, at 11:17 PM, Robert Newson wrote:
>> 
>>> A deletion is just an update. The algorithm that CouchDB uses to
>>> choose one leaf out of many deliberately chooses _deleted:false over
>>> _deleted:true.
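
A small sketch of that selection rule, for illustration only. The preference for _deleted:false over _deleted:true is what is stated above; the tie-break among leaves with the same deleted status (higher revision number first, then the greater revision hash) is an assumption in this sketch, not something confirmed in this thread.

```python
def winning_rev(leaves):
    """Pick the revision a read would return among leaf revisions.

    `leaves` is a list of dicts with "_rev" and an optional "_deleted".
    """
    def sort_key(leaf):
        generation, _, digest = leaf["_rev"].partition("-")
        # Live leaves sort above deleted ones; the remaining two
        # components are the assumed tie-break.
        return (not leaf.get("_deleted", False), int(generation), digest)

    return max(leaves, key=sort_key)["_rev"]
```

Without the first key component, a deleted leaf with a greater hash at the same revision number would be picked over a live one.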
>>> 
>>> Here's a test run I just performed on couchdb/master;
>>> 
>>> # setup instance #1
>>> curl localhost:5984/alex -XPUT
>>> {"ok":true}
>>> 
>>> curl localhost:5984/alex/foo -XPUT -d{}
>>> {"ok":true,"id":"foo","rev":"1-967a00dff5e02add41819138abb3284d"}
>>> 
>>> # setup identical instance #2
>>> curl localhost:5984/alex2 -XPUT
>>> {"ok":true}
>>> 
>>> curl localhost:5984/alex2/foo -XPUT -d{}
>>> {"ok":true,"id":"foo","rev":"1-967a00dff5e02add41819138abb3284d"}
>>> 
>>> # update doc in instance #1
>>> curl localhost:5984/alex/foo -XPUT -d '{"_rev":"1-967a00dff5e02add41819138abb3284d"}'
>>> 
>>> # delete doc in instance #2
>>> curl localhost:5984/alex2/foo?rev=1-967a00dff5e02add41819138abb3284d  -XDELETE
>>> 
>>> curl localhost:5984/_replicate -Hcontent-type:application/json -d '{"source":"alex2","target":"alex"}'
>>> {"ok":true,"session_id":"ed33d539fe675ac22b76c0a7be3fe1bf","source_last_seq":2,"replication_id_version":3,"history":[{"session_id":"ed33d539fe675ac22b76c0a7be3fe1bf","start_time":"Thu, 25 Oct 2012 12:10:54 GMT","end_time":"Thu, 25 Oct 2012 12:10:54 GMT","start_last_seq":0,"end_last_seq":2,"recorded_seq":2,"missing_checked":1,"missing_found":1,"docs_read":1,"docs_written":1,"doc_write_failures":0}]}
>>> 
>>> curl localhost:5984/alex/foo
>>> {"_id":"foo","_rev":"2-7051cbe5c8faecd085a3fa619e6e6337"}
>>> 
>>> curl 'localhost:5984/alex/foo?open_revs=all'
>>> --2b1fcadf47010c46a3afa22b7533dd07
>>> Content-Type: application/json
>>> 
>>> {"_id":"foo","_rev":"2-7051cbe5c8faecd085a3fa619e6e6337"}
>>> --2b1fcadf47010c46a3afa22b7533dd07
>>> Content-Type: application/json
>>> 
>>> {"_id":"foo","_rev":"2-eec205a9d413992850a6e32678485900","_deleted":true}
>>> --2b1fcadf47010c46a3afa22b7533dd07--
>>> 
>>> As you can see, the first database, alex, will show the non-deleted
>>> doc as per our algorithm, but the doc has two leaf revisions now. To
>>> resolve in the direction you want, delete the
>>> 2-7051cbe5c8faecd085a3fa619e6e6337 revision;
>>> 
>>> curl localhost:5984/alex/foo?rev=2-7051cbe5c8faecd085a3fa619e6e6337 -XDELETE
>>> {"ok":true,"id":"foo","rev":"3-7379b9e515b161226c6559d90c4dc49f"}
>>> 
>>> curl 'localhost:5984/alex/foo'
>>> {"error":"not_found","reason":"deleted"}
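
The same resolution can be sketched for a document with several live leaves. This is only an illustration of the blanket "deletion wins" rule: the function name is hypothetical, the input is the list of bodies from ?open_revs=all, and the output is a payload in the shape accepted by POST /db/_bulk_docs. It does not try to tell why a leaf was deleted.

```python
def deletion_wins_payload(leaves):
    """Build a _bulk_docs payload that deletes every live leaf
    when at least one leaf of the document is already deleted.

    Returns {"docs": []} when there is nothing to do.
    """
    deleted = [leaf for leaf in leaves if leaf.get("_deleted")]
    live = [leaf for leaf in leaves if not leaf.get("_deleted")]
    if not deleted or not live:
        return {"docs": []}
    # One tombstone per live leaf, each addressed by its own _rev.
    return {
        "docs": [
            {"_id": leaf["_id"], "_rev": leaf["_rev"], "_deleted": True}
            for leaf in live
        ]
    }
```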
>>> 
>>> B.
>>> 
>>> On 25 October 2012 01:29, Alexander Bolodurin
>>> <alexander.bolodurin@gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> (I have asked this at StackOverflow, but, unsurprisingly, the question didn't
>>>> get much attention.)
>>>> 
>>>> I'm designing replication conflict handling for a system, and one of its
>>>> assumptions is that deletion always takes precedence when resolving conflicts: a deleted
>>>> document stays deleted regardless of what edits it conflicts with, and IDs are not reused.
>>>> 
>>>> The "official" way of resolving replication conflicts (read conflicting revisions,
>>>> merge in the application code, delete unwanted revisions) is not applicable to deleted
>>>> documents. If a document is edited on instance 1 and deleted on instance 2, after
>>>> replication both instances get the revision from 1. Because only one leaf revision is
>>>> alive, the document ends up "undeleted", and without conflicts. The other revision ends
>>>> up in the _deleted_conflicts field instead of _conflicts, but I can't use
>>>> _deleted_conflicts as a cue that a document was deleted, because it also includes deleted
>>>> revisions from resolving edit conflicts and from documents that were deleted and then
>>>> re-added, so it's too general and conflates several cases.
>>>> 
>>>> How can I get around this at the CouchDB level? Moving it up to the application
>>>> layer gets really hairy really quickly, as now I have to have my custom "deleted" flag,
>>>> rewrite my views, test more code and have extra batch jobs to clean up records marked
>>>> for deletion.
>>>> 
>>>> Regards,
>>>> Alex.
>>> 
>> 
> 

