couchdb-user mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: What happens with a document, if a conflict is not resolved?
Date Thu, 29 Oct 2009 17:51:57 GMT

On Oct 29, 2009, at 12:30 PM, Brian Candler wrote:

> On Thu, Oct 29, 2009 at 07:28:33AM +0100, fana wrote:
>> I read the book, Wiki and some Blogs about CouchDB,
>> but there is still a question in my mind.
>>
>> If a document is in conflict, the application has to resolve it.
>> But what, if this never happens?
>
> All the conflicting versions remain around, even through compaction.
> However, if you request a document by ID, by default you will get an
> arbitrary (but deterministically chosen) revision. The algorithm is the
> same across all nodes, so all nodes will pick the same one. The
> "winning" document is also the one seen by views.
>
>> Can the document in conflict still be read and edited?
>
> Yes. Conflicts branch into a tree. When you've resolved a conflict, you
> need to delete the conflicting revisions explicitly.
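A minimal sketch of that clean-up step, assuming the merged result has already been saved on top of one branch and that you know the revs of the remaining losing leaves (for example from `?conflicts=true`). It sends one `_deleted` stub per losing rev to `_bulk_docs` in a single request. The function name `delete_losers` is hypothetical, and authentication is ignored.

```shell
# Hypothetical helper: delete the losing leaf revisions of a conflicted
# document in one _bulk_docs request, so only the resolved branch stays live.
# Usage: delete_losers DB_URL DOC_ID REV [REV ...]
delete_losers() {
  db=$1 id=$2
  shift 2
  docs=""
  for rev in "$@"; do
    # One deletion stub per losing revision, comma-separated.
    docs="$docs${docs:+,}{\"_id\":\"$id\",\"_rev\":\"$rev\",\"_deleted\":true}"
  done
  curl -sX POST -H 'Content-Type: application/json' \
    -d "{\"docs\":[$docs]}" "$db/_bulk_docs"
}
```

For the X0/X1/X2 tree below, resolving in favour of X1 would mean saving the merged document on top of X1 and then calling `delete_losers http://127.0.0.1:5984/conflict_test mydoc <rev of X2>`.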
>
> Example:
>
>    X0
>
> User 1 fetches X0 and updates it to X1. User 2 fetches X0 and updates
> it to X2. Then you get:
>
>      ,-> X1
>    X0
>      `-> X2
>
> If either user reads, they will see one of the versions (say X1). They
> won't even know that there's a conflict unless they query with
> ?conflicts=true, in which case they'll see the rev of X2 as well, but
> would need to do a second read to get the contents of X2.
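A sketch of that two-step read, assuming CouchDB returns compact single-line JSON and that revs contain no commas or quotes (true of `N-hash` revs). The `_conflicts` array is pulled out with `sed` rather than a real JSON parser, and `conflict_revs` and `show_conflicts` are hypothetical helper names.

```shell
# Hypothetical helper: read a document body on stdin and print the revs listed
# in its "_conflicts":[...] array, one per line (no JSON parser involved).
conflict_revs() {
  sed -n 's/.*"_conflicts":\[\([^]]*\)].*/\1/p' | tr ',' '\n' | tr -d '"'
}

# Hypothetical helper: list the conflicting revs of a document, then do the
# "second read" for each one with ?rev= to print its contents.
# Usage: show_conflicts DB_URL DOC_ID
show_conflicts() {
  db=$1 id=$2
  curl -s "$db/$id?conflicts=true" | conflict_revs |
    while read -r rev; do
      [ -n "$rev" ] || continue
      echo "== $rev"
      curl -s "$db/$id?rev=$rev"
    done
}
```

Against the test database built by the script further down, `show_conflicts http://127.0.0.1:5984/conflict_test mydoc` would print each losing revision's body after its rev.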
>
> If the database is compacted then the common ancestor X0 will be lost
> forever, but X1 and X2 will still remain. (Hence you can't rely on
> doing a diff between X0 and X1, and another diff between X0 and X2, to
> merge the changes.)

If you want DVCS-like full diffing, one way is to attach a diff and the
revision metadata of each edit to the document before PUTing it. When
there is a conflict, the full revision history is then available for
inspection, and the user can see where the conflicting edit began, etc.

>
> If a user edits X1 and saves back as X3, you will get
>
>      ,-> X1 -> X3
>    X0
>      `-> X2
>
> Now X2 and X3 are in conflict. The conflict may be resolved in favour
> of X3; actually, I don't know the details of the algorithm, so it might
> be possible for it to be resolved in favour of X2, which means that the
> changes seen in X1 and X3 would both appear to "vanish" at that point.

The one with more edits wins, which prevents a document's changes from
arbitrarily disappearing during normal editing.
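A rough sketch of that selection rule, for experimenting with rev strings offline. It treats the generation number (the `N` in `N-hash`, i.e. the number of edits) as the primary key and assumes the tie-break described in later CouchDB documentation: equal generations are decided by comparing the revs in ASCII order, highest wins. Deleted leaves, which should not win while a live leaf exists, are left out. `winning_rev` is a hypothetical name, not a CouchDB API.

```shell
# Hypothetical sketch of choosing the winning leaf revision.
# Reads one rev per line ("N-hash") on stdin and prints the winner:
# highest N first, then the highest hash in byte (ASCII) order.
# Deleted leaves are not modelled here.
winning_rev() {
  LC_ALL=C sort -t- -k1,1n -k2,2 | tail -n 1
}
```

With the tree from the quoted example, X3 (generation 3) beats X2 (generation 2) whatever their hashes, so `printf '2-ccc\n3-aaa\n' | winning_rev` prints `3-aaa`.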

>
> Note: if you are running on a single node, then by default, conflicting
> updates are forbidden with a 409 error. But you can get them in two
> ways: by making the changes on two separate nodes and replicating the
> nodes to each other; or by using the _bulk_docs API with
> {"all_or_nothing":true}.
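A minimal sketch of the first route, assuming two nodes reachable without authentication and a database of the same name already present on both. It asks node A to run both one-shot replications through the `_replicate` endpoint with `source` and `target` URLs; `replicate_once` and `replicate_both_ways` are hypothetical helper names.

```shell
# Hypothetical helper: ask the node at $1 to replicate from $2 to $3.
replicate_once() {
  curl -sX POST -H 'Content-Type: application/json' \
    -d "{\"source\":\"$2\",\"target\":\"$3\"}" "$1/_replicate"
}

# Hypothetical helper: replicate database $3 from node $1 to node $2 and back,
# so edits made separately on each node meet (and conflict) on both.
# Usage: replicate_both_ways NODE_A_URL NODE_B_URL DB_NAME
replicate_both_ways() {
  replicate_once "$1" "$1/$3" "$2/$3"
  replicate_once "$1" "$2/$3" "$1/$3"
}
```

Edit `mydoc` separately on each node, starting from the same rev, then run `replicate_both_ways http://nodeA:5984 http://nodeB:5984 conflict_test` (hostnames are placeholders); both nodes should then pick the same winner and list the same `_conflicts`.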
>
> The second case is used in the following shell script, so this may be
> a good starting point for experimentation.
>
> ---- 8< -------------
> # {"all_or_nothing":true} is the _bulk_docs option discussed in this
> # thread; newer CouchDB releases may not accept it.
> HOST=http://127.0.0.1:5984
> DB="$HOST/conflict_test"
> EP="$DB/_bulk_docs"
> CT='Content-Type: application/json'
> curl -s "$HOST"
> curl -sX DELETE "$DB"
> curl -sX PUT "$DB"
>
> # X0: create the document
> resp=$(curl -sX POST -H "$CT" -d @- "$EP" <<JSON
> {"all_or_nothing":true,"docs":[{
> "_id":"mydoc",
> "type":"test"
> }]}
> JSON
> )
> rev0=$(expr "$resp" : '.*"rev":"\([^"]*\)"')
> echo "$rev0"
>
> # X1: update X0
> resp=$(curl -sX POST -H "$CT" -d @- "$EP" <<JSON
> {"all_or_nothing":true,"docs":[{
> "_id":"mydoc",
> "_rev":"$rev0",
> "type":"test",
> "data":"foo"
> }]}
> JSON
> )
> rev1=$(expr "$resp" : '.*"rev":"\([^"]*\)"')
> echo "$rev1"
>
> # X2: update X0 again, which conflicts with X1
> resp=$(curl -sX POST -H "$CT" -d @- "$EP" <<JSON
> {"all_or_nothing":true,"docs":[{
> "_id":"mydoc",
> "_rev":"$rev0",
> "type":"wibble",
> "data":"bar"
> }]}
> JSON
> )
> rev2=$(expr "$resp" : '.*"rev":"\([^"]*\)"')
> echo "$rev2"
>
> # Now we have two conflicting versions.
> echo
> echo "Getting the auto-selected version:"
> curl -s "$DB/mydoc"
> echo
> echo "Getting the auto-selected version with 'conflicts':"
> curl -s "$DB/mydoc?conflicts=true"
> echo
> echo "Getting the auto-selected version with 'revs_info':"
> curl -s "$DB/mydoc?revs_info=true"
>
> # Note that you would have to retrieve the conflicting versions yourself.
>
> echo "Now updating version $rev1"
> # X3: update X1
> resp=$(curl -sX POST -H "$CT" -d @- "$EP" <<JSON
> {"all_or_nothing":true,"docs":[{
> "_id":"mydoc",
> "_rev":"$rev1",
> "type":"test",
> "data":"baz"
> }]}
> JSON
> )
> rev3=$(expr "$resp" : '.*"rev":"\([^"]*\)"')
> echo "$rev3"
>
> echo
> echo "Getting the auto-selected version:"
> curl -s "$DB/mydoc"
> echo
> echo "Getting the auto-selected version with 'conflicts':"
> curl -s "$DB/mydoc?conflicts=true"
> ---- 8< -------------
>
> Is this a sensible API? You decide. I've given my opinion previously.


This API seems weird, but it's the closest thing we can have to
multi-document transactions in CouchDB while remaining a distributed,
partitioned database. This is because it's pretty much impossible to
support all-or-nothing, conflict-checking transactions in a partitioned
database without some sort of double-lock checking, which is slow and
expensive. Also, replication doesn't replicate transactions, only
documents, so we don't wish to confuse users by introducing
transactions that aren't supported by the rest of CouchDB.

If you want an easier API for saving documents into a conflicted state  
(something like ?conflict=ok), that would be a fairly easy patch to  
make. But I'm not sure why users would want that for a single document.

Thanks for this write-up; you seem to have given a good high-level
description of how conflicts work in CouchDB.

-Damien


>
> HTH,
>
> Brian.

