On Oct 29, 2009, at 12:30 PM, Brian Candler wrote:
> On Thu, Oct 29, 2009 at 07:28:33AM +0100, fana wrote:
>> I read the book, Wiki and some Blogs about CouchDB,
>> but there is still a question in my mind.
>>
>> If a document is in conflict, the application has to resolve it.
>> But what, if this never happens?
>
> All the conflicting versions remain around, even through compaction. However,
> if you request a document by ID, by default you will get an arbitrary
> revision. The algorithm is the same across all nodes, so all nodes will see
> the same. The "winning" document is also the one seen by views.
>
>> Can the document in conflict still be read and edited?
>
> Yes. Conflicts branch into a tree. When you've resolved a conflict, you need
> to delete the conflicting revisions explicitly.
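As a rough sketch of that explicit cleanup, assuming the losing revision IDs
are already known: each one is removed with a DELETE on the document, using
the ?rev= query parameter. The function name resolve_conflict and the CURL
override variable are hypothetical, introduced only for illustration.

```shell
# Delete each losing revision of a document so that only the chosen
# revision remains. Usage: resolve_conflict DB_URL DOC_ID REV...
# CURL is a hypothetical override hook: set it to another command name
# to preview the requests instead of sending them.
resolve_conflict() {
  db=$1
  id=$2
  shift 2
  for rev in "$@"; do
    ${CURL:-curl} -sX DELETE "$db/$id?rev=$rev"
  done
}
```

For example, resolve_conflict "$DB" mydoc "$rev2" would drop the X2 branch
and leave the other branch as the only leaf.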
>
> Example:
>
> X0
>
> User 1 fetches X0 and updates it to X1. User 2 fetches X0 and updates it to
> X2. Then you get:
>
> ,-> X1
> X0
> `-> X2
>
> If either user reads, they will see one of the versions (say X1). They won't
> even know that there's a conflict unless they query with ?conflicts=true, in
> which case they'll see the rev of X2 as well, but would need to do a second
> read to get the contents of X2.
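The two-step read described above could be sketched roughly as follows. The
helper names conflict_revs and fetch_conflicts are made up for this example,
and the sed-based parsing assumes the compact, single-line JSON that CouchDB
normally returns; a real script would be better off with a proper JSON parser.

```shell
# Print the revision IDs listed under "_conflicts" in a document's JSON,
# one per line. Prints nothing if the field is absent.
# Assumption: compact JSON with no whitespace inside the array.
conflict_revs() {
  printf '%s\n' "$1" |
    sed -n 's/.*"_conflicts":\[\([^]]*\)].*/\1/p' |
    tr -d '"' | tr ',' '\n'
}

# First read: the winning revision plus the list of conflicting revs.
# Second read(s): the body of each conflicting revision via ?rev=.
# Usage: fetch_conflicts DB_URL DOC_ID
fetch_conflicts() {
  doc=$(curl -s "$1/$2?conflicts=true")
  for rev in $(conflict_revs "$doc"); do
    curl -s "$1/$2?rev=$rev"
  done
}
```

With the script's variables this would be called as fetch_conflicts "$DB" mydoc.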
>
> If the database is compacted then the common ancestor X0 will be lost
> forever, but X1 and X2 will still remain. (Hence you can't rely on doing a
> diff between X0 and X1, and another diff between X0 and X2, to merge the
> changes).
If you want DVCS-like full diffing, one way is to attach a diff and
revision metadata for each edit to the document before PUTting it. When
there is a conflict, the revision history is then completely available for
inspection, and the user can see where the conflicting edits began, etc.
>
> If a user edits X1 and saves back as X3, you will get
>
> ,-> X1 -> X3
> X0
> `-> X2
>
> Now X2 and X3 are in conflict. The conflict may be resolved in favour of X3;
> actually, I don't know the details of the algorithm, so it might be possible
> for it to be resolved in favour of X2, which means that the changes seen in
> X1 and X3 would both appear to "vanish" at that point.
The one with more edits wins, which prevents documents from arbitrarily
disappearing during normal editing.
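A minimal sketch of that rule, under some assumptions: the number before the
dash in a rev string is taken as the edit count, ties are broken by the
highest rev string in ASCII order, and deleted revisions are ignored
entirely. The function name pick_winner is hypothetical, and this is an
approximation for experimenting, not the actual server code.

```shell
# Print the revision that would be selected from a list of leaf revisions.
# Sort by the numeric prefix (more edits first), then by the remainder of
# the rev string in descending byte order, and keep the first line.
# Usage: pick_winner REV...
pick_winner() {
  printf '%s\n' "$@" | LC_ALL=C sort -t- -k1,1nr -k2,2r | head -n 1
}
```

In the example above, X3 carries a higher prefix than X2, so under these
assumptions X3 is the version returned by a plain GET.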
>
> Note: if you are running on a single node, then by default, conflicting
> updates are forbidden with a 409 error. But you can get them in two ways: by
> making the changes on two separate nodes and replicating the nodes to each
> other; or by using the _bulk_docs API with {"all_or_nothing":true}.
>
> The second case is used in the following shell script, so this may be a good
> starting point for experimentation.
>
> ---- 8< -------------
> HOST=http://127.0.0.1:5984
> DB="$HOST/conflict_test"
> EP="$DB/_bulk_docs"
> curl -s "$HOST"
> curl -sX DELETE "$DB"
> curl -sX PUT "$DB"
>
> resp=$(curl -sX POST -d @- $EP <<JSON)
> {"all_or_nothing":true,"docs":[{
> "_id":"mydoc",
> "type":"test"
> }]}
> JSON
> rev0=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
> echo $rev0
>
> resp=$(curl -sX POST -d @- $EP <<JSON)
> {"all_or_nothing":true,"docs":[{
> "_id":"mydoc",
> "_rev":"$rev0",
> "type":"test",
> "data":"foo"
> }]}
> JSON
> rev1=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
> echo $rev1
>
> resp=$(curl -sX POST -d @- $EP <<JSON)
> {"all_or_nothing":true,"docs":[{
> "_id":"mydoc",
> "_rev":"$rev0",
> "type":"wibble",
> "data":"bar"
> }]}
> JSON
> rev2=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
> echo $rev2
>
> # Now we have two conflicting versions.
> echo
> echo "Getting the auto-selected version:"
> curl -s "$DB/mydoc"
> echo
> echo "Getting the auto-selected version with 'conflicts':"
> curl -s "$DB/mydoc?conflicts=true"
> echo
> echo "Getting the auto-selected version with 'revs_info':"
> curl -s "$DB/mydoc?revs_info=true"
>
> # Note that you would have to retrieve the conflicting versions yourself
>
> echo "Now updating version $rev1"
> resp=$(curl -sX POST -d @- $EP <<JSON)
> {"all_or_nothing":true,"docs":[{
> "_id":"mydoc",
> "_rev":"$rev1",
> "type":"test",
> "data":"baz"
> }]}
> JSON
> rev3=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
> echo $rev3
>
> echo
> echo "Getting the auto-selected version:"
> curl -s "$DB/mydoc"
> echo
> echo "Getting the auto-selected version with 'conflicts':"
> curl -s "$DB/mydoc?conflicts=true"
> ---- 8< -------------
>
> Is this a sensible API? You decide. I've given my opinion previously.
This API seems weird, but it's the closest thing we can have to
multi-document transactions in CouchDB while remaining a distributed,
partitioned database. That's because it's pretty much impossible to support
all-or-nothing, conflict-checking transactions on a partitioned database
without some sort of double-lock checking, which is slow and expensive.
Also, replication doesn't replicate transactions, only documents, so we
don't wish to confuse users by introducing transactions that aren't
supported by the rest of CouchDB.

If you want an easier API for saving documents into a conflicted state
(something like ?conflict=ok), that would be a fairly easy patch to
make. But I'm not sure why users would want that for a single document.

Thanks for this write-up; you seem to have given a good high-level
description of how conflicts work in CouchDB.
-Damien
>
> HTH,
>
> Brian.