couchdb-user mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: What happens with a document, if a conflict is not resolved?
Date Fri, 30 Oct 2009 10:46:24 GMT

On Oct 30, 2009, at 4:33 AM, Brian Candler wrote:

> On Thu, Oct 29, 2009 at 01:51:57PM -0400, Damien Katz wrote:
>>> Is this a sensible API? You decide. I've given my opinion previously.
>>
>>
>> This API seems weird, but it's the closest thing we can have to
>> multi-document transactions in CouchDB and be a distributed,
>> partitioned database. This is because it's pretty much impossible to
>> support all-or-nothing conflict checking transactions with a
>> partitioned database without some sort of double-lock checking, which
>> is slow and expensive.
>
> I don't want to prevent conflicts, nor do I want transactions. As you
> say, introducing conflicting revisions is a fact of life in a
> distributed-master system.
>
> However, I believe that CouchDB's API actively discourages people from
> writing apps which deal with conflicts properly, by (a) hiding them,
> and (b) making resolve-on-read a multi-step process (e.g. readA, readB,
> readC, writeA, deleteB, deleteC) which itself is race-prone and may
> lead to more conflicts and odd intermediate states. (*)

This is true if the conflicts are being resolved on more than one  
node. You can't avoid this.

>
> What I would like to see is the following.
>
> 1. When you request document X, you get *all* conflicting revisions in
>   one go. That is, they are treated as equal peers; none is promoted to
>   winner.
>
>   (However, the list can be sorted in a deterministic order, so you
>   could get the current behaviour by just picking the first revision
>   from the list.)
>
> 2. When you perform this request, you get a single "context" tag
>   which identifies this particular *set* of revisions.
>
> 3. When you write back the new document, you supply the context tag,
>   and this simultaneously supersedes all the other documents.
>   Effectively this would be like the _rev you use today, but it would
>   refer to the set. It could actually just be an array of _revs, but
>   the user should treat it as an opaque tag.
>
> 4. Views get to see the whole set of revisions too. Again, if they want
>   today's behaviour they can just use docs[0] and ignore the others;
>   but if they want to resolve conflicts they can too.
>
> 5. If two clients replace a document or set of conflicts with a new
>   document, and the new documents are identical, then they are not
>   treated as conflicts.
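To make the proposal concrete, here is a minimal Python sketch of points 1, 2, 3 and 5, under stated assumptions: `open_conflict_set`, `accept_write` and `new_rev` are hypothetical names, not part of CouchDB's API; the context tag is assumed to be a SHA-1 over the sorted revision ids; and the ordering (highest generation first, then highest hash, ignoring deleted leaves) only approximates how CouchDB picks today's winner.

```python
import hashlib
import json


def rev_sort_key(rev):
    # A CouchDB revision id looks like "3-9c2b...": generation, a dash, then a hash.
    generation, _, digest = rev.partition("-")
    return (int(generation), digest)


def open_conflict_set(leaves):
    """Points 1 and 2 (hypothetical): return every leaf revision as an equal
    peer, in a deterministic order, plus an opaque context tag for the set."""
    ordered = sorted(leaves, key=lambda doc: rev_sort_key(doc["_rev"]), reverse=True)
    revs = sorted(doc["_rev"] for doc in leaves)
    context = hashlib.sha1(json.dumps(revs).encode("utf-8")).hexdigest()
    return ordered, context


def accept_write(current_leaves, context):
    """Point 3 (hypothetical server-side check): a write supersedes the whole
    set only if the set the client read is still the current one; otherwise
    it would be rejected, much like today's 409."""
    _, current_context = open_conflict_set(current_leaves)
    return current_context == context


def new_rev(context, generation, body):
    """Point 5 (hypothetical): derive the new revision from the context and the
    content, so two clients writing identical documents get the same revision."""
    canonical = json.dumps(body, sort_keys=True)
    digest = hashlib.md5((context + canonical).encode("utf-8")).hexdigest()
    return f"{generation}-{digest}"
```

With leaves `2-aaa` and `2-bbb`, `open_conflict_set` puts `2-bbb` first in either input order and returns the same tag both times. Once a third leaf appears, `accept_write` fails for a client still holding the old tag. Because the tag is derived from the sorted set and not from any single `_rev`, it behaves as the opaque token point 3 asks for.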
>
> When reading papers on systems like Dynamo, they all seem to have
> properties (1)-(3). That is: it's treated as natural that conflicts
> should arise; that these are fully exposed to the client; and the
> client is given the opportunity to resolve them in a single step.

That's a matter of opinion. To me, it sounds more difficult from a
client perspective to have to deal with conflicts on every read
operation.

>
>> If you want an easier API for saving documents into a conflicted
>> state (something like ?conflict=ok), that would be a fairly easy patch
>> to make. But I'm not sure why users would want that for a single
>> document.
>
> I think that ultimately the 409 behaviour could be dropped if
> conflicts were handled as above, but that's not my number one concern.
>
> My concern is this:
>
> * Someone writes an application
>
> * They use the "obvious" API: i.e. simple GET and PUT for reading and
>  updating documents. They code to the 409 for avoiding conflicts. It
>  all works fine and they are delighted with couchdb.
>
> * They switch to multi-master
>
> * All hell breaks loose. Users see their docs vanishing. The application
>  writer finally works out how to do conflict management properly, and
>  has to rewrite the app entirely so that (for example) one GET becomes
>  a GET with ?conflicts=true, followed by multiple GETs for the
>  additional versions, followed by conflict resolution, followed by a
>  POST to _bulk_docs to replace the original document and conflicts.
>
> * Application writer curses couchdb, and curses the person who wrote
>  "Most applications require no special planning to take advantage of
>  distributed updates and replication".
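The read half of the workflow in the fourth bullet (GET with `?conflicts=true`, then one GET per conflicting revision with `?rev=`) can be sketched in Python. To avoid inventing an HTTP client, `fetch` is a hypothetical caller-supplied helper that maps a request path to the parsed JSON body, or to `None` on a 404. `revision_paths` and `load_all_revisions` are illustrative names, not a CouchDB library API. Returning `None` when a revision disappears mid-read reflects the race described in the footnote: the caller has to start over.

```python
from urllib.parse import quote


def revision_paths(db, doc):
    """Given the body returned by GET /{db}/{docid}?conflicts=true, list the
    extra GET paths needed to load each conflicting revision."""
    doc_id = quote(doc["_id"], safe="")
    return [f"/{db}/{doc_id}?rev={quote(rev, safe='')}" for rev in doc.get("_conflicts", [])]


def load_all_revisions(db, doc, fetch):
    """Return the current winner followed by every conflicting revision.

    fetch(path) is a hypothetical helper returning parsed JSON, or None on 404.
    Returns None if a revision vanished between the reads (start over)."""
    revisions = [doc]
    for path in revision_paths(db, doc):
        body = fetch(path)
        if body is None:
            return None
        revisions.append(body)
    return revisions
```

Each conflicting revision costs one extra round trip, and the set can change between them. This is the multi-step, race-prone shape that the post objects to.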

It sounds like the dev didn't read the documentation.

>
> Yes, I know patches are welcome. The reason I'm not contributing code
> for this right now is that I have higher priorities - I'm happy to keep
> my app 409-tied while I work on other things. But at the back of my
> mind, I know that I won't be going multi-master for a long time, if
> ever.

Patches are welcome, and almost everything you propose could be done in
a front end that's not that involved.

>
> Regards,
>
> Brian.
>
> (*) Yes, I know that *with care* you can do the writes and deletes
> together as a single _bulk_docs operation, and even bind them together
> using "all_or_nothing":true. But this is not obvious. And there are
> still races. For example, I'm not sure that you can use a multi-key
> fetch for getting all the conflicting revisions in one hit, so you have
> a series of GETs, and you may find that the revs you're GETting have
> vanished by the time you read them.
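The write half described in the footnote can be sketched as building a single `POST /{db}/_bulk_docs` body. It writes the merged document on top of the winning revision and adds a `"_deleted": true` stub for each losing revision. `resolution_body` and the `merged` argument are hypothetical names, and the merge itself is left to the application. `all_or_nothing` is the flag the footnote names. It belongs to the CouchDB releases of this period, so check whether your release still accepts it.

```python
def resolution_body(merged, winner_rev, losing_revs, all_or_nothing=True):
    """Build one _bulk_docs body that replaces the winner with `merged` and
    deletes every losing conflict revision (hypothetical helper)."""
    # The merged content is saved as a child of the current winning revision.
    docs = [dict(merged, _rev=winner_rev)]
    # Each losing leaf gets a deletion stub so it no longer shows as a conflict.
    docs += [{"_id": merged["_id"], "_rev": rev, "_deleted": True} for rev in losing_revs]
    body = {"docs": docs}
    if all_or_nothing:
        # Ask the server to apply the whole batch together, as the footnote suggests.
        body["all_or_nothing"] = True
    return body
```

This collapses the writeA/deleteB/deleteC steps into one request. The revision ids still come from the earlier series of GETs, though, so the read-side race the footnote mentions remains.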

